use quadratic probing in OpenHashMap #9789

Closed
scabug opened this issue May 22, 2016 · 3 comments

scabug commented May 22, 2016

The original (and still current) probe implementation in OpenHashMap is based on the algorithm used in Python. It is exponential (in the long run), with sequence (h + ¼)·5^i − ¼ modulo the table size, where i is the sequence index and h is the hash of the key. (Actually, the start of the sequence is modified by adding in a "perturbation" at each step, but this value becomes 0 or -1 after the 8th step.)

Using these probe intervals leads to poor memory locality, as the sequence jumps around the table.

This ticket proposes replacing the probe algorithm with a more conventional, well-studied quadratic probe with a small step size.
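
For illustration, here is a minimal sketch (not the actual patch) of the kind of probe meant here: quadratic probing by triangular numbers in a power-of-two table, where the i-th probe sits at offset i·(i+1)/2 from the home slot. Early probes stay close together, and with a power-of-two capacity this sequence is known to visit every slot exactly once.

def quadraticProbes(hash: Int, capacity: Int): Seq[Int] = {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of two")
  val mask = capacity - 1
  var index = hash & mask
  var step = 0
  (0 until capacity).map { _ =>
    val current = index
    step += 1                     // increments 1, 2, 3, ... give triangular offsets
    index = (index + step) & mask
    current
  }
}

// Sanity check: for a 64-slot table the first 64 probes cover all 64 slots.
// quadraticProbes(12345, 64).distinct.size == 64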

scabug commented May 22, 2016

Imported From: https://issues.scala-lang.org/browse/SI-9789?orig=1
Reporter: Mike (mike)
Assignee: Mike (mike)
Affected Versions: 2.11.8, 2.12.0-M4
Other Milestones: 2.12.0-M5
Attachments:

scabug commented May 22, 2016

Mike (mike) said:
scala/scala#5183

scabug commented May 24, 2016

Mike (mike) said:
This doesn't appear to be a bug per se, because when I ran the following test, it gave no output:

// Reproduce the hash mixing used by OpenHashMap.
def hashOf(key: Int) = {
  var h = key
  h ^= ((h >>> 20) ^ (h >>> 12))
  h ^ (h >>> 7) ^ (h >>> 4)
}

// Walk the exponential probe sequence (j = 5*j + 1 + perturb) over a 64-slot
// table, marking each visited slot, and report any key whose first 129 probes
// (about twice the table size) leave at least half of the table untouched.
def check(key: Int) = {
  val hash = hashOf(key)
  val arr = new Array[Boolean](64)
  val mask = arr.length - 1

  var j = hash
  var index = hash & mask
  var perturb = index
  (0 to arr.length * 2).foreach( i => {
      arr(index) = true
      j = 5 * j + 1 + perturb
      perturb >>= 5
      index = j & mask
    })

  val falses = arr.filterNot(i => i).length
  if (falses >= arr.length / 2)
    println( key, falses, arr.mkString(",") )
}

(0 to 100000000).foreach( check(_) )

Still, I've seen no theoretical justification for why this "exponential probing" should work, nor have I seen the technique used elsewhere. It also adds unnecessary computation compared to linear or quadratic probing, and it has the poor locality of reference of double hashing without any of its non-clustering guarantees. So I still think it should be replaced on performance grounds.
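
For comparison, an analogous coverage check for triangular-number quadratic probing (an illustrative sketch reusing the 64-slot setup and hashOf above; checkQuadratic is a hypothetical helper, not code from the pull request) also prints nothing, because in a power-of-two table that probe sequence reaches every slot:

def checkQuadratic(key: Int) = {
  val hash = hashOf(key)
  val arr = new Array[Boolean](64)
  val mask = arr.length - 1

  var index = hash & mask
  var step = 0
  (0 until arr.length).foreach { _ =>
    arr(index) = true
    step += 1                     // cumulative offsets 1, 3, 6, 10, ... (triangular)
    index = (index + step) & mask
  }

  // Triangular increments visit every slot of a power-of-two table,
  // so no key should ever be reported here.
  val falses = arr.count(b => !b)
  if (falses > 0)
    println( key, falses )
}

(0 to 100000000).foreach( checkQuadratic(_) )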
