use quadratic probing in OpenHashMap #9789

Closed
scabug opened this issue May 22, 2016 · 3 comments

scabug commented May 22, 2016

The original (and still current) probe implementation in OpenHashMap is based on the algorithm used in Python. It is exponential (in the long run), with sequence (h + ¼)·5^i − ¼ modulo the table size, where i is the sequence index and h is the hash of the key. (Actually, the start of the sequence is modified by adding in a "perturbation" at each step, but this value becomes 0 or -1 after the 8th step.)

Using these probe intervals leads to poor memory locality, as the sequence jumps around the table.

This ticket proposes replacing the probe algorithm with a more conventional, well-studied quadratic probe with a small step size.
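
For illustration, here is a minimal sketch (not the actual patch) of the kind of probe meant here: quadratic probing by triangular numbers in a power-of-two table, where the i-th probe sits at offset i·(i+1)/2 from the home slot. Early probes stay close together, and with a power-of-two capacity this sequence is known to visit every slot exactly once.

def quadraticProbes(hash: Int, capacity: Int): Seq[Int] = {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of two")
  val mask = capacity - 1
  var index = hash & mask
  var step = 0
  (0 until capacity).map { _ =>
    val current = index
    step += 1                     // increments 1, 2, 3, ... give triangular offsets
    index = (index + step) & mask
    current
  }
}

// Sanity check: for a 64-slot table the first 64 probes cover all 64 slots.
// quadraticProbes(12345, 64).distinct.size == 64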

scabug commented May 22, 2016

Imported From: https://issues.scala-lang.org/browse/SI-9789?orig=1
Reporter: Mike (mike)
Assignee: Mike (mike)
Affected Versions: 2.11.8, 2.12.0-M4
Other Milestones: 2.12.0-M5
Attachments:

scabug commented May 22, 2016

Mike (mike) said:
scala/scala#5183

scabug commented May 24, 2016

Mike (mike) said:
This doesn't appear to be a bug per se, because when I ran the following test, it gave no output:

// Reproduce the hash mixing used by OpenHashMap.
def hashOf(key: Int) = {
  var h = key
  h ^= ((h >>> 20) ^ (h >>> 12))
  h ^ (h >>> 7) ^ (h >>> 4)
}

// Walk the exponential probe sequence (j = 5*j + 1 + perturb) over a 64-slot
// table, marking each visited slot, and report any key whose first 129 probes
// (about twice the table size) leave at least half of the table untouched.
def check(key: Int) = {
  val hash = hashOf(key)
  val arr = new Array[Boolean](64)
  val mask = arr.length - 1

  var j = hash
  var index = hash & mask
  var perturb = index
  (0 to arr.length * 2).foreach( i => {
      arr(index) = true
      j = 5 * j + 1 + perturb
      perturb >>= 5
      index = j & mask
    })

  val falses = arr.filterNot(i => i).length
  if (falses >= arr.length / 2)
    println( key, falses, arr.mkString(",") )
}

(0 to 100000000).foreach( check(_) )

Still, I've seen no theoretical justification for why this "exponential probing" should work, nor have I seen the technique used elsewhere. It also adds unnecessary computation compared to linear or quadratic probing, and it has the poor locality of reference of double hashing without any of its non-clustering guarantees. So I still think it should be replaced on performance grounds.
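
For comparison, an analogous coverage check for triangular-number quadratic probing (an illustrative sketch reusing the 64-slot setup and hashOf above; checkQuadratic is a hypothetical helper, not code from the pull request) also prints nothing, because in a power-of-two table that probe sequence reaches every slot:

def checkQuadratic(key: Int) = {
  val hash = hashOf(key)
  val arr = new Array[Boolean](64)
  val mask = arr.length - 1

  var index = hash & mask
  var step = 0
  (0 until arr.length).foreach { _ =>
    arr(index) = true
    step += 1                     // cumulative offsets 1, 3, 6, 10, ... (triangular)
    index = (index + step) & mask
  }

  // Triangular increments visit every slot of a power-of-two table,
  // so no key should ever be reported here.
  val falses = arr.count(b => !b)
  if (falses > 0)
    println( key, falses )
}

(0 to 100000000).foreach( checkQuadratic(_) )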
