You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The original (and still current) probe implementation in OpenHashMap is based on the algorithm used in Python. It is exponential (in the long run), with sequence (h+¼)·5 ^i^ - ¼ modulo the table size, where i is the sequence index and h is the hash of the key. (Actually, the start of the sequence is modified by adding in a "perturbation" at each step, but this value becomes 0 or -1 after the 8th step.)
Using these probe intervals leads to poor memory locality, as the sequence jumps around the table.
This ticket replaces the probe algorithm with a more sensible and well-studied quadratic probe with small step size.
The text was updated successfully, but these errors were encountered:
Mike (mike) said:
This doesn't appear to be a bug per se, because when I ran the following test, it gave no output:
defhashOf(key: Int) = {
varh= key
h ^= ((h >>>20) ^ (h >>>12))
h ^ (h >>>7) ^ (h >>>4)
}
defcheck(key: Int) = {
valhash= hashOf(key)
valarr=newArray[Boolean](64)
valmask= arr.length -1varj= hash
varindex= hash & mask
varperturb= index
(0 to arr.length *2).foreach( i => {
arr(index) =true
j =5* j +1+ perturb
perturb >>=5
index = j & mask
})
valfalses= arr.filterNot(i => i).length
if (falses >= arr.length /2)
println( key, falses, arr.mkString(",") )
}
(0 to 100000000).foreach( check(_) )
Still, I've seen no theoretical justification for why this "exponential probing" should work, nor have I seen this technique used elsewhere, and it adds unnecessary computation compared to linear or quadratic probing, while having the poor locality of reference of double hashing but none of its non-clustering guarantees. So I still think it should be replaced on performance grounds.
The original (and still current) probe implementation in
OpenHashMap
is based on the algorithm used in Python. It is exponential (in the long run), with sequence (h+¼)·5 ^i^ - ¼ modulo the table size, where i is the sequence index and h is the hash of the key. (Actually, the start of the sequence is modified by adding in a "perturbation" at each step, but this value becomes 0 or -1 after the 8th step.)Using these probe intervals leads to poor memory locality, as the sequence jumps around the table.
This ticket replaces the probe algorithm with a more sensible and well-studied quadratic probe with small step size.
The text was updated successfully, but these errors were encountered: