Scala's BigDecimal.hashCode() implementation, while not technically broken, is implemented in such a way as to nearly guarantee a large number of collisions. The current implementation looks like this:
unifiedPrimitiveHashcode essentially casts this to an Int and returns the result.
There are several problems with this implementation. For the time being, let's ignore whole numbers caught by if(isWhole) and assume we're working with numbers having fractional parts. The fact that the number's Double representation is used to calculate the hash value in this case yields the following undesirable properties:
- All BigDecimal values greater than Double.MaxValue have equivalent hash values, since they all map to Double.PositiveInfinity
- All BigDecimal values less than Double.MinValue have equivalent hash values for the same reason
- All BigDecimal values with absolute value smaller than Double.MinPositiveValue have equivalent hash values, since they all map to 0.0
- All BigDecimal values with more than 17 significant digits have nearby neighboring values with the same hash value, since all the digits after the first 17 are truncated by the cast to Double
Here is some simple test code that demonstrates the problem:
BigDecimal values that are whole numbers fare slightly better. Since they are cast to Ints, their lower 32 bits are used as the hash value, solving the nearby neighbor collision problem. Unfortunately, since only the first 32 bits are used, it is trivial to generate collisions between them simply by flipping bits beyond the first 32. This will lead to poor performance for any algorithms using them as masks, multiplying or dividing by large powers of two, working with Mersenne primes, etc.
The following improved hashCode() implementation, presented as pseudo-code, should solve both of these problems and is relatively simple. Since hashCode would be more expensive to calculate, it might make sense to store the result as well. I haven't tested this implementation.