hashCode of Product inconsistent on different machines/configurations #1387

Closed
scabug opened this issue Sep 27, 2008 · 7 comments

@scabug

scabug commented Sep 27, 2008

In my attempts to get a more scala friendly library for Hadoop, I've
come across this in ScalaRuntime

    def _hashCode(x: Product): Int = {
      var code = x.getClass().hashCode()
      val arr = x.productArity
      var i = 0
      while (i < arr) {
        code = code * 41 + x.productElement(i).hashCode()
        i += 1
      }
      code
    }

Sadly, x.getClass.hashCode doesn't seem to be consistent across
different machines or different executions (not sure which), which
makes it useless in Hadoop, since Hadoop assumes that hashCode is the same no matter the machine. Any chance we could get this changed?

How about:

var code = x.productPrefix.hashCode();

or var code = x.getClass.getName.hashCode() ?

  • David
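
For illustration, the second suggestion could look roughly like the sketch below. `StableProductHash`, `stableHash`, and `Point` are hypothetical names, not part of ScalaRuntime, and the null check is an addition that the original loop does not have. It relies on `String.hashCode` being defined by the Java specification, so the seed is the same in every JVM.

```scala
object StableProductHash {
  // Hypothetical example type, used only to exercise the helper below.
  final case class Point(x: Int, y: Int)

  // Same loop as _hashCode, but seeded from the class *name* rather than
  // from the Class object, whose hashCode is identity-based.
  def stableHash(p: Product): Int = {
    var code = p.getClass.getName.hashCode
    var i = 0
    while (i < p.productArity) {
      val elem = p.productElement(i)
      // Assumption: treat null elements as 0 instead of throwing.
      code = code * 41 + (if (elem == null) 0 else elem.hashCode)
      i += 1
    }
    code
  }
}
```

The result is only as stable as the element hash codes. Strings and boxed primitives are fine, but an element that falls back to identity hashing would bring the problem back.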
@scabug

scabug commented Sep 27, 2008

Imported From: https://issues.scala-lang.org/browse/SI-1387?orig=1
Reporter: @dlwh
Attachments:

  • diff-1387.txt (created on Oct 30, 2008 11:46:04 PM UTC, 497 bytes)

@scabug

scabug commented Sep 27, 2008

@mcdirmid said:
The fact that hash codes are not stable is well known, which is why we have classes like LinkedHashSet and LinkedHashMap to aid in debugging. I don't see much of a problem in changing hashCode to be computed from productPrefix rather than getClass (or heck, just throw out the first term, it's not needed for the hash code), but hash code stability shouldn't be a general goal in the Scala language runtime.
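
To make the "throw out the first term" option concrete, here is a minimal sketch. `SeedlessHashDemo`, `fieldsOnlyHash`, `Meters`, and `Seconds` are hypothetical names, and the null check is an assumption. Dropping the class-based seed keeps the equals/hashCode contract intact, but unequal values of different case classes with the same fields then land on the same hash.

```scala
object SeedlessHashDemo {
  // Two hypothetical case classes with identical field layouts.
  final case class Meters(value: Int)
  final case class Seconds(value: Int)

  // The _hashCode loop with the class-based first term removed.
  def fieldsOnlyHash(p: Product): Int = {
    var code = 0
    var i = 0
    while (i < p.productArity) {
      val elem = p.productElement(i)
      code = code * 41 + (if (elem == null) 0 else elem.hashCode)
      i += 1
    }
    code
  }

  // True: the two values are unequal yet share a hash code.
  def collides: Boolean =
    Meters(5) != Seconds(5) &&
      fieldsOnlyHash(Meters(5)) == fieldsOnlyHash(Seconds(5))
}
```

This is legal, since equal hash codes never imply equality. The cost is extra collisions when different product types are mixed in one hash table.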

@scabug

scabug commented Oct 30, 2008

@dlwh said:
A coworker of mine found another instance of this bug that you might consider more serious. If you serialize a (Hash)Map that has Tuples as keys and restart the JVM, there's a high chance that your map won't work correctly with operations like "contains", because the hash codes have changed. Iteration will still work, but that's the only way to get at the data. To me this is a very big deal, as it makes using Tuples and case classes very cumbersome.

I'd personally like to see this fixed by 2.7.2, as this is a very minor change. To that end, I'm attaching a patch.
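
The failure mode can be simulated inside a single JVM. The sketch below is an illustration under stated assumptions, not a reproduction of the actual bug: `UnstableHashDemo`, `Key`, and `jvmSeed` are hypothetical, and `jvmSeed` stands in for the per-process hash code of the `Class` object. Changing it after the map is built plays the role of deserializing the map in a freshly started JVM.

```scala
import scala.collection.immutable.HashMap

object UnstableHashDemo {
  // Stand-in for x.getClass.hashCode, which may differ between JVM runs.
  var jvmSeed: Int = 1

  final class Key(val a: Int, val b: Int) {
    override def equals(other: Any): Boolean = other match {
      case k: Key => a == k.a && b == k.b
      case _      => false
    }
    // Same shape as _hashCode: seed first, then each field folded in.
    override def hashCode: Int = (jvmSeed * 41 + a) * 41 + b
  }

  // Returns (found before the seed change, found after the seed change).
  def lookupAfterSeedChange(): (Boolean, Boolean) = {
    jvmSeed = 1
    val table = HashMap(new Key(1, 2) -> "value")
    val before = table.contains(new Key(1, 2))
    jvmSeed = 2 // simulate a restart: equal keys now hash differently
    val after = table.contains(new Key(1, 2))
    (before, after)
  }
}
```

The entry is still in the map and iteration would still return it. Only hash-based lookup misses, because the map searches using the key's new hash code.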

@scabug

scabug commented Oct 30, 2008

@dlwh said:
Diff

@scabug

scabug commented Dec 21, 2008

@ijuma said:
This affects not only tuples but also case classes, and I agree that it's a big deal. Note that this is not just a theoretical problem: someone on IRC (Matt) was getting incorrect results on an immutable.HashSet after deserialization due to this.

@scabug

scabug commented Dec 24, 2008

@ijuma said:
Also see ticket #1600.

@scabug

scabug commented Feb 19, 2009

@DRMacIver said:
Fixed in r17161 (we're now using getClass.getName.hashCode, which is no more expensive because the hash code is cached on the String object), but I can't figure out a good test for this problem given the way partest works. If someone would like to suggest one I'd be very appreciative.
