Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allegedly faster HashSet implementation #2512

Closed
scabug opened this issue Oct 23, 2009 · 15 comments
Closed

Allegedly faster HashSet implementation #2512

scabug opened this issue Oct 23, 2009 · 15 comments

Comments

@scabug
Copy link

scabug commented Oct 23, 2009

scala.tools.nsc.util.hashSet.growTable() should set

used = 0

when allocating the new Array.

With the current implementation, a hashSet with 16 entries reports size 48 and has capacity 256.

With the proposed fix, it would report size 16 and have capacity 128.

Here's an example of the value of the memory saving: when attempting to compile an existing program with truck (-Xmx1312M on Win XP) I experience outOfMemoryError: Java heap space

        at scala.tools.nsc.util.HashSet.growTable(HashSet.scala:59)
        at scala.tools.nsc.util.HashSet.addEntry(HashSet.scala:42)
        at scala.tools.nsc.symtab.Types$$class.unique(Types.scala:2415)
        at scala.tools.nsc.symtab.Types$$class.mkThisType(Types.scala:2075)
@scabug
Copy link
Author

scabug commented Oct 23, 2009

Imported From: https://issues.scala-lang.org/browse/SI-2512?orig=1
Reporter: Eric Willigers (ewilligers)
Attachments:

@scabug
Copy link
Author

scabug commented Oct 24, 2009

@paulp said:
That's a pretty titanic bug. Look at what it did to the statistics -- uniquetypes is reporting size, I added uniquetypes2 to count what's really in there.

[scalacfork] *** Cumulative statistics at phase cleanup
[...]
[scalacfork] #uniquetypes : 2242323
[scalacfork] #uniquetypes2: 962323

Sadly no performance improvement that I can see, but maybe that's because I'm nowhere near memory bound.

I'll fix it shortly.

@scabug
Copy link
Author

scabug commented Oct 25, 2009

@paulp said:
Fixed in r19265.

@scabug
Copy link
Author

scabug commented Oct 26, 2009

Eric Willigers (ewilligers) said:
Possibly faster HashSet, with identical allocation behaviour to r19265

@scabug
Copy link
Author

scabug commented Nov 22, 2009

@paulp said:
FYI I stumbled across this (adding attachments to a closed ticket is not a good way to make sure they're seen) and gave it a whirl, but my extremely unscientific test showed zero speed difference from what's there, so no dice unless you would like to assemble some convincing performance metrics.

@scabug
Copy link
Author

scabug commented Nov 26, 2009

Eric Willigers (ewilligers) said:
Benchmark

@scabug
Copy link
Author

scabug commented Nov 26, 2009

Eric Willigers (ewilligers) said:
(reopening for extempore to consider my benchmark, which uses uniformly distributed hashCode values)

Running my attached benchmark, my suggested hashset takes 30% to 40% less time.

I used Scala 2.8.0.r19890-b20091126020351 and Java 1.7.0-ea-b76 on XP
JAVA_OPTS="-Xms1024M -Xmx1024M -XX:+PrintCompilation -server -Xbatch -XX:CICompilerCount=1"

@scabug
Copy link
Author

scabug commented Nov 26, 2009

@ijuma said:
Using uniformly distributed hashCodes on their own is not enough unless one expects that to be true for the structure in question though.

@scabug
Copy link
Author

scabug commented Nov 27, 2009

@paulp said:
Replying to [comment:5 ijuma]:

Using uniformly distributed hashCodes on their own is not enough unless one expects that to be true for the structure in question though.

If I finish up #2537 then it will be true, because for the big cases in the compiler what is being stored in HashSet are case classes.

@scabug
Copy link
Author

scabug commented Nov 27, 2009

@ijuma said:
I have it on my todo list to do a bunch of testing when it comes to maps, sets and case classes with and without jenkins hash. My hope is to do it in the next few weeks, but we'll see.

Interesting that the compiler uses case classes that much. For maps in Java, the statistics I've seen say that over 50% uses a String key.

@scabug
Copy link
Author

scabug commented Oct 11, 2010

@paulp said:
Unable to witness a performance difference and unwilling to go crazy looking for it, I choose option C, assign to scala community.

@scabug
Copy link
Author

scabug commented Feb 1, 2015

@Ichoran said:
The new AnyRefMap is hugely faster than this for large HashSets, but equally much slower for small HashSets. That indicates that there is significant potential for improvement, but it's not as simple as just dropping in AnyRefMap.

@scabug
Copy link
Author

scabug commented Feb 3, 2015

@kanielc said:
This ticket is over 5 years old, should it be closed with some sort of resolution?

@scabug
Copy link
Author

scabug commented Feb 3, 2015

@Ichoran said:
@kanielc - No, because it's not resolved yet.

@som-snytt
Copy link

@SethTisue the initial bug was fixed, hashes improved, no definitive reproducible demonstration of an issue with 2.13 collections. No one is crying "my small AnyRefMap is slow". Suggest closing with your friendly, Happy to re-open if someone discovers an issue.

Alternatively, transfer the issue to collections-next for dotty to inherit after the thaw.

@SethTisue SethTisue closed this as not planned Won't fix, can't repro, duplicate, stale Nov 22, 2023
@SethTisue SethTisue removed this from the Backlog milestone Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants