Allegedly faster HashSet implementation #2512

scabug · 2009-10-23T05:15:47Z

scala.tools.nsc.util.HashSet.growTable() should set

used = 0

when allocating the new Array.

With the current implementation, a HashSet with 16 entries reports size 48 and has capacity 256.

With the proposed fix, it would report size 16 and have capacity 128.
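The numbers above can be reproduced with a small sketch. It assumes an open-addressing table that starts at capacity 16, doubles in growTable(), re-adds old entries through addEntry(), and grows once used reaches a quarter of the capacity. Those parameters, and the names SketchHashSet, resetUsed and fill, are illustrative assumptions rather than the compiler's actual source.

```scala
object GrowTableDemo {
  /** Hypothetical open-addressing set; resetUsed = false mimics the reported bug. */
  final class SketchHashSet[T <: AnyRef](resetUsed: Boolean, initialCapacity: Int = 16) {
    private var cap = initialCapacity
    private var used = 0
    private var table = new Array[AnyRef](cap)

    def size: Int = used       // reported size, which is what the bug inflates
    def capacity: Int = cap

    private def index(h: Int): Int = math.abs(h % cap)

    def addEntry(x: T): Unit = {
      var h = index(x.hashCode)
      var entry = table(h)
      while (entry ne null) {  // linear probing until a free slot is found
        if (entry == x) return
        h = index(h + 1)
        entry = table(h)
      }
      table(h) = x
      used += 1
      if (used >= (cap >> 2)) growTable() // assumed threshold: a quarter full
    }

    private def growTable(): Unit = {
      val oldTable = table
      cap *= 2
      table = new Array[AnyRef](cap)
      if (resetUsed) used = 0  // the proposed fix: re-added entries are counted once
      var i = 0
      while (i < oldTable.length) {
        val entry = oldTable(i)
        if (entry ne null) addEntry(entry.asInstanceOf[T])
        i += 1
      }
    }
  }

  /** Adds n distinct strings and returns (reported size, capacity). */
  def fill(resetUsed: Boolean, n: Int = 16): (Int, Int) = {
    val set = new SketchHashSet[String](resetUsed)
    (1 to n).foreach(i => set.addEntry("key" + i))
    (set.size, set.capacity)
  }

  def main(args: Array[String]): Unit = {
    println("without reset: " + fill(resetUsed = false))
    println("with reset:    " + fill(resetUsed = true))
  }
}
```

Without the reset, every rehash counts the surviving entries a second time, so the growth threshold is reached early and the table ends up one doubling larger than it needs to be.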

Here's an example of the value of the memory saving: when attempting to compile an existing program with trunk (-Xmx1312M on Win XP) I get OutOfMemoryError: Java heap space

        at scala.tools.nsc.util.HashSet.growTable(HashSet.scala:59)
        at scala.tools.nsc.util.HashSet.addEntry(HashSet.scala:42)
        at scala.tools.nsc.symtab.Types$$class.unique(Types.scala:2415)
        at scala.tools.nsc.symtab.Types$$class.mkThisType(Types.scala:2075)

scabug · 2009-10-23T05:15:47Z

Imported From: https://issues.scala-lang.org/browse/SI-2512?orig=1
Reporter: Eric Willigers (ewilligers)
Attachments:

QuickHashSet.scala (created on Oct 26, 2009 8:02:08 AM UTC, 2274 bytes)
Test.scala (created on Nov 26, 2009 9:34:29 PM UTC, 1378 bytes)

scabug · 2009-10-24T19:32:11Z

@paulp said:
That's a pretty titanic bug. Look at what it did to the statistics -- uniquetypes is reporting size, so I added uniquetypes2 to count what's really in there.

[scalacfork] *** Cumulative statistics at phase cleanup
[...]
[scalacfork] #uniquetypes : 2242323
[scalacfork] #uniquetypes2: 962323

Sadly no performance improvement that I can see, but maybe that's because I'm nowhere near memory bound.

I'll fix it shortly.

scabug · 2009-10-25T02:57:05Z

@paulp said:
Fixed in r19265.

scabug · 2009-10-26T08:02:08Z

Eric Willigers (ewilligers) said:
Possibly faster HashSet, with identical allocation behaviour to r19265

scabug · 2009-11-22T18:36:50Z

@paulp said:
FYI I stumbled across this (adding attachments to a closed ticket is not a good way to make sure they're seen) and gave it a whirl, but my extremely unscientific test showed zero speed difference from what's there, so no dice unless you would like to assemble some convincing performance metrics.

scabug · 2009-11-26T21:34:29Z

Eric Willigers (ewilligers) said:
Benchmark

scabug · 2009-11-26T21:46:00Z

Eric Willigers (ewilligers) said:
(reopening for extempore to consider my benchmark, which uses uniformly distributed hashCode values)

Running my attached benchmark, my suggested hashset takes 30% to 40% less time.

I used Scala 2.8.0.r19890-b20091126020351 and Java 1.7.0-ea-b76 on XP
JAVA_OPTS="-Xms1024M -Xmx1024M -XX:+PrintCompilation -server -Xbatch -XX:CICompilerCount=1"

scabug · 2009-11-26T22:13:55Z

@ijuma said:
Using uniformly distributed hashCodes on their own is not enough unless one expects that to be true for the structure in question though.
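To illustrate why the hash distribution matters for a benchmark, here is a rough sketch that counts the extra slots inspected while inserting keys into a linear-probing table. Linear probing, the table size, and the names ProbeCountDemo and totalProbes are assumptions made for the example; nothing here is taken from the attached benchmark.

```scala
object ProbeCountDemo {
  /** Inserts one distinct key per hash code into an empty linear-probing table
    * and returns how many occupied slots had to be skipped in total. */
  def totalProbes(hashes: Seq[Int], capacity: Int): Int = {
    require(hashes.size < capacity, "table must keep at least one free slot")
    val occupied = new Array[Boolean](capacity)
    var probes = 0
    for (h <- hashes) {
      var i = (h & Int.MaxValue) % capacity
      while (occupied(i)) {    // step to the next slot on a collision
        i = (i + 1) % capacity
        probes += 1
      }
      occupied(i) = true
    }
    probes
  }

  def main(args: Array[String]): Unit = {
    val n = 256
    val capacity = 1024
    val spread = (0 until n).map(_ * 4) // evenly spread: each key gets its own slot
    val clustered = Seq.fill(n)(7)      // worst case: every key starts at slot 7
    println("spread:    " + totalProbes(spread, capacity) + " probes")
    println("clustered: " + totalProbes(clustered, capacity) + " probes")
  }
}
```

A benchmark fed only well-spread hash codes exercises the first case, so its timings say little about a workload whose keys cluster.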

scabug · 2009-11-27T12:50:40Z

@paulp said:
Replying to [comment:5 ijuma]:

> Using uniformly distributed hashCodes on their own is not enough unless one expects that to be true for the structure in question though.

If I finish up #2537 then it will be true, because for the big cases in the compiler what is being stored in HashSet are case classes.

scabug · 2009-11-27T13:53:38Z

@ijuma said:
I have it on my todo list to do a bunch of testing when it comes to maps, sets and case classes with and without jenkins hash. My hope is to do it in the next few weeks, but we'll see.

Interesting that the compiler uses case classes that much. For maps in Java, the statistics I've seen say that over 50% use a String key.

scabug · 2010-10-11T20:37:31Z

@paulp said:
Unable to witness a performance difference and unwilling to go crazy looking for it, I choose option C, assign to scala community.

scabug · 2015-02-01T05:01:25Z

@Ichoran said:
The new AnyRefMap is hugely faster than this for large HashSets, but equally much slower for small HashSets. That indicates that there is significant potential for improvement, but it's not as simple as just dropping in AnyRefMap.

scabug · 2015-02-03T04:25:20Z

@kanielc said:
This ticket is over 5 years old, should it be closed with some sort of resolution?

scabug · 2015-02-03T15:38:51Z

@Ichoran said:
@kanielc - No, because it's not resolved yet.

som-snytt · 2023-11-20T07:43:41Z

@SethTisue the initial bug was fixed, hashes improved, no definitive reproducible demonstration of an issue with 2.13 collections. No one is crying "my small AnyRefMap is slow". Suggest closing with your friendly "Happy to re-open if someone discovers an issue."

Alternatively, transfer the issue to collections-next for dotty to inherit after the thaw.

scabug added the enhancement, help wanted, and performance labels Apr 7, 2017

scabug added this to the Backlog milestone Apr 7, 2017

scabug assigned Ichoran Apr 7, 2017

SethTisue added the library:collections label Feb 17, 2018

SethTisue unassigned Ichoran Feb 17, 2018

SethTisue closed this as not planned Nov 22, 2023

SethTisue removed this from the Backlog milestone Nov 22, 2023
