Consider replacing existing mutable HashMap implementation with OpenHashMap, which seems faster #5263
Comments
Imported From: https://issues.scala-lang.org/browse/SI-5263?orig=1 |
@paulp said: As-is: 7 minutes 5 seconds So this is wontfix in the absence of a significantly more convincing argument. |
@pchiusano said: It looks like Ismael has already done some testing - Ismael - can you explain what the numbers in the table mean in the link you gave? You did not actually explain that anywhere in your message. :) Either OpenHashMap was significantly worse for this benchmark, or significantly better... Besides that randomized search test what other tests did you do and can you summarize your general results? |
@ijuma said: |
@pchiusano said: |
@ijuma said: |
@paulp said: |
@pchiusano said: Paul - I don't think I am the only person interested in the performance of fundamental collections like hash maps. IMO, the recent reactions to the Yammer leaked email show that lots of people are interested in it - people want Scala's collections to be competitive with Java's for the use cases they care about. And since this is such a fundamental performance issue for so many people, I would have thought that Typesafe would be interested in profiling and optimizing these sorts of things even without someone from the community spearheading. But that is your call obviously. :) I also understand if you guys are really busy with other stuff - though if that is the case, maybe this ticket could be turned into a general "get Scala's mutable collections to be more competitive with Java's" ticket and put on the backlog or something.
Here's my argument, and feel free to disagree :) - there is a lot of Scala code out there, including the compiler itself. As a straw man, if 99% of the Scala code out there would see a dramatic performance improvement as a result of some change, but the compiler itself would see a 10% decrease in performance, then for the good of the overall Scala community such a change might be warranted. Basically, the Scala compiler is not necessarily representative of the "average" codebase of the rest of the community of Scala users. Also, what if the compiler's slowdown is explained by something dumb like - the compiler uses lots of small maps, and the default initial size for AlternateMapImplementation is too high? If something like this were the explanation, it could have an easy fix, and then both the scala compiler and the rest of the community could see a speed improvement. Paul, btw, I'm actually really curious what performance difference you see in the compiler if you replace the map implementation with one that delegates to java.util.Map. Anyway feel free to close the ticket - I just, in general, don't like it when tickets get closed with zero discussion or response from the original submitter. Even if you think it's a bad bug report or whatever, I just find it sort of off-putting. |
@paulp said: I think about little but performance issues these days, but a ticket which says "make mutable collections faster" is not useful, it only contributes to one of my biggest problems, that being the amount of noise in the bug database. When I have hundreds of tickets to select from, how helpful do you think it is to have additional ones opened up with vague requirements regarding subjects which we hear about every day whether we want to or not? So if it seems a bit off-putting then that's probably intentional. This should not be read to discourage the opening of specific, fully articulated issues. It should be read to mean that a significant fraction of new tickets contribute a net negative by draining my very limited time. |
@pchiusano said: Just as an idea, to deal with the "noise in the bug database", maybe there could be a designation of "additional info requested" or something like that for issues? Then when you receive an issue that you don't think is really actionable as reported, you designate it "additional info requested", add a comment about what other info is needed, and even assign it back to the submitter. So the issue stays alive but you don't have to look at it all the time, and there's a clear next step for the submitter to push things along. This might be better than just closing the ticket. Anyway, just a thought. Feel free to close this ticket! |
@paulp said: Because JIRA is a steaming pile of garbage, I can't assign anything to the submitter without firing up the admin interface and adding them to the list of "scala developers", which isn't going to happen. (The alternative, that tickets can be assigned to everyone, means that I get a dropdown list of 4000 people every time I want to assign a ticket. There is apparently no middle ground.) |
@ijuma said: "As-is: 7 minutes 5 seconds" Is this as simple as replacing mutable.HashMap with my implementation and running ant, or is it a more involved process? If the latter, would you please outline it so that I can make sure we improve there too (or at least don't regress)? Thanks! |
@paulp said: |
@ijuma said: I did it a bit differently than what you suggest because I didn't know about the locker.done command, but, if I understand correctly, this should not be an issue because creating the locker should be the same for both invocations (and it was). I did run ant all.clean before each invocation. What's the best way to proceed: file a separate issue or add details to this one? |
@paulp said: |
@Ichoran said: |
@ijuma said: |
@Ichoran said: |
@JamesIry said: |
@Ichoran said: |
@Ichoran said: |
collection.mutable.OpenHashMap seems to beat collection.mutable.HashMap in performance. I posted a microbenchmark for just insertion here: https://gist.github.com/1423303. I've also found in informal testing that OpenHashMap is faster.
OpenHashMap could replace the existing implementation, after doing more extensive profiling to see if/when the existing implementation is faster.
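For readers who want to try the comparison themselves, here is a minimal sketch of an insertion-only timing in the spirit of the microbenchmark mentioned above. It is not the code from the linked gist: the names `InsertBench`, `timeNanos`, and `fill`, as well as the key count and number of rounds, are hypothetical choices made for illustration. It assumes a Scala version that still ships `collection.mutable.OpenHashMap` (newer releases mark it as deprecated), and hand-rolled timing like this is only a rough indicator.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the benchmark from the linked gist.
object InsertBench {
  // Run `body` once and return the elapsed wall-clock time in nanoseconds.
  def timeNanos[A](body: => A): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  // Insert the keys 0 until n into the given map and return it.
  def fill(map: mutable.Map[Int, Int], n: Int): mutable.Map[Int, Int] = {
    var i = 0
    while (i < n) {
      map.update(i, i)
      i += 1
    }
    map
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000 // assumed workload size
    val rounds = 10 // assumed number of repetitions

    // Warm up both implementations so the JIT has compiled the hot paths.
    for (_ <- 1 to rounds) {
      fill(new mutable.HashMap[Int, Int], n)
      fill(new mutable.OpenHashMap[Int, Int], n)
    }

    // Report the best of several rounds to reduce GC and scheduling noise.
    val hashNs = (1 to rounds).map(_ => timeNanos(fill(new mutable.HashMap[Int, Int], n))).min
    val openNs = (1 to rounds).map(_ => timeNanos(fill(new mutable.OpenHashMap[Int, Int], n))).min

    println(s"HashMap:     ${hashNs / 1000000} ms")
    println(s"OpenHashMap: ${openNs / 1000000} ms")
  }
}
```

This only measures insertion of boxed `Int` keys into fresh maps. Lookups, removals, small maps, and other key types may behave differently, which is why more extensive profiling would be needed before drawing conclusions.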