New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HashSet should implement union #6253
Comments
Imported From: https://issues.scala-lang.org/browse/SI-6253?orig=1 |
Simon (symon) said: def grade: Double = {
val setA: HashSet = // get from somewhere else
val setB: HashSet = // get from somewhere else
if ((setA size) == 0 || (setB size) == 0) return 0
else return (setA & setB size) / (setA | set B size)
} This piece of code is repeated a lot of times in a loop, and the entire loop is executed in around 4.5 sec. But when I replace the union with somethin else... def grade: Double = {
val setA: HashSet = // get from somewhere else
val setB: HashSet = // get from somewhere else
if ((setA size) == 0 || (setB size) == 0) return 0
else return (setA & setB size) / ((setA size) + (setB size)) // just a gross approximation for testing purpose
} ... then the execution time goes down to around 0.35 sec. So, I think this is a faulty behavior and should be corrected. As a quick patch, I used size(A) + size(B) - size(A intersect B), but this only works because I just want the size. |
@soc said: |
Simon (symon) said: |
@soc said: |
@rklaehn said: Unfortunately Josh Suereth told me that right now is not a good time to get this included to the scala library, since they are very busy with getting 2.10 out of the door. I managed to get a few small improvements in, but the bulk operations will have to wait. @simon: you are requiring union (|) and intersection (&). The bulk union operation would take care of making union as fast as possible. So maybe write a ticket for intersection? Regarding the eq thing: as a general rule, whenever you call a method on an immutable collection that returns an object that is equal to the original object, you should try to return the original object whenever this is possible without changing the asymptotic complexity. There are some operations where this is not possible, but for the set operations on HashSets it should always be possible! I even have a dedicated test for that in my HashSet improvements. val x1 = x.someOp(...) // if(x==x1) then x1 should be eq to x! |
@vigdorchik said: |
@rklaehn said: |
@rklaehn said: The current version is significantly faster in most cases, especially for elements with expensive equals and hashcode and obviously has much better structural sharing. But it is still slower in some degenerate cases, so I am not done yet. |
@Ichoran said: |
@rklaehn said: For example I use null as a guard value for the empty set and then convert null to HashSet.empty only in the publicly accessible methods. I also use a mutable buffer to build the new arrays of the set to be created. This is safe since the buffer is only accessed from a single thread, but it is not exactly straightforward. So all in all it is a pretty major code change. If you think there is a chance to get this into scala 2.11 I will try to find some time to integrate it. But I am very busy at work right now. |
@Ichoran said: |
@rklaehn said: |
@Ichoran said: |
@rklaehn said: |
@Ichoran said: |
@Ichoran said: |
HashSet does not implement union. That means that creating the union of two sets is very inefficient. Adding a large set to a small set creates the entire set anew.
c is a completely new set even though it contains the exact same elements as b and could therefore be eq b.
Clearly this is not good. A set should have the most efficient possible implementation for intersection and union. The implementation of filter for set (#6196) allows for a fast intersection. The missing operation union is already implemented in HashSet as merge. So all that needs to be done is to adapt the merge algorithm of HashMap to HashSet.
The text was updated successfully, but these errors were encountered: