Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reusing ArrayBuilder after clear() overwrites its previous result #9564

Closed
scabug opened this issue Nov 20, 2015 · 18 comments
Closed

Reusing ArrayBuilder after clear() overwrites its previous result #9564

scabug opened this issue Nov 20, 2015 · 18 comments

Comments

@scabug
Copy link

scabug commented Nov 20, 2015

In some cases ArrayBuilder.elems may still point to the previously created array after a call to ArrayBuilder.clear(). As a result, reusing such ArrayBuilder overwrites the contents of previously created array:

  val builder = new scala.collection.mutable.ArrayBuilder.ofInt
  builder ++= 1 to 16
  val array = builder.result                      //> array  : Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  builder.clear()
  builder += 1000
  array                                           //> res2: Array[Int] = Array(1000, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)

IMHO a simple fix would be to reset the field ArrayBuilder.elems on ArrayBuilder.clear() to a new array.

@scabug
Copy link
Author

scabug commented Nov 20, 2015

Imported From: https://issues.scala-lang.org/browse/SI-9564?orig=1
Reporter: Linas Medžiūnas (linasm)
Affected Versions: 2.11.7
See #8648

@scabug
Copy link
Author

scabug commented Nov 20, 2015

@SethTisue said:
good catch!!

@scabug
Copy link
Author

scabug commented Nov 20, 2015

@SethTisue said:
note that the doc on Builder.result() says "The builder's contents are undefined after this operation", so arguably reusing the builder is simply programmer error? once result() is called, all bets are off? if so, the warning in the doc should be worded stronger, IMO

@scabug
Copy link
Author

scabug commented Nov 20, 2015

@SethTisue said:
Rex, what do you think: bug, or just inadequate doc?

@scabug
Copy link
Author

scabug commented Nov 21, 2015

@som-snytt said:
In #8648 comments, there's an expectation to clear and reuse, with similar results (shared state).

@scabug
Copy link
Author

scabug commented Nov 21, 2015

Linas Medžiūnas (linasm) said:
Yes, the doc states that "The builder's contents are undefined" after result(). But in this case it is followed by a call to clear(), which "Clears the contents of this builder." So the contents are defined (clean) after result(); clear() sequence. I guess this permits the consumer of the library to expect a reusability of ArrayBuilder.
In case you think this is not convincing, try to imagine the amount of debugging this has caused in a fairly complex graph algorithm :)

@scabug
Copy link
Author

scabug commented Nov 21, 2015

@Ichoran said:
I think this is not just inadequate documentation. It's a bug. I am a little hesitant to fix it in 2.11 since it is possible that someone depended on this behavior. It's totally safe for 2.12, and arguably buggy enough that 2.11 should have it. I'm going to let someone else (who has to deal with the consequences of it being more problematic than anticipated) make the call of where the fix should go. @SethTisue, do you want to call this one?

It's an easy fix. The harder part is to decide where to put it.

@scabug
Copy link
Author

scabug commented Jan 5, 2016

Linas Medžiūnas (linasm) said:
Let's make it happen: scala/scala#4895

@scabug
Copy link
Author

scabug commented Jan 5, 2016

@SethTisue said:
Rex, can you be more explicit about your argument for considering it a bug? Sébastien's comment on the PR seems pretty persuasive to me — there's no point in fixing “bugs” like this one at a time if there is a pervasive assumption throughout mutable collections that a builder is never reusable after calling result

@scabug
Copy link
Author

scabug commented Jan 6, 2016

@Ichoran said:
@SethTisue I suppose we could consider it a documentation bug, but it's a really pervasive documentation bug. The problem is that some mutable builders are sort of reusable after calling result, including this very builder. As long as you haven't filled the internal array everything works swimmingly:

scala> val a = Array.newBuilder[Int]
a: scala.collection.mutable.ArrayBuilder[Int] = ArrayBuilder.ofInt

scala> val b = { a += 1; a.result }
b: Array[Int] = Array(1)

scala> val c = { a.clear(); a += 2; a.result }
c: Array[Int] = Array(2)

scala> val d = b
d: Array[Int] = Array(1)

It's just the corner case where the builder returns an array of the right size (and thus doesn't copy) AND has been cleared and reused that we run into any trouble at all.

Unless making this work properly slows things down, I don't see much justification for the occasional disastrous behavior, even if we do improve the documentation to make it really clear that you'd better not reuse builders for immutable collections.

Also, that other mutable builders also give you problems if you try to do this (because they simply return themselves) is only an argument, I think, that we should document those builders very carefully. In some cases, the API isn't even honored:

scala> val a = collection.mutable.ListBuffer.newBuilder[Int]
a: scala.collection.mutable.Builder[Int,scala.collection.mutable.ListBuffer[Int]] = scala.collection.mutable.GrowingBuilder@7de752dd

scala> { a += 1; a.clear; a += 2; a.result }
res47: scala.collection.mutable.ListBuffer[Int] = ListBuffer(1, 2)

My recommendation is to make the library as sane as possible and make the documentation as clear as possible regarding possible trouble spots. Having hidden corner cases that bite you when you don't strictly adhere to a conservative reading of the docs despite apparent evidence that the desired behavior is maintained is not a nice thing to deliver to users.

@scabug
Copy link
Author

scabug commented Jan 6, 2016

@sjrd said (edited on Jan 6, 2016 8:23:37 PM UTC):
Your last example is clearly a bug of GrowingBuilder.clear(). It should definitely call elems.clear() instead of doing elems = empty. This is ridiculous, and has nothing to do with the present discussion.

IMO, the main problem is that the documentation of result(), combined with the documentation of clear(), leads to the false belief that clear() can restore a valid state after a result(). If we make it clear in the documentation of result() that the builder instance is invalidated after calling result(), and must not be further used, then the documentation of clear() cannot create the false belief anymore. The false belief exists because result() says that "the contents of the builder are not defined", instead of "the builder instance is invalid altogether". However, it is evident from the various implementations of Builder that the original intent of result() was that the builder would be invalidated altogether.

Fixing it for mutable collections will obviously turn the result() operation from O(1) to O(n). And since Array is a mutable collection, it makes no sense to fix Array but not the others. For Array, on the JVM it might not be a big deal to turn a private val into a private var, but for Scala.js this will definitely yield worse JS code, and AFAICT worse performance when the object is partially evaluated-away (which ArrayBuilder is most of the time). It will be especially bad to fix it for js.Array (whose builder is js.ArrayOps ), although I guess we could recover most if not all of the perf by using a separate class for the builder, so that we can keep js.ArrayOps's field a val.

@scabug
Copy link
Author

scabug commented Jan 6, 2016

@sjrd said:
I realize that on the JVM, ArrayBuilder has var fields. Some of my earlier comment might make sense only if you consider the completely different implementation of ArrayBuilder that is used in Scala.js: https://github.com/scala-js/scala-js/blob/master/scalalib/overrides/scala/collection/mutable/ArrayBuilder.scala#L32-L73

@scabug
Copy link
Author

scabug commented Jan 6, 2016

@Ichoran said (edited on Jan 7, 2016 1:39:04 AM UTC):
@sjrd I don't think it's clear at all that Builder is intended to be discarded after result, always and every time. First, it's absolutely unnecessary for most (maybe all?--I didn't check) immutable collections. Second, clear hardly makes sense as a method on a single-shot Builder unless you're in the habit of trying to build your collection, failing, and needing to start over again. It looks like a standard case of a single superclass trying be an intersection type instead of a union type; we really need both a Builder and ReusableBuilder, the latter of which has the clear method and a guarantee that result followed by clear results in a fresh copy.

Secondly, I'm not arguing that Builder must always be ready for a second run on clear, just that when it can be made to work 100% of the time instead of 94% of the time, it should be. And the docs should be improved at the same time.

Thirdly, the behavior of mostly-working-except-when-it-doesn't is, docs aside, a poor choice.

Now it may be the case that there's no sufficiently nonperturbative way to get out of this situation aside from changing the docs. But I don't think we should be terribly pleased by this state of affairs, and should look for better ways to handle the situation

@scabug
Copy link
Author

scabug commented Jan 8, 2016

Linas Medžiūnas (linasm) said (edited on Jan 8, 2016 8:40:03 AM UTC):
As I understand, the presence of mutable collections that are "Builders" for themselves is what is complicating the decision on this issue. I feel (and not being Scala insider, I could be completely wrong) that these collections were extended with Builder functionality in order to fit into Scala collection framework. It is difficult to imagine someone using ArrayBuffer as a Builder to build ArrayBuffer directly, in library client code. In fact, I didn't even suspect that ArrayBuffer has result() method, until Sebastien pointed this out. And documentation for it ("Produces a collection (...)") is somewhat misleading for a method that just returns "this". I guess, ArrayBuffer.result() would ideally not be exposed publicly, as its usage scope seems to be within collection library itself (maybe there are some implications for projects like scala-js as well). But that is probably not so easy to achieve.

ArrayBuilder, on the other hand, has more associations with "proper" dedicated builders like StringBuilder and VectorBuilder, that are reusable after clear(), and that is probably the reason why I have run into this issue. And Array needs a dedicated builder because, unlike other mutable collections, Array has no means for adding a new element.

So, if we could draw a line between ArrayBuilder+StringBuilder+VectorBuilder, and the mutable collections implementing Builder trait, maybe that would make the decision easier?

@scabug
Copy link
Author

scabug commented Jan 8, 2016

@sjrd said (edited on Jan 8, 2016 9:28:25 AM UTC):
ArrayBuffer, and other Buffers, are their own Builder "because they can". Builders in general are necessary for all collection types so that they fit in the CanBuildFrom design, and, by extension, all the normal collection methods such as filter or map. Since the Builder API is inherently mutable, immutable collections need separate builders, and have them. Mutable collections have abused their own class as their Builder s, which leads to this weird situation where Builder s can sometimes return this.

In a brand new design, I would fight for a different design where Builders and Collections would not be fused. In such a design, Builder would not have a clear() method at all. But we have to live with what we have now.

Secondly, I'm not arguing that Builder must always be ready for a second run on clear, just that when it can be made to work 100% of the time instead of 94% of the time, it should be. And the docs should be improved at the same time.

@rex Kerr, I guess that makes sense. Under that goal, you could argue that we should fix ArrayBuilder for working 100% of the time, but leave other mutable collections alone (those who return this), and have them keep their 0%. Is that what you're getting at?

@scabug
Copy link
Author

scabug commented Jan 8, 2016

@Ichoran said:
@sjrd - Yes, that's what I'm getting at, and change the docs to really clearly specify what behavior can be expected.

The only wildcard is how big of a performance benefit it would be for Scala.js. If it's absolutely trashes JS performance for all array creations to have the 100% guarantee, well, maybe that's too great a price to pay. I think ensured 0% is better than 94% with no warning on the 6% failures (well, not exactly 6%, really p(n = 16) + p(n = 32) + p(n = 64) + ...).

@scabug
Copy link
Author

scabug commented Jan 8, 2016

@sjrd said:
If it's just ArrayBuilder that needs fixing, the performance hit would be minimal, if existing at all. I did a small experiment with the fix. It produces a few more temporary variables in the JS code, but I think any decent JS VM should be able to collapse them back at this point.

@scabug
Copy link
Author

scabug commented Jan 15, 2016

@Ichoran said:
scala/scala#4904

@sjrd This is what I had in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants