Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayIndexOutOfBoundsException when adding elements to a ParHashMap while iterating #9448

Closed
scabug opened this issue Aug 25, 2015 · 4 comments
Assignees

Comments

@scabug
Copy link

scabug commented Aug 25, 2015

ParHashMap non-deterministlically throws an exception, but HashMap doesn't.

scala> import scala.collection.parallel.mutable.ParHashSet
scala> val xs = ParHashSet[Int](1,2,3,4,5,6,7,8)
scala> while (true) xs.foreach {i => for (x <- 1 to 100) xs += i+x}
java.lang.ArrayIndexOutOfBoundsException: 3 at
java.lang.ArrayIndexOutOfBoundsException: 3 
  at scala.collection.mutable.FlatHashTable$class.nnSizeMapAdd(FlatHashTable.scala:256) 
  at scala.collection.parallel.mutable.ParHashSet.nnSizeMapAdd(ParHashSet.scala:37) 
  at scala.collection.mutable.FlatHashTable$class.addEntry(FlatHashTable.scala:158) 
  at scala.collection.parallel.mutable.ParHashSet.addEntry(ParHashSet.scala:37) 
  at scala.collection.mutable.FlatHashTable$class.addElem(FlatHashTable.scala:139) 
  at scala.collection.parallel.mutable.ParHashSet.addElem(ParHashSet.scala:37) 
  at scala.collection.parallel.mutable.ParHashSet.$plus$eq(ParHashSet.scala:64) 
  at $anonfun$1$$anonfun$apply$mcVI$sp$1.apply(<console>:13) 
  at $anonfun$1$$anonfun$apply$mcVI$sp$1.apply(<console>:13) 
  at scala.collection.immutable.Range.foreach(Range.scala:166) 
  at $anonfun$1.apply$mcVI$sp(<console>:13) 
  at $anonfun$1.apply(<console>:13) 
  at $anonfun$1.apply(<console>:13) 
  at scala.collection.Iterator$class.foreach(Iterator.scala:742) 
  at scala.collection.parallel.mutable.ParFlatHashTable$ParFlatHashTableIterator.foreach(ParFlatHashTable.scala:27) 
  at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972) 
  at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49) 
  at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48) 
  at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48) 
  at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51) 
  at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969) 
  at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152) 
  at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443) 
  at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160) 
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The "while (true)" is to try repeatedly until it fails, as the failure is non-deterministic.

@scabug
Copy link
Author

scabug commented Aug 25, 2015

Imported From: https://issues.scala-lang.org/browse/SI-9448?orig=1
Reporter: Samuel Gélineau (gelisam)
Affected Versions: 2.11.7

@scabug
Copy link
Author

scabug commented Aug 26, 2015

@retronym said:
Behaviour of Scala collections when mutating during iteration is undefined. See this comment for an excellent overview of this issue, and how it contrasts with Java collections (which fail fast):

https://issues.scala-lang.org/browse/SI-8774?focusedCommentId=72225&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-72225

/cc @Ichoran to double check my reasoning here.

@scabug scabug closed this as completed Aug 26, 2015
@scabug
Copy link
Author

scabug commented Aug 26, 2015

Samuel Gélineau (gelisam) said (edited on Aug 26, 2015 2:02:16 AM UTC):

Behaviour of Scala collections when mutating during iteration is undefined

Ah, my mistake then. Is this documented anywhere? I assumed that mutating collections during iteration was allowed in Scala because before filing the bug,

  1. I looked at the documentation for ParHashMap.+=, the documentation for ParHashMap.foreach, the top-level documentation for the ParHashMap class, and I did not see any counter-indication.
  2. I googled whether Scala allowed it and I could not find any counter-indication.
  3. I tried a couple of simple snippets and was pleasantly surprised to see that the behaviour seemed deterministic: elements added during the iteration were never iterated over, but they did appear in the collection after the iteration was over. I assumed Scala was creating a temporary immutable version for the purpose of iterating safely or something like that.

Note that even if I had found the sentence "dealing with mutable collection means you need to understand which code changes which collection when" in the Collections overview, I would have interpreted this as meaning that I need to understand which part of my code changes a collection, that is, that I'll experience the usual pain of imperative programming based on shared state.

@scabug
Copy link
Author

scabug commented Aug 26, 2015

@Ichoran said:
[~gelisam] @retronym I agree with Jason's reasoning. Java's use cases anticipate a lot of use and mutation of collections. While they often do a pretty good job of failing fast, even Java collections only promise to do so on a best-effort basis in most cases.

I think the general principle for mutable collections in Scala is, as Jason says, that mutation cannot safely occur during traversal (whether you have an iterator or no).

It would be great for the docs to be much clearer about what is and is not okay. Intuitively it probably seems kind of sketchy to use transform inside transform, so people probably don't do it, but simple traversal and addition? It often seems okay in the simplest cases, but breaks when things get more involved. For example, consider

val m = collection.mutable.HashSet((1 to 10): _*)
m.foreach{ i => m += i+100; m += i+200; m += i+300; m -= i+1 }

If you try simpler variations of this, it looks like it magically all works: e.g. if you just have the removal, every second of the original indices is removed. But with this you get gobbledygook, as the foreach is just running down the original array, and slots appear and disappear willy-nilly, and then the array is resized to be larger partway through which isn't reflected in the foreach!

So, bottom line is: good documentation for what happens would be awesome. But don't put mutation inside of map & friends, whether the collections are parallel or sequential.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants