Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream flatMap leaks memory if mapper returns chain of empty results #6409

Closed
scabug opened this issue Sep 20, 2012 · 9 comments
Closed

Stream flatMap leaks memory if mapper returns chain of empty results #6409

scabug opened this issue Sep 20, 2012 · 9 comments
Assignees
Milestone

Comments

@scabug
Copy link

scabug commented Sep 20, 2012

The reference to the original stream head is not released until the mapper function produces a non-empty stream. This can be a problem if the elements consume considerable memory, and the mapper returns a long chain of empy streams.

scala> def run[A](s: Stream[A]) = s.flatMap(_ => Stream.empty[Int])
run: [A](s: Stream[A])scala.collection.immutable.Stream[Int]

scala> def bigElemts = Stream.range(1, 1000000).map(x => List.fill(1000)("big"))
bigElemts: scala.collection.immutable.Stream[List[java.lang.String]]

scala> run(bigElemts)
java.lang.OutOfMemoryError: Java heap space

With Stream's head not being lazy, I'm not sure this can be solved. But at least would be nice to include a hint in the flatMap apidocs.

@scabug
Copy link
Author

scabug commented Sep 20, 2012

Imported From: https://issues.scala-lang.org/browse/SI-6409?orig=1
Reporter: Robin Palotai (ron)

@scabug
Copy link
Author

scabug commented Sep 24, 2012

Robin Palotai (ron) said:
Note also if you are concerned that the above run method holds to the stream head:

scala> @scala.annotation.tailrec def len[A](s: Stream[A], acc: Int = 0): Int = if (s.isEmpty) acc else len(s.tail, acc + 1)
len: [A](s: Stream[A], acc: Int)Int

scala> len(bigElemts)
res3: Int = 999999

scala> len(bigElemts.flatMap(_ => Stream.empty[Int]))
   ... OOM

So it is flatMap holding to the head.

@scabug
Copy link
Author

scabug commented Sep 25, 2012

@kmizu said (edited on Sep 25, 2012 2:21:45 AM UTC):
Your example code doesn't indicate that the above run method doesn't holds to the stream head because len method is a tail recursive and the tail recursive method jumps to beginning of the method and rewrite the corresponding parameter s when the method itself is called.

However, my previous comment was also wrong. The reason of this problem is not that the run method holds to the stream head.
The reason of this problems is that flatMap continues to evaluate each thunk of Stream until non-empty sequence is produced.

In len(bigElemts), because s is rewritten to its tail when calling len(s.tail, acc + 1), memory leaks doesn't happen.
In len(bigElemts.flatMap(_ => Stream.empty[Int])), all thunks (LIst.fill(1000)("big")) of bigElemts, the number of bigElemts is 1000000, is evaluated to big List. Then, much memory is consumed. The head of bigElemts cannot be released in the flatMap because bigElemts itself is this reference, which is not rewritable.

@scabug
Copy link
Author

scabug commented Sep 25, 2012

Robin Palotai (ron) said:
Kota: Exactly that is my problem. While the flatMap API doc mentions that just like map, flatMap is lazy. One would naively assume (without a deeper understanding of Scala's Stream and its flatMap) that flatMap behaves well memory-wise too.

However when the flatMap's mapper yields a long chain of empty Streams, this behavior does not hold, which is contrary to the (naive and wrong) expectations. An API doc entry would save many hours of debugging (especially in a setup where it is less trivial which Stream transform step is the cause of the leak).

By the way, yielding a long run of empty streams is not artificial (I know you were not stating this, just expressing my opinion), it happened in a live dataset.

@scabug
Copy link
Author

scabug commented Sep 25, 2012

@jsuereth said:
scala/scala#1397

Not a fix, but documentation for now. Workaround may be more extensive. Will look into it for 2.10.1.

@scabug
Copy link
Author

scabug commented Jan 28, 2014

@adriaanm said:
I seem to remember a related ticket/PR, but couldn't find it.

@scabug
Copy link
Author

scabug commented May 30, 2014

@Ichoran said:
I don't believe that this problem is actually caused by not having a lazy head. Instead, I think this is exactly what you ask for when you use a Stream: intermediate portions of the stream are kept in memory. Stream actually conflates two different ideas: "I want to be able to iterate from this point over and over again", and "I want to store the results of any computations I do in the future". Stream is the latter (and given the latter it can do the former). Asking to fix this bug is asking to make Stream the former only.

Stream can't know when you call flatMap that you're doing so in a context that allows it to release the head and thus that it shouldn't memoize the results. You've got the same issue with dropWhile or anything else.

So, closing this as wontfix. If someone wants to open a feature request for an immutable collection that is lazy, re-traversable, and non-memoizing (a halfway point between Iterator and Stream), feel free. Otherwise, try to use Iterator in cases like these.

@scabug
Copy link
Author

scabug commented May 30, 2014

@Ichoran said:
This cannot be fixed due to the type of collection that Stream is. It is not really advisable to document every single method that memoizes to remind the user that Stream memoizes. It's just how Stream works.

@scabug scabug closed this as completed May 30, 2014
@scabug
Copy link
Author

scabug commented May 30, 2014

@Ichoran said:
Improved docs, hopefully: scala/scala#3805

@scabug scabug added this to the 2.11.2 milestone Apr 7, 2017
@scala scala deleted a comment from scabug May 4, 2018
@scala scala deleted a comment from scabug May 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants