Stream.from leaks memory #692

Closed
scabug opened this issue Mar 28, 2008 · 28 comments


scabug commented Mar 28, 2008

If Streams are meant to be lazy lists, I'd expect the following to run in constant space:

Welcome to Scala version 2.7.0-final (Java HotSpot(TM) 64-Bit Server VM, Java 1.5.0_13).
Type in expressions to have them evaluated.
Type :help for more information.

scala> for (i<- Stream.from(2)) { }
java.lang.OutOfMemoryError: Java heap space
        at scala.Stream$cons$.apply(Stream.scala:45)
        at scala.Stream$.from(Stream.scala:151)
        at scala.Stream$$anonfun$from$1.apply(Stream.scala:151)
        at scala.Stream$$anonfun$from$1.apply(Stream.scala:151)
        at scala.Stream$cons$$anon$2.tail(Stream.scala:52)
        at scala.Stream$class.loop$6(Stream.scala:354)
        at scala.Stream$class.foreach(Stream.scal...
scala>

This may or may not be related to #498, but here we are running out of heap space, not stack space.


scabug commented Mar 28, 2008

Imported From: https://issues.scala-lang.org/browse/SI-692?orig=1
Reporter: Florian Hars (fhars)


scabug commented Mar 28, 2008

@ingoem said:
Note that this is a for loop, not a comprehension. You are saying "do something for every element in that stream". This won't work, since Stream.from creates an infinite stream.


scabug commented Mar 28, 2008

@DRMacIver said:
I disagree. It's perfectly reasonable to say "do something for every element in that stream". e.g. you could represent a socket as a Stream of bytes and process that without worrying about whether it ever closes.

More importantly, this is evidence of a problem in the implementation - the head of the stream is not being allowed to be garbage collected. This basically means the socket example is unworkable because you'll retain all data ever read from it. This is sad, as this sort of lazy IO can often be really really handy. In general this will cause problems where the stream is potentially very large - the memory footprint will be much larger than it needs to be.

This might be relevant to the implementation: http://groups.google.com/group/cal_language/browse_thread/thread/728a3d4ff0f77b00


scabug commented Mar 28, 2008

@dragos said:
Replying to [comment:2 DRMacIver]:

> More importantly, this is evidence of a problem in the implementation - the head of the stream is not being allowed to be garbage collected. This basically means the socket example is unworkable because you'll retain all data ever read from it. This is sad, as this sort of lazy IO can often be really really handy. In general this will cause problems where the stream is potentially very large - the memory footprint will be much larger than it needs to be.
>
> This might be relevant to the implementation: http://groups.google.com/group/cal_language/browse_thread/thread/728a3d4ff0f77b00

We are aware of the problem, and this has been discussed several times on the mailing list. I haven't thought about it long enough to be sure it's unsolvable, but here's a quick summary of why this is not easy to do:

    for (i <- Stream.from(2)) f

is translated to

    Stream.from(2).foreach(f)

We'd like the GC to be able to reclaim memory while iterating through the stream, i.e. while inside foreach. At that point the call stack contains foreach, from(2) and main, and main pushed a reference to the head of the stream before calling from(2). That reference is a GC root. Additionally, each cell holds a reference to its value once it has been evaluated (since each cell should be evaluated at most once). Together, these keep everything in memory. I guess switching to a more functional design (in which these are static methods rather than methods on Stream cells) might help, but we'd lose for comprehensions.

We are open for suggestions. :)
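Dragos's description of memoized cells plus the extra GC root can be made concrete with a small model (a sketch with invented names like Cell and foreachLoop, not the actual scala.Stream source):

```scala
// A simplified model of the cells dragos describes (an illustration, not the
// actual scala.Stream source). Each cell memoizes its tail once forced, so a
// live reference to the head transitively pins every cell forced so far.
object CellDemo {
  sealed trait Cell[+A] {
    def isEmpty: Boolean
    def head: A
    def tail: Cell[A]
  }
  case object Nil extends Cell[Nothing] {
    def isEmpty = true
    def head = sys.error("Nil.head")
    def tail = sys.error("Nil.tail")
  }
  final class Cons[+A](val head: A, t: => Cell[A]) extends Cell[A] {
    def isEmpty = false
    lazy val tail: Cell[A] = t // forced at most once, then retained forever
  }

  def from(n: Int): Cell[Int] = new Cons(n, from(n + 1))

  // The "more functional design" dragos mentions: a static traversal whose
  // only reference, cur, advances past each cell. (Caveat: the parameter
  // slot c0 may itself still pin the head on some VMs - exactly the kind of
  // stack-frame root this ticket is about.)
  def foreachLoop[A](c0: Cell[A])(f: A => Unit): Unit = {
    var cur = c0
    while (!cur.isEmpty) { f(cur.head); cur = cur.tail }
  }

  // Finite helper so the sketch can be exercised without running forever.
  def takeList[A](c0: Cell[A], n: Int): List[A] = {
    var cur = c0; var k = n; val b = List.newBuilder[A]
    while (k > 0 && !cur.isEmpty) { b += cur.head; cur = cur.tail; k -= 1 }
    b.result()
  }

  def main(args: Array[String]): Unit =
    println(takeList(from(1), 5)) // List(1, 2, 3, 4, 5)
}
```

The lazy val tail is the memoization that makes each forced cell hold its value, which is why a single reference to the head is enough to keep the whole forced prefix alive.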


scabug commented Mar 28, 2008

@DRMacIver said:
Right. But this is what the google groups post I linked is about. As I understand it, running a method on an object doesn't prevent that object from being GCed - what prevents it from being GCed is the fact that the method has a "this" local variable. But there's nothing in the bytecode to stop you nulling out that variable once you no longer need it.

So "all" you would need to do is change the bytecode the Scala compiler emits so that it nulls out local variables (including this!) right after their last read (or, for more fine-grained control, after the last read before each write).
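The proposed bytecode change can be imitated by hand at the source level: keep the only reference in a local var and clear it after its last read. A sketch (sumPrefix is an invented name, and LazyList is used as the modern stdlib stand-in for the Stream discussed here):

```scala
object NullLocals {
  // Source-level analogue of "null out locals after their last read": keep
  // the stream in a var and overwrite/clear it as the traversal advances,
  // so the consumed prefix becomes unreachable from this frame.
  def sumPrefix(n: Int): Long = {
    var s: LazyList[Int] = LazyList.from(1) // the only name for the head
    var sum = 0L
    var i = 0
    while (i < n) {
      sum += s.head
      s = s.tail // the cell just consumed is no longer referenced here
      i += 1
    }
    s = null // "after the last read": drop the suffix reference as well
    sum
  }

  def main(args: Array[String]): Unit =
    println(sumPrefix(1000)) // sum of 1..1000 = 500500
}
```

The compiler change DRMacIver asks for would do this clearing automatically, including for the hidden 'this' slot that source code cannot reassign.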


scabug commented Mar 28, 2008

@dragos said:
Well, it's not that simple. Apparently, variables on the stack can still be GC roots. At least that's what JProfiler tells me. There's a very simple way to test this: simply replace Stream.foreach with

  override final def foreach(f: A => Unit) {
    if (isEmpty) {}
    else { f(head); tail.foreach(f) }
  }

Tail call optimization turns the last call into a jump and replaces 'this' with tail. This should be even better than nulling the 'this' :) Then the following program has a single GC root for cons objects (according to JProfiler):

object Main {
  def main(args: Array[String]) {
    var n = 0
    Stream.from(2).foreach { i =>
      n += 1
      if (n >= 5)
        while(true) {}
    }
  }
}

I'm not sure why (and maybe JProfiler is not the best tool...). I might be missing something obvious, but right now I don't see any other references to the first stream cell than the one inside the main method's stack, before calling foreach.


scabug commented Mar 29, 2008

@DRMacIver said:
I don't see why that's a problem. The Stream object on the stack won't be the original Stream.from(2), it will be the latest tail. Or am I missing your point?


scabug commented Mar 29, 2008

@DRMacIver said:
Additionally: are you sure TCO gets applied here? My tests show it not being. Here's some sample code to demonstrate:

package newstream;

abstract class Stream[+T]{
  def foreach(f : T => Unit) : Unit = if (empty) () else {f(head); tail.foreach(f)}
  def head : T;
  def tail : Stream[T];
  def empty : Boolean;
}

class Cons[+T](val head : T, _tail : => Stream[T]) extends Stream[T]{
  lazy val tail = _tail;
  def empty = false;
}

object Empty extends Stream[Nothing]{
  def head = error("Empty.head");
  def tail = error("Empty.tail");
  def empty = true;
}

object StreamTest extends Application{
  def from(i : Int) : Stream[Int] = new Cons[Int](i, from(i + 1));
  from(1).foreach(x => ());
}

This will stack overflow.


scabug commented Mar 29, 2008

@DRMacIver said:
Ah, right. The final modifier is necessary. I missed that.

However! If you change my above code to make foreach final, this code just hangs indefinitely. In particular it does not overflow the heap. So Stream can be fixed without a compiler change.
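The fix described here can be sketched with the tail-call check made explicit: `final` is what lets scalac's tail-call phase turn the recursive call on a different receiver into a jump, and `@tailrec` (added in later Scala versions) makes the compiler verify that it actually did. MiniStream and its companions are hypothetical names, not DRMacIver's exact classes:

```scala
import scala.annotation.tailrec

sealed abstract class MiniStream[+T] {
  def head: T
  def tail: MiniStream[T]
  def empty: Boolean

  // `final` allows scalac to compile the recursive call on a *different*
  // receiver (tail) as a jump; @tailrec fails compilation if it cannot.
  @tailrec
  final def foreach(f: T => Unit): Unit =
    if (!empty) { f(head); tail.foreach(f) }
}

final class MiniCons[+T](val head: T, t: => MiniStream[T]) extends MiniStream[T] {
  lazy val tail: MiniStream[T] = t
  def empty = false
}

case object MiniEmpty extends MiniStream[Nothing] {
  def head = sys.error("MiniEmpty.head")
  def tail = sys.error("MiniEmpty.tail")
  def empty = true
}

object MiniStreamTest {
  // Deep enough to overflow the stack without the tail call; the by-name
  // tail means construction itself stays shallow.
  def countdown(n: Int): MiniStream[Int] =
    if (n == 0) MiniEmpty else new MiniCons(n, countdown(n - 1))

  def main(args: Array[String]): Unit = {
    var count = 0
    countdown(1000000).foreach(_ => count += 1)
    println(count) // 1000000, with no StackOverflowError
  }
}
```

Removing either `final` or the tail position of the call brings the StackOverflowError back, which matches the behaviour observed in the comments above.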


scabug commented Mar 30, 2008

@dragos said:
Replying to [comment:6 DRMacIver]:

> I don't see why that's a problem. The Stream object on the stack won't be the original Stream.from(2), it will be the latest tail. Or am I missing your point?

It is the original one (chained through tlVal to all the stream values). I have no other point than what JProfiler shows in the heap walker, and the only reference to that one is "Java stack".


scabug commented Mar 30, 2008

@dragos said:
Replying to [comment:8 DRMacIver]:

> Ah, right. The final modifier is necessary. I missed that.
>
> However! If you change my above code to make foreach final, this code just hangs indefinitely. In particular it does not overflow the heap. So Stream can be fixed without a compiler change.

That was exactly what I was trying to achieve; glad it works, at least for some. I tried both my change (directly on scala.Stream) and your smaller example; both die with OutOfMemoryError after approx. 15 seconds (Java 1.5.0_13).

scala -cp classes/ newstream.StreamTest
java.lang.OutOfMemoryError: Java heap space
	at scala.runtime.BoxesRunTime.boxToInteger(Unknown Source)
	at newstream.StreamTest$.from(testStream.scala:28)
	at newstream.StreamTest$$anonfun$from$1.apply(testStream.scala:28)
	at newstream.StreamTest$$anonfun$from$1.apply(testStream.scala:28)
	at newstream.Cons.tail(testStream.scala:17)
	at newstream.Stream.foreach(testStream.scala:10)
	at newstream.StreamTest$.<init>(testStream.scala:29)
	at newstream.StreamTest$.<clinit>(testStream.scala)
	at newstream.StreamTest.main(testStream.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at scala.tools.nsc.ObjectRunner$$anonfun$run$1.apply(ObjectRunner.scala:75)
	at scala.tools.nsc.ObjectRunner$.withContextClassLoader(ObjectRunner.scala:49)
	at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:74)
	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:161)
	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
dragos-mbp:sandbox dragos$ java -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)


scabug commented Mar 30, 2008

@DRMacIver said:
Interesting.

After some testing, I tentatively concluded that what you're seeing happens on Java 5 with the client VM, but not with the server VM, nor with either VM on Java 6.


scabug commented Mar 30, 2008

@DRMacIver said:
Curiouser and curiouser. If I replace the test code with

from(1).foreach(x => System.gc())

it works on the Java 5 client VM as well.


scabug commented Mar 30, 2008

@dragos said:
I confirm. Indeed, that's weird.


scabug commented Mar 30, 2008

@DRMacIver said:
The problem seems to be that we're churning through heap space really quickly in this example, and the allocator is allowed to throw OutOfMemoryError even when a GC would free memory. I suspect that with a longer-running method in the foreach, the problem wouldn't crop up in the client VM either.

At any rate, this change definitely seems worth making. It might also be worth looking into other instances where this sort of tail call on a different 'this' could be optimised.

It would be nice if we could have the compiler change too, though. There are non-tail-call examples where this would be useful (for example, if we wanted to add this behaviour to foldLeft and then define foreach in terms of foldLeft).


scabug commented Apr 2, 2008

@dragos said:
After another frustrating day, I have a bit more information about this issue: your code runs fine, but the same change in scala.Stream does not (it crashes with OOM even on Java 6). JProfiler insisted that the first cell was referenced from the Java stack.

It looks like it was actually about local variables (probably JProfiler can't distinguish them)! Turn your Stream into a trait and watch it die... The implementation of traits turns each method into a static method that takes an explicit 'this' object, plus a forwarder which calls it. The static method iterates over the stream and keeps no references to its visited cells, but the forwarder does! I think we should fix the compiler to null out arguments before forwarding the call.

There was also a bug in tail call elimination causing a dead variable to hold on to the 'this' instance with which the method was invoked (fixed in r14481). But that seems to be OK as long as it happens inside a long-running method (the real foreach), and fails when it's inside a forwarder (as mentioned in your link, short-running methods are a problem). We could either switch Stream to be an abstract class (I checked, and it works), or wait for a fix in the compiler.

I also changed Stream.scala to use tail recursion directly, instead of local loop functions (see r14480).
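The static-method-plus-forwarder trait encoding dragos describes can be modelled directly in source (hypothetical TStream/TStreamImpl names; a sketch of the scheme, not actual scalac output):

```scala
// Hand-written model of the 2.7-era trait encoding dragos describes.
object ForwarderDemo {
  trait TStream[+A] {
    def isEmpty: Boolean
    def head: A
    def tail: TStream[A]
    // The generated forwarder: its stack frame keeps `this` - the head
    // cell - alive for the entire traversal. This is the reference dragos
    // suggests the compiler should null before forwarding.
    def foreach(f: A => Unit): Unit = TStreamImpl.foreach(this, f)
  }

  object TStreamImpl {
    // The "static" implementation: `self` advances past each cell, so the
    // cells behind it are unreachable from this frame.
    def foreach[A](self0: TStream[A], f: A => Unit): Unit = {
      var self = self0
      while (!self.isEmpty) { f(self.head); self = self.tail }
    }
  }

  case object TNil extends TStream[Nothing] {
    def isEmpty = true
    def head = sys.error("TNil.head")
    def tail = sys.error("TNil.tail")
  }
  final class TCons[+A](val head: A, t: => TStream[A]) extends TStream[A] {
    def isEmpty = false
    lazy val tail: TStream[A] = t
  }

  def range(a: Int, b: Int): TStream[Int] =
    if (a >= b) TNil else new TCons(a, range(a + 1, b))

  def main(args: Array[String]): Unit = {
    var sum = 0
    range(1, 11).foreach(x => sum += x)
    println(sum) // 1 + 2 + ... + 10 = 55
  }
}
```

Making Stream an abstract class (the fix eventually committed in r15218) removes the forwarder entirely: the loop-like body runs directly on the cell, so no extra frame pins the head.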


scabug commented May 9, 2008

amit kumar (cdac.amit) said:
I am facing the same problem as stated above, and it is really impossible to do further development, as it is not letting me create any more new Scala files.

If you have resolved the problem, please publish the fix so that other developers using Scala can use it.

Thanks
amit


scabug commented May 9, 2008

@dragos said:
Replying to [comment:18 cdac.amit]:

> I am facing the same problem as stated above, and it is really impossible to do further development, as it is not letting me create any more new Scala files.

I don't understand. In what way is this bug stopping you from creating new Scala files?

> If you have resolved the problem, please publish the fix so that other developers using Scala can use it.

Of course, that goes without saying. The status is as given above, which means the current version is still leaking. Fixing it requires a compiler change that needs some more thought. I'll put this on the agenda of the next Scala meeting.


scabug commented May 13, 2008

amit kumar (cdac.amit) said:
Hi,

Currently I have 68 Scala files, and all of them compile properly, but as soon as I try to create a new Scala file, the compiler throws the following error while compiling:

[INFO] Compiling 69 source files to ../target/classes
[WARNING] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
[WARNING] at scala.collection.mutable.FlatHashTable$class.growTable(FlatHashTable.scala:124)
[WARNING] at scala.collection.mutable.FlatHashTable$class.addEntry(FlatHashTable.scala:75)
[WARNING] at scala.collection.mutable.HashSet.addEntry(HashSet.scala:31)
[WARNING] at scala.collection.mutable.HashSet.$plus$eq(HashSet.scala:35)
[WARNING] at scala.collection.mutable.Set$$anonfun$$plus$plus$eq$1.apply(Set.scala:75)
[WARNING] at scala.collection.mutable.Set$$anonfun$$plus$plus$eq$1.apply(Set.scala:75)
[WARNING] at scala.Iterator$class.foreach(Iterator.scala:376)
[WARNING] at scala.collection.mutable.ListBuffer$$anon$1.foreach(ListBuffer.scala:255)
[WARNING] at scala.collection.mutable.Set$class.$plus$plus$eq(Set.scala:75)
[WARNING] at scala.collection.mutable.HashSet.$plus$plus$eq(HashSet.scala:31)
[WARNING] at scala.collection.mutable.Set$class.$plus$plus$eq(Set.scala:69)
[WARNING] at scala.collection.mutable.HashSet.$plus$plus$eq(HashSet.scala:31)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase$Context.newBlock(GenICode.scala:1881)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.scala$tools$nsc$backend$icode$GenICode$ICodePhase$$genLoad(GenICode.scala:425)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.scala$tools$nsc$backend$icode$GenICode$ICodePhase$$genLoad(GenICode.scala:941)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.scala$tools$nsc$backend$icode$GenICode$ICodePhase$$genLoad(GenICode.scala:441)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.scala$tools$nsc$backend$icode$GenICode$ICodePhase$$genStat(GenICode.scala:181)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase$$anonfun$genStat$1.apply(GenICode.scala:144)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase$$anonfun$genStat$1.apply(GenICode.scala:143)
[WARNING] at scala.List.foreach(List.scala:753)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.genStat(GenICode.scala:143)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.scala$tools$nsc$backend$icode$GenICode$ICodePhase$$genLoad(GenICode.scala:940)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.gen(GenICode.scala:112)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase$$anonfun$gen$1.apply(GenICode.scala:70)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase$$anonfun$gen$1.apply(GenICode.scala:70)
[WARNING] at scala.List.foreach(List.scala:753)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.gen(GenICode.scala:70)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.gen(GenICode.scala:134)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase.gen(GenICode.scala:88)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase$$anonfun$gen$1.apply(GenICode.scala:70)
[WARNING] at scala.tools.nsc.backend.icode.GenICode$ICodePhase$$anonfun$gen$1.apply(GenICode.scala:70)
[WARNING] at scala.List.foreach(List.scala:753)

Thus I cannot create any more new files, because of the "java.lang.OutOfMemoryError: Java heap space" exception.

If there is any way I can continue my development, please advise.


scabug commented May 21, 2008

@dragos said:
That doesn't seem to be related; it's a plugin issue.


scabug commented May 27, 2008

@dragos said:
Fixed in r15218 by making scala.Stream an abstract class.


scabug commented Jan 14, 2009

@odersky said:
Milestone next_bugfix deleted


scabug commented Mar 28, 2009

Florian Hars (florian) said:
There seems to be an edge case left. If you give the GC enough to work on,
everything looks fine. This seems to work in constant memory:

scala> var x = 0                                                                              
x: Int = 0

scala> def t(s: Stream[Int]) = s.map(_ => new Array[Int](10000000)).foreach(a => x = a.hashCode ^ x)         
t: (Stream[Int])Unit

scala> t(Stream.from(1))

But if you do not allocate enough, you run out of memory (this may be related to what David said above about the churn rate):

scala> var x = 0
x: Int = 0

scala> def t(s: Stream[Int]) = s.map(_ => new Array[Int](10)).foreach(a => x = a.hashCode ^ x)               
t: (Stream[Int])Unit

scala> t(Stream.from(1))                                                                      
java.lang.OutOfMemoryError: Java heap space
	at $anonfun$t$1.apply(<console>:5)
	at $anonfun$t$1.apply(<console>:5)
	at scala.Stream.map(Stream.scala:361)
	at scala.Stream$$anonfun$map$1.apply(Stream.scala:361)
	at scala.Stream$$anonfun$map$1.apply(Stream.scala:361)
	at scala.Stream$cons$$anon$2.tail(Stream.scala:69)
	at scala.Stream.foreach(Stream.scala:370)
	at .t(<console>:5)
	at ....

This is on 32bit i386 Linux with
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)


scabug commented Mar 29, 2009

@SethTisue said:
Florian, I think both of your examples run out of memory. I am closing the ticket again.

Both examples run out of memory because in both cases you force the stream while retaining (in the parameter s) a reference to the head of the stream — not the mapped stream, but the original Stream[Int].

Your first example "seems" to work in constant memory because it takes a long time to allocate a large array, so that slows down the entire test. That means you have to wait a long, long time before Stream.from(1) gets the chance to eat the whole heap.

That's my hypothesis. I tested it by gradually increasing the array size. I never got all the way up to 1000000, but I did notice that each time I increased the array size, it took longer before I got the OutOfMemoryError. (With n = 100, it took 1.5 minutes; n = 200 took 2.5 minutes.)
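Seth's diagnosis points at the usual workaround: don't bind the stream to a name that outlives the traversal. A sketch of the two shapes (HeadRetention, leaky and safer are invented names; LazyList stands in for the old Stream, and the small sums are only there to make the sketch runnable - the point is which stack frame names the head):

```scala
object HeadRetention {
  // Leak-prone shape: the parameter `s` pins the head of the original
  // stream for the whole traversal, so every forced cell stays reachable.
  def leaky(s: LazyList[Int], n: Int): Int =
    s.map(_ + 1).take(n).sum

  // Shape the diagnosis suggests: take a thunk, so no parameter of this
  // frame names the head once the traversal is under way. (Whether the
  // intermediate value is collectable still depends on the VM's liveness
  // analysis, as discussed earlier in this thread.)
  def safer(mk: () => LazyList[Int], n: Int): Int =
    mk().map(_ + 1).take(n).sum

  def main(args: Array[String]): Unit =
    println(safer(() => LazyList.from(1), 4)) // 2 + 3 + 4 + 5 = 14
}
```

Both functions compute the same result; they differ only in whether the caller's stream is reachable from a named parameter while it is being forced.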


scabug commented Aug 17, 2009

@SethTisue said:
broken in 2.8, as of r18467


scabug commented Aug 19, 2009

@dragos said:
Fixed by Martin in r18506.


scabug commented Aug 21, 2009

@SethTisue said:
as of r18524, this is still broken (or maybe broken again)


scabug commented Aug 25, 2009

@dragos said:
Fixed again in r18570.

@scabug scabug closed this as completed May 18, 2011
@scabug scabug added the critical label Apr 6, 2017