New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream.from leaks memory #692
Comments
Imported From: https://issues.scala-lang.org/browse/SI-692?orig=1 |
@ingoem said: |
@DRMacIver said: More importantly, this is evidence of a problem in the implementation - the head of the stream is not being allowed to be garbage collected. This basically means the socket example is unworkable because you'll retain all data ever read from it. This is sad, as this sort of lazy IO can often be really really handy. In general this will cause problems where the stream is potentially very large - the memory footprint will be much larger than it needs to be. This might be relevant to the implementation: http://groups.google.com/group/cal_language/browse_thread/thread/728a3d4ff0f77b00 |
@dragos said:
We are aware of the problem, and this has been discussed several times on the mailing list. I haven't thought about it long enough to be sure it's unsolvable, but here's a quick summary of why this is not easy to do: {code} Stream.from(2).foreach(f) We'd like the GC to be able to collect memory while iterating through the stream, that means while inside We are open for suggestions. :) |
@DRMacIver said: So "all" you would need to do is change the bytecode the Scala compiler emits so that it nulls out local variables (including this!) right after their last read (or for more fine grained support you could even do it after the last read before each write) |
@dragos said: override final def foreach(f: A => Unit) {
if (isEmpty) {}
else { f(head); tail.foreach(f) }
} Tail call optimization turns the last call into a jump, and replaces the 'this' by object Main {
def main(args: Array[String]) {
var n = 0
Stream.from(2).foreach { i =>
n += 1
if (n >= 5)
while(true) {}
}
}
} I'm not sure why (and maybe JProfiler is not the best tool...). I might be missing something obvious, but right now I don't see any other references to the first stream cell than the one inside the |
@DRMacIver said: |
@DRMacIver said: package newstream;
abstract class Stream[+T]{
def foreach(f : T => Unit) : Unit = if (empty) () else {f(head); tail.foreach(f)}
def head : T;
def tail : Stream[T];
def empty : Boolean;
}
class Cons[+T](val head : T, _tail : => Stream[T]) extends Stream[T]{
lazy val tail = _tail;
def empty = false;
}
object Empty extends Stream[Nothing]{
def head = error("Empty.head");
def tail = error("Empty.tail");
def empty = true;
}
object StreamTest extends Application{
def from(i : Int) : Stream[Int] = new Cons[Int](i, from(i + 1));
from(1).foreach(x => ());
} This will stack overflow. |
@DRMacIver said: However! If you change my above code to make foreach final, this code just hangs indefinitely. In particular it does not overflow the heap. So Stream can be fixed without a compiler change. |
@dragos said:
It is the original (chaining through |
@dragos said:
That was exactly what I was trying to achieve, glad it works at least for some. I tried both my change (directly on scala -cp classes/ newstream.StreamTest
java.lang.OutOfMemoryError: Java heap space
at scala.runtime.BoxesRunTime.boxToInteger(Unknown Source)
at newstream.StreamTest$$.from(testStream.scala:28)
at newstream.StreamTest$$$$anonfun$$from$$1.apply(testStream.scala:28)
at newstream.StreamTest$$$$anonfun$$from$$1.apply(testStream.scala:28)
at newstream.Cons.tail(testStream.scala:17)
at newstream.Stream.foreach(testStream.scala:10)
at newstream.StreamTest$$.<init>(testStream.scala:29)
at newstream.StreamTest$$.<clinit>(testStream.scala)
at newstream.StreamTest.main(testStream.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at scala.tools.nsc.ObjectRunner$$$$anonfun$$run$$1.apply(ObjectRunner.scala:75)
at scala.tools.nsc.ObjectRunner$$.withContextClassLoader(ObjectRunner.scala:49)
at scala.tools.nsc.ObjectRunner$$.run(ObjectRunner.scala:74)
at scala.tools.nsc.MainGenericRunner$$.main(MainGenericRunner.scala:161)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
dragos-mbp:sandbox dragos$$ java -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing) |
@DRMacIver said: After some testing, I tentatively concluded that what you're seeing happens in Java 5 with the client VM, but doesn't happen with the server VM or on either for Java 6. |
@DRMacIver said: from(1).foreach(x => System.gc()) It works in the 5 client as well. |
@dragos said: |
@DRMacIver said: At any rate, this change definitely seems to be worth making. It also might be worth looking into other instances where this sort of slightly retro code for tail optimisation of calls to a different this is used. It would be nice if we could have the compiler change too though. There are non tail call examples where this would be nice (for example if we wanted to add this behaviour to foldLeft and then define foreach in terms of foldLeft). |
@dragos said: It looks like it was actually about local variables (probably JProfiler can't distinguish them)! Turn your There was also a bug in tail call elimination causing a dead variable holding on to the 'this' instance with which the method was invoked (fixed in r14481). But that seems to be ok, as long as it happens inside a long running method (the real I also changed |
amit kumar (cdac.amit) said: If u resolved the problem then please publish that so that other developer who is using Scala can use that. Thanks |
@dragos said:
I don't understand. In what way is this bug stopping you from creating new Scala files?
Of course, that goes without saying. The status is as given above, and that means the current version is still leaking. Fixing it requires a compiler change that needs some more thinking. I'll put this on the agenda of the next Scala meeting. |
amit kumar (cdac.amit) said: Currently I have 68 Scala files, and all are getting complied properly, but as soon as I try to create a new Scala file, complier throws following error while compiling: [INFO] Compiling 69 source files to ../target/classes thus I cannot create any more new file because of the "java.lang.OutOfMemoryError: Java heap space" exception. If there is any way so that I could continue my development, then please suggest. |
@dragos said: |
@dragos said: |
@odersky said: |
Florian Hars (florian) said: scala> var x = 0
x: Int = 0
scala> def t(s: Stream[Int]) = s.map(_ => new Array[Int](10000000)).foreach(a => x = a.hashCode ^ x)
t: (Stream[Int])Unit
scala> t(Stream.from(1)) But if you do not allocate enough, you run out of memory (this may be related to what David said above about the churn rate): scala> var x = 0
x: Int = 0
scala> def t(s: Stream[Int]) = s.map(_ => new Array[Int](10)).foreach(a => x = a.hashCode ^ x)
t: (Stream[Int])Unit
scala> t(Stream.from(1))
java.lang.OutOfMemoryError: Java heap space
at $$anonfun$$t$$1.apply(<console>:5)
at $$anonfun$$t$$1.apply(<console>:5)
at scala.Stream.map(Stream.scala:361)
at scala.Stream$$$$anonfun$$map$$1.apply(Stream.scala:361)
at scala.Stream$$$$anonfun$$map$$1.apply(Stream.scala:361)
at scala.Stream$$cons$$$$anon$$2.tail(Stream.scala:69)
at scala.Stream.foreach(Stream.scala:370)
at .t(<console>:5)
at .... This is on 32bit i386 Linux with |
@SethTisue said: Both examples run out of memory because in both cases you force the stream while retaining (in the parameter s) a reference to the head of the stream — not the mapped stream, but the original Stream[Int]. Your first example "seems" to work in constant memory because it takes a long time to allocate a large array, so that slows down the entire test. That means you have to wait a long, long time before Stream.from(1) gets the chance to eat the whole heap. That's my hypothesis. I tested it by gradually increasing the array size. I never got all the way up to 1000000, but I did notice that each time I increased the array size, it took longer before I got the OutOfMemoryError. (With n = 100, it took 1.5 minutes; n = 200 took 2.5 minutes.) |
@SethTisue said: |
@dragos said: |
@SethTisue said: |
@dragos said: |
If Streams are meant to be lazy lists, I'd expect the following to run in constant space:
This may or may not be related to #498, but here we are running out of heap space, not stack space.
The text was updated successfully, but these errors were encountered: