Scala Programming Language
  1. Scala Programming Language
  2. SI-4835

scala.collection.immutable.StreamIterator[+A] has O(n) memory consumption

    Details

      Description

      In the following program using Stream#iterator, OutOfMemoryError should not occur
      Because no references to Stream's head are not hold and Stream was converted to Iterator
      using Stream[+A]#iterator.

      object StreamIteratorProblem {
      def main(args: Array[String]) {
      val infiniteIterator = Stream.from(1).iterator
      for(i <- 1 to 10000000)

      { val e = infiniteIterator.next if(e % 1000000 == 0) println(e) }

      }
      }

      However, When this program runs, OutOfMemoryError occurs as the following:

      >scala StreamIteratorProblem
      1000000
      2000000
      3000000
      4000000
      java.lang.OutOfMemoryError: Java heap space
      at scala.collection.immutable.Stream$.from(Stream.scala:657)
      at scala.collection.immutable.Stream$$anonfun$from$1.apply(Stream.scala:657)
      at scala.collection.immutable.Stream$$anonfun$from$1.apply(Stream.scala:657)
      at scala.collection.immutable.Stream$Cons.tail(Stream.scala:630)
      at scala.collection.immutable.Stream$Cons.tail(Stream.scala:622)

      I found that there exists the reason of this problem in `StreamIterator` in "Stream.scala`.
      In the definition of StreamIterator as the followings,

      class StreamIterator[+A](self: Stream[A]) extends Iterator[A] {
      // A call-by-need cell.
      class LazyCell(st: => Stream[A])

      { lazy val v = st }

      }

      an argument `self` of primary constructor continues to hold the reference to a Stream.
      Then, whenever StreamIterator#next is called, memory consumption increases.

      I created a patch to this problem and will attach it. It seems works well.

      1. StreamIterator.patch
        1 kB
        Kota Mizushima
      2. StreamIterator.patch.2
        0.9 kB
        Kota Mizushima
      3. StreamIteratorProblem.scala
        0.2 kB
        Kota Mizushima

        Activity

        Hide
        Daniel Sobral added a comment -

        This patch breaks the lazyness of stream for the initial element. I think that isn't a big problem, though effects of that can be subtle. Assigning null to self shouldn't be necessary – if you don't use it in a method, it shouldn't become part of the object.

        I'm surprised, however, that "val stream" inside the definition of "var these" doesn't suffer from the same problem – even though it is in a small scope, it ought to be saved so that it is preserved when the by-name LazyCell gets invoked. It might even be a bug here.

        Show
        Daniel Sobral added a comment - This patch breaks the lazyness of stream for the initial element. I think that isn't a big problem, though effects of that can be subtle. Assigning null to self shouldn't be necessary – if you don't use it in a method, it shouldn't become part of the object. I'm surprised, however, that "val stream" inside the definition of "var these" doesn't suffer from the same problem – even though it is in a small scope, it ought to be saved so that it is preserved when the by-name LazyCell gets invoked. It might even be a bug here.
        Hide
        Kota Mizushima added a comment -

        Hello. Thanks for commenting to my patch. However, I couldn't understand that my patch breaks the lazyness of stream for the initial element. If you don't mind, I would like to see concrete example of it.

        > I think that isn't a big problem, though effects of that can be subtle

        I think that it isn't big problem, too. However, I think that Iterator#next is not to expected consume O memory generally.

        > Assigning null to self shouldn't be necessary – if you don't use it in a method, it shouldn't become part of the object.

        Because primary constructor's arguments `self` continues to exist until ths object is garbage collected, assigning null to self is needed now if StreamIterator's API shouldn't be changed.

        Thanks again.

        Kota Mizushima

        Show
        Kota Mizushima added a comment - Hello. Thanks for commenting to my patch. However, I couldn't understand that my patch breaks the lazyness of stream for the initial element. If you don't mind, I would like to see concrete example of it. > I think that isn't a big problem, though effects of that can be subtle I think that it isn't big problem, too. However, I think that Iterator#next is not to expected consume O memory generally. > Assigning null to self shouldn't be necessary – if you don't use it in a method, it shouldn't become part of the object. Because primary constructor's arguments `self` continues to exist until ths object is garbage collected, assigning null to self is needed now if StreamIterator's API shouldn't be changed. Thanks again. Kota Mizushima
        Hide
        Daniel Sobral added a comment -

        Indeed, the first element isn't lazy at all, as it is passed by value. So that shouldn't be a problem.

        As for self, it * shouldn't* continue to exist. It only did so because it was used in a by-name parameter.

        Show
        Daniel Sobral added a comment - Indeed, the first element isn't lazy at all, as it is passed by value. So that shouldn't be a problem. As for self, it * shouldn't* continue to exist. It only did so because it was used in a by-name parameter.
        Hide
        Kota Mizushima added a comment -

        > Indeed, the first element isn't lazy at all, as it is passed by value. So that shouldn't be a problem.

        I see that because argument self: Stream itself is passed by value, then the first element wasn't lazy
        without my patch. Is is right?

        > As for self, it * shouldn't* continue to exist. It only did so because it was used in a by-name parameter.

        OK. I created another version of patch and will attach it as StreamIterator.patch.2. By applying this patch,
        no longer, unneeded field `self` is erased.

        Show
        Kota Mizushima added a comment - > Indeed, the first element isn't lazy at all, as it is passed by value. So that shouldn't be a problem. I see that because argument self: Stream itself is passed by value, then the first element wasn't lazy without my patch. Is is right? > As for self, it * shouldn't* continue to exist. It only did so because it was used in a by-name parameter. OK. I created another version of patch and will attach it as StreamIterator.patch.2. By applying this patch, no longer, unneeded field `self` is erased.
        Hide
        huynhjl added a comment -

        Note that this used to run in constant memory in 2.8.1 and most likely in 2.8.0 as well. Something happened in the 2.9.0 release.

        Show
        huynhjl added a comment - Note that this used to run in constant memory in 2.8.1 and most likely in 2.8.0 as well. Something happened in the 2.9.0 release.
        Hide
        Kota Mizushima added a comment -

        Several days ago, I sent pull request fixing this isssue on git://github.com/scala/scala.git and the pull request was merged into master branch:

        commit 39a837c8357e78c949613db881bd060d176f3279
        Author: Kota Mizushima <mizukota@gmail.com>
        Date: Mon Jan 23 18:53:05 2012 +0900

        Add test case for SI-4835 (https://issues.scala-lang.org/browse/SI-4835)

        This test case only confirm that StreamIterator's lazyiness is not broken.
        Test case about memory consumption could be created. However, such a
        test cause a greatly increased time of test.

        I'm not a commiter. However, because the fix was merged into master branch, I've thought that it should be noted.

        Show
        Kota Mizushima added a comment - Several days ago, I sent pull request fixing this isssue on git://github.com/scala/scala.git and the pull request was merged into master branch: commit 39a837c8357e78c949613db881bd060d176f3279 Author: Kota Mizushima <mizukota@gmail.com> Date: Mon Jan 23 18:53:05 2012 +0900 Add test case for SI-4835 ( https://issues.scala-lang.org/browse/SI-4835 ) This test case only confirm that StreamIterator's lazyiness is not broken. Test case about memory consumption could be created. However, such a test cause a greatly increased time of test. I'm not a commiter. However, because the fix was merged into master branch, I've thought that it should be noted.
        Hide
        Jed Wesley-Smith added a comment -

        this is tagged as a fix in the 2.9.2 release notes but is unresolved here. What is the source of truth?

        Show
        Jed Wesley-Smith added a comment - this is tagged as a fix in the 2.9.2 release notes but is unresolved here. What is the source of truth?
        Hide
        Kota Mizushima added a comment -

        As far as I see https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/immutable/Stream.scala, my fix is included in 2.9.2. However, because I'm just another contributor, I don't know whether or not I have permission to close an Issue myself.

        Show
        Kota Mizushima added a comment - As far as I see https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/immutable/Stream.scala , my fix is included in 2.9.2. However, because I'm just another contributor, I don't know whether or not I have permission to close an Issue myself.

          People

          • Assignee:
            Kota Mizushima
            Reporter:
            Kota Mizushima
          • Votes:
            3 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development