Scala Programming Language
  1. Scala Programming Language
  2. SI-2239

Large mapped streams cause OutOfMemoryError when using reduceLeft.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Misc Library
    • Labels:
      None

      Description

      On my machine this runs for about 40 seconds and then the error occurs. I've tried this on both the nightly 2.8 build, 2.7.3final (Ubuntu) and whatever Netbeans is bringing along with it. There doesn't seem to be any reason for this that has become apparent to me from looking at the Scala code underlying the related methods.

      def nextNumber(number: Long): Long =
      {
          if (number % 2 == 0)
          {
              return number / 2;
          }
          else
          {
              return (number * 3) + 1;
          }
      }
      
      def numberStream(number: Long): Stream[Long] =
      {
          if (number == 1)
          {
              return Stream.cons(1, Stream.empty);
          }
          return Stream.cons(number, numberStream(nextNumber(number)));
      }
      
      def numberRangeStream(number: Long): Stream[Long] =
      {
          if (number == 999999L)
          {
              return Stream.cons(999999L, Stream.empty);
          }
          return Stream.cons(number, numberRangeStream(number + 2));
      }
      
      val maxValue = numberRangeStream(500001).map(numberStream(_)).reduceLeft((left, right) => (
       if (left.length > right.length) left else right
       )).first;
      println("Largest value = " + maxValue);
      

        Activity

        Hide
        Philipp Haller added a comment -

        Note that already the following program throws an `OutOfMemoryError` if the assignment `mapped = null` is commented out as shown:

        object Test extends Application {
        
          def nextNumber(number: Long): Long =
            if (number % 2 == 0)
              number / 2
            else
              (number * 3) + 1
        
          def numberStream(number: Long): Stream[Long] =
            if (number == 1)
              Stream.cons(1, Stream.empty)
            else
              Stream.cons(number, numberStream(nextNumber(number)))
        
          def numberRangeStream(high: Long, number: Long): Stream[Long] =
            if (number == high)
              Stream.cons(high, Stream.empty)
            else
              Stream.cons(number, numberRangeStream(high, number + 2))
        
          val low = 500001
          val high = 999999L
          var mapped = numberRangeStream(high, low).map(numberStream(_))
          val s = mapped.size
          println("size of mapped stream: "+s)
        
          var i = 0
          var curr = mapped
          //mapped = null
          var el = curr.head
          while (i < s) {
            i += 1
            el = curr.head
            print(el.size+" ")
            curr = curr.tail
          }
        }
        

        Therefore, no matter how `reduceLeft` is implemented (and currently it is implemented iteratively), an `OutOfMemoryError` is thrown, since the reference to the head of the stream is enough to retain all the contained streams. This means the reduction has to be implemented in a custom way without retaining references to sub-streams that have already been processed.

        Therefore, I am closing this ticket, since it has apparently nothing to do with `reduceLeft`, or any other method of `Stream` used in the example.

        Show
        Philipp Haller added a comment - Note that already the following program throws an `OutOfMemoryError` if the assignment `mapped = null` is commented out as shown: object Test extends Application { def nextNumber(number: Long): Long = if (number % 2 == 0) number / 2 else (number * 3) + 1 def numberStream(number: Long): Stream[Long] = if (number == 1) Stream.cons(1, Stream.empty) else Stream.cons(number, numberStream(nextNumber(number))) def numberRangeStream(high: Long, number: Long): Stream[Long] = if (number == high) Stream.cons(high, Stream.empty) else Stream.cons(number, numberRangeStream(high, number + 2)) val low = 500001 val high = 999999L var mapped = numberRangeStream(high, low).map(numberStream(_)) val s = mapped.size println("size of mapped stream: "+s) var i = 0 var curr = mapped //mapped = null var el = curr.head while (i < s) { i += 1 el = curr.head print(el.size+" ") curr = curr.tail } } Therefore, no matter how `reduceLeft` is implemented (and currently it is implemented iteratively), an `OutOfMemoryError` is thrown, since the reference to the head of the stream is enough to retain all the contained streams. This means the reduction has to be implemented in a custom way without retaining references to sub-streams that have already been processed. Therefore, I am closing this ticket, since it has apparently nothing to do with `reduceLeft`, or any other method of `Stream` used in the example.
        Hide
        Seth Tisue added a comment -

        what about this?

        Stream.const(1).reduceLeft(_ max _)
        

        it seems to me that it should run in constant space, but it runs out of memory on both 2.7 and trunk (r18571). if SI-692 is fixable, isn't this fixable too?

        Show
        Seth Tisue added a comment - what about this? Stream.const(1).reduceLeft(_ max _) it seems to me that it should run in constant space, but it runs out of memory on both 2.7 and trunk (r18571). if SI-692 is fixable, isn't this fixable too?
        Hide
        Hans-Peter Strr added a comment -

        I ran probably into the same problem. The simplest way to reproduce it are these two lines:

        (1 to 10000000).toStream.foldLeft(0)(_+_)
        (1 to 10000000).toStream.reduceLeft(_+_)
        

        Both are semantically equivalent. The first one runs correctly, and the second one throws an OutOfMemoryError.

        The reason for this is that reduceLeft calls foldLeft for the actual work. foldLeft it itself allows the garbage collection of the already processed part of the stream because it discards the reference to the original head of the stream by using tailrecursion. On the other hand, the call to reduceLeft is forwarded to foldLeft, and thus the stackframe of reduceLeft holds a reference to the original head of the stream during the full time of the call, such that nothing can be garbage collected.

        I suppose the only way to fix this is some code duplication: if reduceLeft does it's own work instead of forwarding the call, this is solved.

        I would guess this applies to other methods of Stream as well, rendering it a little bit dangerous to use - compare http://stackoverflow.com/questions/4132924/functional-processing-of-scala-streams-without-outofmemory-errors/4134541SI-4134541 this comment. 8-}

        Show
        Hans-Peter Strr added a comment - I ran probably into the same problem. The simplest way to reproduce it are these two lines: (1 to 10000000).toStream.foldLeft(0)(_+_) (1 to 10000000).toStream.reduceLeft(_+_) Both are semantically equivalent. The first one runs correctly, and the second one throws an OutOfMemoryError. The reason for this is that reduceLeft calls foldLeft for the actual work. foldLeft it itself allows the garbage collection of the already processed part of the stream because it discards the reference to the original head of the stream by using tailrecursion. On the other hand, the call to reduceLeft is forwarded to foldLeft, and thus the stackframe of reduceLeft holds a reference to the original head of the stream during the full time of the call, such that nothing can be garbage collected. I suppose the only way to fix this is some code duplication: if reduceLeft does it's own work instead of forwarding the call, this is solved. I would guess this applies to other methods of Stream as well, rendering it a little bit dangerous to use - compare http://stackoverflow.com/questions/4132924/functional-processing-of-scala-streams-without-outofmemory-errors/4134541SI-4134541 this comment . 8-}
        Hide
        Philipp Haller added a comment -

        (In r24412) Closes SI-2239.

        Show
        Philipp Haller added a comment - (In r24412) Closes SI-2239 .

          People

          • Assignee:
            Philipp Haller
            Reporter:
            Sean Parsons
            TracCC:
            Hans-Peter Strr, Seth Tisue
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development