Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large mapped streams cause OutOfMemoryError when using reduceLeft. #2239

Closed
scabug opened this issue Aug 8, 2009 · 5 comments
Closed

Large mapped streams cause OutOfMemoryError when using reduceLeft. #2239

scabug opened this issue Aug 8, 2009 · 5 comments
Assignees

Comments

@scabug
Copy link

scabug commented Aug 8, 2009

On my machine this runs for about 40 seconds and then the error occurs. I've tried this on both the nightly 2.8 build, 2.7.3final (Ubuntu) and whatever Netbeans is bringing along with it. There doesn't seem to be any reason for this that has become apparent to me from looking at the Scala code underlying the related methods.

def nextNumber(number: Long): Long =
{
    if (number % 2 == 0)
    {
        return number / 2;
    }
    else
    {
        return (number * 3) + 1;
    }
}

def numberStream(number: Long): Stream[Long] =
{
    if (number == 1)
    {
        return Stream.cons(1, Stream.empty);
    }
    return Stream.cons(number, numberStream(nextNumber(number)));
}

def numberRangeStream(number: Long): Stream[Long] =
{
    if (number == 999999L)
    {
        return Stream.cons(999999L, Stream.empty);
    }
    return Stream.cons(number, numberRangeStream(number + 2));
}

val maxValue = numberRangeStream(500001).map(numberStream(_)).reduceLeft((left, right) => (
 if (left.length > right.length) left else right
 )).first;
println("Largest value = " + maxValue);
@scabug
Copy link
Author

scabug commented Aug 8, 2009

Imported From: https://issues.scala-lang.org/browse/SI-2239?orig=1
Reporter: Sean Parsons (seanparsons)

@scabug
Copy link
Author

scabug commented Aug 24, 2009

@phaller said:
Note that already the following program throws an OutOfMemoryError if the assignment mapped = null is commented out as shown:

object Test extends Application {

  def nextNumber(number: Long): Long =
    if (number % 2 == 0)
      number / 2
    else
      (number * 3) + 1

  def numberStream(number: Long): Stream[Long] =
    if (number == 1)
      Stream.cons(1, Stream.empty)
    else
      Stream.cons(number, numberStream(nextNumber(number)))

  def numberRangeStream(high: Long, number: Long): Stream[Long] =
    if (number == high)
      Stream.cons(high, Stream.empty)
    else
      Stream.cons(number, numberRangeStream(high, number + 2))

  val low = 500001
  val high = 999999L
  var mapped = numberRangeStream(high, low).map(numberStream(_))
  val s = mapped.size
  println("size of mapped stream: "+s)

  var i = 0
  var curr = mapped
  //mapped = null
  var el = curr.head
  while (i < s) {
    i += 1
    el = curr.head
    print(el.size+" ")
    curr = curr.tail
  }
}

Therefore, no matter how reduceLeft is implemented (and currently it is implemented iteratively), an OutOfMemoryError is thrown, since the reference to the head of the stream is enough to retain all the contained streams. This means the reduction has to be implemented in a custom way without retaining references to sub-streams that have already been processed.

Therefore, I am closing this ticket, since it has apparently nothing to do with reduceLeft, or any other method of Stream used in the example.

@scabug
Copy link
Author

scabug commented Aug 26, 2009

@SethTisue said:
what about this?

Stream.const(1).reduceLeft(_ max _)

it seems to me that it should run in constant space, but it runs out of memory on both 2.7 and trunk (r18571). if #692 is fixable, isn't this fixable too?

@scabug
Copy link
Author

scabug commented Nov 10, 2010

Hans-Peter Strr (hans-peter.stoerr) said:
I ran probably into the same problem. The simplest way to reproduce it are these two lines:

(1 to 10000000).toStream.foldLeft(0)(_+_)
(1 to 10000000).toStream.reduceLeft(_+_)

Both are semantically equivalent. The first one runs correctly, and the second one throws an OutOfMemoryError.

The reason for this is that reduceLeft calls foldLeft for the actual work. foldLeft it itself allows the garbage collection of the already processed part of the stream because it discards the reference to the original head of the stream by using tailrecursion. On the other hand, the call to reduceLeft is forwarded to foldLeft, and thus the stackframe of reduceLeft holds a reference to the original head of the stream during the full time of the call, such that nothing can be garbage collected.

I suppose the only way to fix this is some code duplication: if reduceLeft does it's own work instead of forwarding the call, this is solved.

I would guess this applies to other methods of Stream as well, rendering it a little bit dangerous to use - compare [http://stackoverflow.com/questions/4132924/functional-processing-of-scala-streams-without-outofmemory-errors/4134541SI-4134541 this comment]. 8-}

@scabug
Copy link
Author

scabug commented Mar 8, 2011

@phaller said:
(In r24412) Closes #2239.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants