indexOfSlice() hangs when working on a largish stream #9830

scabug · 2016-06-25T11:00:48Z

This takes a few seconds but works:

val source = scala.io.Source.fromChars(("x" * 6000000).toArray)
source.toSeq.indexOfSlice("tteesstt")

Change the 6000000 to 7000000 and it hangs, eating CPU (though not memory).

It seems to be the indexOfSlice call that is failing.

scabug · 2016-06-25T11:00:48Z

Imported From: https://issues.scala-lang.org/browse/SI-9830?orig=1
Reporter: ImNotTellingYouThat (intyt)
Affected Versions: 2.11.8

scabug · 2016-06-28T11:37:15Z

Jasper-M said:
Are you sure that the JVM's memory isn't full and the CPU usage you're seeing isn't just the garbage collector?

scabug · 2016-06-28T21:46:10Z

ImNotTellingYouThat (intyt) said:
Jasper Moeys: I thought it unlikely as there was no obvious memory use but I've just repeated it and bingo,

java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.StreamIterator.<init>(Stream.scala:1104)
at scala.collection.immutable.Stream.iterator(Stream.scala:578)
at scala.collection.SeqLike$class.startsWith(SeqLike.scala:304)
...

So:

It locked up before, now it crashes; is this a flaw in the JVM or in Scala? Bear in mind that when I reported this it was just hanging, to the point that I had to kill the terminal window, so each time I repeated the test (and it hung each time) it was starting a new JVM instance.
Why, in Windows' Task Manager, am I not seeing any significant memory use? I have plenty to spare.
Why should it run out of memory? Bear in mind I know very little about Scala, but with this code
source.toSeq.indexOfSlice("tteesstt")
toSeq produces a lazy structure:

scala> source.toSeq
res0: Seq[Char] = Stream(x, ?)

so the obvious question is, is indexOfSlice hanging on to the head of the stream as it works its way along, because how else is memory being retained? What should be happening do you think? (I'm asking because I genuinely don't know).

cheers

jan

scabug · 2016-07-03T18:44:33Z

ImNotTellingYouThat (intyt) said:
Further, I've just repeated this to look at memory use, and this time it hung (and has stayed hung for several minutes at 100% CPU) rather than throwing an exception.
When I started scala in the JVM, the JVM was at ~240 meg. With it hanging it's at 360 meg. This trivial amount of memory is well within what a 32-bit JVM should be able to handle, never mind a 64-bit one. I've actually got ~10 gig of memory free.

scabug · 2016-08-10T22:58:24Z

@SethTisue said:
Thank you for the report!

This doesn't have anything to do with indexOfSlice in particular; substituting e.g. last shows the same behavior.

The underlying issue here is that at runtime, calling .toSeq on Iterator returns a scala.collection.immutable.Stream, which is a very expensive data structure (both in space and time) — though the fact that its tail is lazy means that you don't pay the cost until you actually traverse it.
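One way to check what toSeq actually hands back is to print the runtime class of the result. This is only a sketch: the object and method names (ToSeqKind, runtimeClassOfToSeq) are made up for illustration, and the class printed depends on the Scala version. On 2.11.8, the version this report is against, the REPL output quoted earlier in this thread shows a Stream; other versions may build a different collection.

```scala
import scala.io.Source

object ToSeqKind {
  // Hypothetical helper: returns the runtime class name of whatever
  // toSeq builds from a Source over the given text.
  def runtimeClassOfToSeq(text: String): String =
    Source.fromChars(text.toArray).toSeq.getClass.getName

  def main(args: Array[String]): Unit =
    // The printed name is version-dependent, so no expected output is given.
    println(runtimeClassOfToSeq("xxxx"))
}
```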

In general, toSeq is something of a trap in that as a type, Seq gives you almost no guarantees. The Seq you get back might be strict or lazy, compact or memory hungry, finite or infinite, stack overflow prone or stack safe, etc etc etc. For small collections it often doesn't matter but as soon as you're slinging big amounts of data around you probably want to be working with concrete collection types so you know what you're getting. Substitute toVector for toSeq here, and it will run pretty fast.
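As a rough sketch of the toVector substitution applied to the reporter's example: the object and method names here (IndexOfSliceDemo, indexInXs) are invented for illustration, and the only change from the original snippet is toVector in place of toSeq, with the size raised to the 7000000 that hung.

```scala
import scala.io.Source

object IndexOfSliceDemo {
  // Hypothetical helper: builds a Source of n 'x' characters, copies it
  // into a strict Vector, and searches that Vector for the needle.
  def indexInXs(n: Int, needle: String): Int = {
    val source = Source.fromChars(("x" * n).toArray)
    val chars: Vector[Char] = source.toVector // strict, indexed collection
    chars.indexOfSlice(needle)
  }

  def main(args: Array[String]): Unit =
    // The needle never occurs in a run of 'x', so this prints -1.
    println(indexInXs(7000000, "tteesstt"))
}
```

The Vector still holds all seven million characters in memory at once; the difference is that the concrete type is known up front, rather than whatever Seq toSeq happens to return.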

So, I've responded here on JIRA, but not to all of your questions. They are good questions, but I suggest asking them on scala-user, the scala/scala Gitter channel, or Stack Overflow. (If you have followup questions about what I've said here, same recommendation about where to take the discussion.)

scabug closed this as completed Aug 10, 2016

scabug mentioned this issue Apr 7, 2017

Future hangs when it probably shouldn't #9831

Closed
