Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Source.getLines broken in 2.9.0.1 #4662

Closed
scabug opened this issue May 31, 2011 · 11 comments
Closed

Regression: Source.getLines broken in 2.9.0.1 #4662

scabug opened this issue May 31, 2011 · 11 comments
Assignees

Comments

@scabug
Copy link

scabug commented May 31, 2011

Source.getLines appears to be... entirely broken. I have a very simple test case: a plain-jain text file of three lines save in normal utf-8. Calling .getLines on the Source from this File does nothing: an empty iterator, no lines at all. However, it has no problem iterating over the chars.

This is a regression, it worked without a problem in 2.8. This paste shows exactly the problem and compares with 2.8:

http://pastie.org/2000844

@scabug
Copy link
Author

scabug commented May 31, 2011

Imported From: https://issues.scala-lang.org/browse/SI-4662?orig=1
Reporter: Paul Lambert (paulitex)
Affected Versions: 2.9.0

@scabug
Copy link
Author

scabug commented Jun 1, 2011

@dcsobral said:
I suspect I know what is happening. Take a look at this code inside BufferedSource:

  def reader() = new InputStreamReader(inputStream, codec.decoder)
  def bufferedReader() = new BufferedReader(reader(), bufferSize)
  
  override val iter = {
    val reader = bufferedReader()    
    Iterator continually (codec wrap reader.read()) takeWhile (_ != -1) map (_.toChar)
  }

  class BufferedLineIterator extends Iterator[String] {
    val bufReader = BufferedSource.this.bufferedReader()

The definition reader creates a new InputStreamReader out of inputStream, which is then used to create a BufferedReader by bufferedReader. The latter method is called inside iter to create an iterator, and it is also called by BufferedLineIterator. However, I'm guessing inputStream has already been consumed by either InputStreamReader or BufferedReader constructor.

@scabug
Copy link
Author

scabug commented Jun 1, 2011

@dcsobral said:
Nope, I'm wrong. It is Iterator continually (codec wrap reader.read()) takeWhile (_ != -1) map (_.toChar) which consumes the input stream.

@scabug
Copy link
Author

scabug commented Jun 1, 2011

@dcsobral said:
The problem happens at takeWhile. It's hasNext calls next on the iterator it is based on, therefore changing it (ie, calling reader.read). Perhaps it would be best for takeWhile not to do so, but the main problem here is that calling getLines will only work if the previous iterator had not been used at all.

I think the proper solution would be for val reader/val bufReader to be moved up one level in scope and shared between iter and BufferedLineIterator.

On the other hand, it turns out this bug isn't all that bad. It only happens if you do something to the iterator before using it -- such as REPL showing whether the iterator is empty or not.

@scabug
Copy link
Author

scabug commented Jun 1, 2011

@dcsobral said:
Moving reader/bufReader won't work. A hasNext will still call next on it, making the first character, if nothing else, unavailable to getLines. So maybe a reset is needed, or fixing takeWhile not to be destructive on hasNext after all, even though it has worked that way all the way since r19900.

@scabug
Copy link
Author

scabug commented Jun 1, 2011

Dmitry Antonyuk (lomeo) said (edited on Jun 4, 2011 11:28:43 AM UTC):
BTW try

io.Source.stdin.getLines.foreach(println)

It shows first line only after we entered second line.

AFAIK it happens 'cause BufferedSource.BufferedLineIterator has

var nextLine = bufReader.readLine

So when we create BufferedLineIterator by calling getLines we also wait for a user input.

I guess the correct code is something like this:

class BufferedLineIterator extends Iterator[String] {
  val bufReader = BufferedSource.this.bufferedReader()
  var nextLine = null
  var unread = true

  override def hasNext() = {
    if (unread) {
      nextLine = bufReader.readLine
      unread = false
    }
    nextLine != null
  }

  override def next(): String = {
    val result = nextLine
    if (unread) {
      nextLine = bufReader.readLine
    } else {
      unread = true
    }
    result
  }
}

Should I create a new bug for this or these bugs are same?

@scabug
Copy link
Author

scabug commented Jun 4, 2011

@dcsobral said (edited on Jun 4, 2011 1:39:22 PM UTC):
That's a different bug, really, but I'm not in any position to tell you whether to open a new bug or not.

The solution you provided is still suboptimal, however, because whenever you call "next", it will only return after the the following line has read. In other words, you fixed it for first/second line, but it will still happen with second/third, third/fourth, etc. For an example of the sort of problem this can give, see #4599.

I think the following would fix it for next:

  override def next(): String = {
    if (unread) {
      nextLine = bufReader.readLine
    }
    unread = true
    nextLine
  }

@scabug
Copy link
Author

scabug commented Jul 1, 2011

Commit Message Bot (anonymous) said:
(extempore in r25212) Keep BufferedSource from losing buffered data when a derivative iterator
is created via getLines. Closes #4671, #4662, no review.

@scabug scabug closed this as completed Jul 1, 2011
@scabug
Copy link
Author

scabug commented Oct 17, 2011

Viktor Hedefalk (hedefalk) said:
I'm still having problems with this on 2.9.2r25743-b20110928032420 from within the scala-ide:

@RunWith(classOf[JUnitRunner])
class SourceTest extends FlatSpec with ShouldMatchers with Logger {
  import scala.io.Source._
  it should "not frikkin eat it up" in {
    val source = fromInputStream(this.getClass.getResourceAsStream("fileWithTwoLines"))(Codec.UTF8)
    debug("source: " + source)
    val lines = source.getLines()
    debug("lines.hasNext: " + lines.hasNext)
    debug("lines size: " + lines.size)
    debug("lines: " + lines)
    debug("lines.hasNext: " + lines.hasNext)
    for(line <- lines) yield (debug("line: " + line))
  }
}

yields the following log:

16:40:54.576 [main] DEBUG test.SourceTest - source: non-empty iterator
16:40:54.583 [main] DEBUG test.SourceTest - lines.hasNext: true
16:40:54.584 [main] DEBUG test.SourceTest - lines size: 2
16:40:54.585 [main] DEBUG test.SourceTest - lines: empty iterator
16:40:54.585 [main] DEBUG test.SourceTest - lines.hasNext: false

@scabug
Copy link
Author

scabug commented Oct 17, 2011

@paulp said:
You can't call "size" on an Iterator without exhausting the iterator.

@scabug
Copy link
Author

scabug commented Oct 17, 2011

Viktor Hedefalk (hedefalk) said:
Hehe, thanks. I must be tired. I blame the missing parens :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants