Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scala.xml.pull.XMLEventReader not keeping up with InputStream #4599

Closed
scabug opened this issue May 17, 2011 · 19 comments
Closed

scala.xml.pull.XMLEventReader not keeping up with InputStream #4599

scabug opened this issue May 17, 2011 · 19 comments

Comments

@scabug
Copy link

scabug commented May 17, 2011

=== What steps will reproduce the problem (please be specific and use wikiformatting)? ===

The following code is used to read XMPP events from talk.google.com

import java.net.Socket
import java.io._

import scala.xml.pull._
import scala.io.Source

val HOST = "talk.google.com"
val PORT = 5222

var sock = new Socket(HOST, PORT)

val out = new PrintWriter(
    new BufferedOutputStream(sock.getOutputStream()), true)

val in =
    new XMLEventReader(Source.fromInputStream(
        new BufferedInputStream(sock.getInputStream()), "utf-8")).buffered

out.println("<?xml version='1.0'?>")
out.println("<stream:stream xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' to='chilon.net' version='1.0' xml:lang='en'>")

while (in.hasNext) System.out.println(in.next)

=== What is the expected behavior? ===

I should receive all of the XML items from the input stream.

=== What do you see instead? ===

I miss the final </stream:features> tag. I verified with tcpdump that this is present. If I send further data (to trigger more data from the server) then eventually I will see the closing tag.. but then I will lack tags at the end again.

=== Additional information ===

Sometimes more than the last tag is missing. I've tried not using ".buffered" and without Buffered{Input,Output}Stream too. All the same results.

=== What versions of the following are you using? ===

  • Scala: 2.8.0, 2.8.1, 2.9.0
  • Java: OpenJDK 1.6.0_22
  • Operating system: Linux 2.6.37
@scabug
Copy link
Author

scabug commented May 17, 2011

Imported From: https://issues.scala-lang.org/browse/SI-4599?orig=1
Reporter: J (tumeow)

@scabug
Copy link
Author

scabug commented May 17, 2011

J (tumeow) said:
Note the code there is a full scala script that can be run on your end to test.

@scabug
Copy link
Author

scabug commented May 17, 2011

@dcsobral said:
The problem is possibly related to the fact that stream:stream, the first tag of all, is never closed.

@scabug
Copy link
Author

scabug commented May 17, 2011

J (tumeow) said:
Replying to [comment:3 dcsobral]:

The problem is possibly related to the fact that stream:stream, the first tag of all, is never closed.

I very much doubt that. This is the way XML streams are.

XML does not permit multiple root nodes.

Even if it was permitted.. XMLEventReader is capable of detecting other nodes it comes across that have opened and not closed, and without the ability to do it at the root level it makes this class useless for XML streams.

@scabug
Copy link
Author

scabug commented May 17, 2011

J (tumeow) said:
My previous comment was slightly ambiguous.. the XML nodes at the end it misses out are closed, but the parent node not being closed shouldn't affect detection of that.

@scabug
Copy link
Author

scabug commented May 17, 2011

huynhjl said:
Here is more condensed test case:

val src = new io.Source {
  val iter = "<stream><features><foo/></features>".iterator ++
    Iterator.continually{ println("wait 5s"); Thread.sleep(5000); '\n' }.take(1) ++
    "</stream>".iterator
}
val reader = new xml.pull.XMLEventReader(src)
while (reader.hasNext) System.out.println(reader.next)

One would hope that EvElemEnd(null,features) is printed before waiting 5 second. Instead we get:

EvElemStart(null,stream,,)
EvElemStart(null,features,,)
EvElemStart(null,foo,,)
EvElemEnd(null,foo)
wait 5s
EvElemEnd(null,features)
EvText(
)
EvElemEnd(null,stream)

Not sure if it is something inherent to the XML parser.

@scabug
Copy link
Author

scabug commented May 17, 2011

J (tumeow) said:
With my XMPP server I am unable to do this:

  • Send request to server
  • Wait for response (final stream in node)
  • Send repsonse based on entire final XML tag in response.

It will block forever before I can determine the final tag in the stream is closed... at best I can't make my second request knowing I've got the full information to respond to.

@scabug
Copy link
Author

scabug commented May 17, 2011

@dcsobral said:
My current theory is that the following is happening. When reading the last closing tag sent, this method is called:

  def xEndTag(startName: String) {
    xToken('/')
    if (xName != startName)
      errorNoEnd(startName)

    xSpaceOpt
    xToken('>')
  }

Since that tag is properly sent, it runs up to xToken('>'), which is defined thus:

  def xToken(that: Char) {
    if (ch == that) nextch
    else xHandleError(that, "'%s' expected instead of '%s'".format(that, ch))
  }

Evidently, it calls nextch, which does the obvious thing:

  def nextch = {
    if (curInput.hasNext) {
      ch = curInput.next
      pos = curInput.pos
    } else {

The point here is that it needs to know if there is a next or not. Since this is a stream, it blocks until the stream is closed or something else is sent.

Unfortunately, the whole parser is based on having ch available (it is a var), so it doesn't look like there's an easy fix for it.

@scabug
Copy link
Author

scabug commented May 17, 2011

@dcsobral said:
Having said that (which was actually a while ago, but it wasn't accepted because there was an answer already), I think it can be fixed. I'm presently compiling a patch that ought to do the trick.

My idea is moving nextch's logic into ch and creating a flag that ch checks to see if it executes nextch's logic or return the last read character. It also adds a var to keep that last read character.

Then, nextch is modified to call ch (in case it is called twice in a row), and then setting the flag to true. Finally, it is modified to return Unit instead of Char, and all (of two) places where its return value is used are changed to call ch afterwards.

Unfortunately, nextch and "var ch" are both public, so this means a change in their API, however unlikely it is for anyone to be using them directly.

@scabug
Copy link
Author

scabug commented May 17, 2011

J (tumeow) said:
Replying to [comment:9 dcsobral]:

Unfortunately, nextch and "var ch" are both public, so this means a change in their API, however unlikely it is for anyone to be using them directly.

Does this mean it will take a while to hit the repository? I don't mind using the bleeding edge version of scala but it it won't even drop to the repository for a while perhaps I should abandon this class so I can continue my project.

@scabug
Copy link
Author

scabug commented May 17, 2011

@dcsobral said:
I have now submitted a possible fix on[https://github.com/scala/scala/pull/18 github].

It does fix the issue reported, and it doesn't seem to break anything else. However, Scala's test are a bit low on XML tests.

@scabug
Copy link
Author

scabug commented May 20, 2011

J (tumeow) said (edited on May 20, 2011 3:34:04 PM UTC):
Not sure if that patch works.. checking the 2.9.0 codebase I can see the freeze occurs here around line 543 in MarkupParser.scala:

    xToken('>')
    handle.elemStart(pos, pre, local, aMap, scope)
    val tmp = content(scope)
    xEndTag(qname) // freezes here and doesn't get to tmp
    tmp

Fixing xEndTag here alone would be enough to at least get the most usual cases fixed.

@scabug
Copy link
Author

scabug commented May 20, 2011

@dcsobral said:
But that's precisely what the patch does: stops xEndTag from freezing -- you are confusing the xToken('>') at MarkupParser with the one at the end of xEndTag.

@scabug
Copy link
Author

scabug commented May 20, 2011

J (tumeow) said:
The xToken('>') within xEndTag(qname) is where it freezes yes I know.. not confused.

@scabug
Copy link
Author

scabug commented May 20, 2011

@dcsobral said:
Well, the patch makes it not freeze there, so why do you think it doesn't work?

@scabug
Copy link
Author

scabug commented May 20, 2011

J (tumeow) said:
The patch was missing a file I see you've added it now.

@scabug
Copy link
Author

scabug commented May 21, 2011

J (tumeow) said:
Compiled scala with your patch applied (from the github scala clone master branch). Now attempting my test gives this:

:1:244: '/' expected instead of '' ^
:1:244: name expected, but char '' cannot start a name ^
Exception in thread "XMLEventReader" scala.xml.parsing.FatalError: expected closing tag of stream:stream
at scala.xml.parsing.MarkupParser$class.errorNoEnd(MarkupParser.scala:41)
at scala.xml.pull.XMLEventReader$Parser.errorNoEnd(XMLEventReader.scala:56)
at scala.xml.parsing.MarkupParserCommon$class.xEndTag(MarkupParserCommon.scala:93)
at scala.xml.pull.XMLEventReader$Parser.xEndTag(XMLEventReader.scala:56)
at scala.xml.parsing.MarkupParser$class.element1(MarkupParser.scala:555)
at scala.xml.pull.XMLEventReader$Parser.element1(XMLEventReader.scala:56)
at scala.xml.parsing.MarkupParser$class.content1(MarkupParser.scala:408)
at scala.xml.pull.XMLEventReader$Parser.content1(XMLEventReader.scala:56)
at scala.xml.parsing.MarkupParser$class.content(MarkupParser.scala:429)
at scala.xml.pull.XMLEventReader$Parser.content(XMLEventReader.scala:56)
at scala.xml.parsing.MarkupParser$class.document(MarkupParser.scala:234)
at scala.xml.pull.XMLEventReader$Parser.document(XMLEventReader.scala:56)
at scala.xml.pull.XMLEventReader$Parser$$anonfun$run$1.apply(XMLEventReader.scala:90)
at scala.xml.pull.XMLEventReader$Parser$$anonfun$run$1.apply(XMLEventReader.scala:90)
at scala.xml.pull.ProducerConsumerIterator$class.interruptibly(XMLEventReader.scala:113)
at scala.xml.pull.XMLEventReader.interruptibly(XMLEventReader.scala:26)
at scala.xml.pull.XMLEventReader$Parser.run(XMLEventReader.scala:90)
at java.lang.Thread.run(Thread.java:679)

It seems it's matching the end of the stream (but not eof) as '' or the empty string.

Alas of course it cannot find a closing tag to stream:stream as.. well it is a stream.

@scabug
Copy link
Author

scabug commented May 21, 2011

J (tumeow) said:
Sorry ignore last comment. Error was due to sequencing change now full nodes are retrieved. My code works great now using the github master branch of scala.

@scabug
Copy link
Author

scabug commented May 25, 2011

Commit Message Bot (anonymous) said:
(extempore in r24998) Fix #4599: XMLEventReader issue with input stream.

Makes MarkupParser.nextch lazy, only reaching out for the next char
when calling ch or eof. To make it possible, nextch now returns Unit
instead of Char. As it happens, that's how it is used almost everywhere.

Contributed by Daniel Sobral, no review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant