Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: Scala 2.9.2
    • Fix Version/s: Backlog
    • Component/s: Parser Combinators
    • Labels:
      None
    • Environment:

      Scala code runner version 2.9.0.1 – Copyright 2002-2011, LAMP/EPFL

      java version "1.6.0_26"
      Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11A511)
      Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)

      Description

      JSON.parseFull or parseRaw randomly fails with NPE. In order to get the stacktrace, one must use -Xint flag. It works fine most of the time and randomly fails. When I run a simple script that parses JSON.parseFull("

      {\"hello\": \"dude\"}

      "), when run in a loop 10K times, it fails a few times during the run (again, randomly). Below is the stacktrace...

      java.lang.NullPointerException: null
      at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:132) ~[scala-library.jar:na]
      at scala.util.parsing.combinator.Parsers$Failure.<init>(Parsers.scala:159) ~[scala-library.jar:na]
      at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:499) ~[scala-library.jar:na]
      ...

        Activity

        Hide
        Jens Halm added a comment - - edited

        Thanks for providing the example. I agree that rules out no. 2 of my suggestions.

        As for putting it inside Success, I'm afraid that won't work (or I don't understand the idea). It's a case class used in pattern matches everywhere so we cannot change the API. And in the hundreds of places where new Success instances get created, how would we set the value there, where would we get it from?

        I feel the ideal solution (but that would require API changes, too), would be to route something through the parsing operations (e.g. piggy-backing on Reader or (semantically cleaner) in a separate, custom state object. Similarly like the parsec library for Haskell allows to route user state through the parsing process. But I don't know the considerations behind the original design that led to user state being left out.

        Regarding my idea 1), what are your concerns, apart from not being 100% perfect? It would reduce the leak from a ThreadLocal potentially holding the full input string, to a ThreadLocal holding a None, so a huge improvement.

        And yes, always using the phrase parser as an entry point avoids the issue. I do that in Laika, too, so I'm not really affected. I'm just volunteering as I noticed this ancient issue causes some "bad press" for the combinators.

        Show
        Jens Halm added a comment - - edited Thanks for providing the example. I agree that rules out no. 2 of my suggestions. As for putting it inside Success, I'm afraid that won't work (or I don't understand the idea). It's a case class used in pattern matches everywhere so we cannot change the API. And in the hundreds of places where new Success instances get created, how would we set the value there, where would we get it from? I feel the ideal solution (but that would require API changes, too), would be to route something through the parsing operations (e.g. piggy-backing on Reader or (semantically cleaner) in a separate, custom state object. Similarly like the parsec library for Haskell allows to route user state through the parsing process. But I don't know the considerations behind the original design that led to user state being left out. Regarding my idea 1), what are your concerns, apart from not being 100% perfect? It would reduce the leak from a ThreadLocal potentially holding the full input string, to a ThreadLocal holding a None, so a huge improvement. And yes, always using the phrase parser as an entry point avoids the issue. I do that in Laika, too, so I'm not really affected. I'm just volunteering as I noticed this ancient issue causes some "bad press" for the combinators.
        Hide
        Seth Tisue added a comment -

        Jens: I think you're right — my idea (making lastNoSuccess part of Success) wouldn't work without sweeping changes.

        I have nothing specific against the ThreadLocal thing other than general fear and distrust of ThreadLocal. I think you should go ahead.

        Show
        Seth Tisue added a comment - Jens: I think you're right — my idea (making lastNoSuccess part of Success) wouldn't work without sweeping changes. I have nothing specific against the ThreadLocal thing other than general fear and distrust of ThreadLocal. I think you should go ahead.
        Hide
        Sarah Gerweck added a comment -

        I have to say I'm very glad that the "we don't need thread safety" proposal seems to be off the table. I've built a few parsers using the combinator library, and in my tests creating a parser is a couple orders of magnitude more expensive than running a parser, making it far too expensive in the use cases I've shipped. An object pool is workable but it's kind of embarrassing to have to do that in a functional language.

        I also think it would be a big mistake to get rid of the proper phrase support that noLastSuccess enables. Seth's example is a good one: no user would think that "end of input expected" meant that you forgot to include a second argument. That would be a classic example of an error message that gives no guidance on what's really wrong or how to fix it—and the parsing library would offer no way to improve the message.

        As far as I can tell Jens's proposal would be good enough to enable any real-world use case. It's a nice bonus that you wouldn't have to move away from DynamicVariable since there's no risk of GC or data leaks when the container object is explicitly scoped like that.

        I think any kind of thread-local storage means you should put a notice in the documentation that things won't work properly unless all the parsing happens on the same thread. One of the benefits of combinatorial parsing in Scala is that you can interact with your code & data structures from inside your parser. This can make it easy to start mixing in actors or parallel collections in certain situations. An inheritable thread-local variable won't be retained if you're dealing with actors or worker pools.

        I don't think that it's at all unreasonable to say that all the parsing has to happen on the same thread. As long as people know it's a restriction, I don't think they'll have any trouble respecting it. If they don't know, somebody is going to be back here filing another bug.

        Show
        Sarah Gerweck added a comment - I have to say I'm very glad that the "we don't need thread safety" proposal seems to be off the table. I've built a few parsers using the combinator library, and in my tests creating a parser is a couple orders of magnitude more expensive than running a parser, making it far too expensive in the use cases I've shipped. An object pool is workable but it's kind of embarrassing to have to do that in a functional language. I also think it would be a big mistake to get rid of the proper phrase support that noLastSuccess enables. Seth's example is a good one: no user would think that "end of input expected" meant that you forgot to include a second argument. That would be a classic example of an error message that gives no guidance on what's really wrong or how to fix it—and the parsing library would offer no way to improve the message. As far as I can tell Jens's proposal would be good enough to enable any real-world use case. It's a nice bonus that you wouldn't have to move away from DynamicVariable since there's no risk of GC or data leaks when the container object is explicitly scoped like that. I think any kind of thread-local storage means you should put a notice in the documentation that things won't work properly unless all the parsing happens on the same thread. One of the benefits of combinatorial parsing in Scala is that you can interact with your code & data structures from inside your parser. This can make it easy to start mixing in actors or parallel collections in certain situations. An inheritable thread-local variable won't be retained if you're dealing with actors or worker pools. I don't think that it's at all unreasonable to say that all the parsing has to happen on the same thread. As long as people know it's a restriction, I don't think they'll have any trouble respecting it. If they don't know, somebody is going to be back here filing another bug.
        Hide
        Adriaan Moors added a comment -

        Unassigning and rescheduling to M6 as previous deadline was missed.

        Show
        Adriaan Moors added a comment - Unassigning and rescheduling to M6 as previous deadline was missed.
        Hide
        Adriaan Moors added a comment - - edited

        FYI, we've made it much easier to contribute to the parser combinator library.
        https://github.com/scala/scala-parser-combinators eagerly awaits your pull requests – it has a standard sbt build, test suite, travis CI

        Show
        Adriaan Moors added a comment - - edited FYI, we've made it much easier to contribute to the parser combinator library. https://github.com/scala/scala-parser-combinators eagerly awaits your pull requests – it has a standard sbt build, test suite, travis CI

          People

          • Assignee:
            Unassigned
            Reporter:
            Ilya Sterin
          • Votes:
            14 Vote for this issue
            Watchers:
            23 Start watching this issue

            Dates

            • Created:
              Updated:

              Development