[SI-4929] Parser combinators are not thread-safe Created: 19/Aug/11  Updated: 18/Jul/15  Resolved: 18/Jul/15

Status: CLOSED
Project: Scala Programming Language
Component/s: Parser Combinators (NO NEW ISSUES)
Affects Version/s: Scala 2.9.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Ilya Sterin Assignee: Unassigned
Resolution: Out of Scope Votes: 15
Labels: None

Scala code runner version – Copyright 2002-2011, LAMP/EPFL

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11A511)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)

Attachments: Text File scala_nullpointer.txt    
Issue Links:
is mentioned by SI-9010 Parser Combinators and DynamicVariabl... CLOSED


JSON.parseFull or parseRaw randomly fails with NPE. In order to get the stacktrace, one must use -Xint flag. It works fine most of the time and randomly fails. When I run a simple script that parses JSON.parseFull("

{\"hello\": \"dude\"}

"), when run in a loop 10K times, it fails a few times during the run (again, randomly). Below is the stacktrace...

java.lang.NullPointerException: null
at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:132) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$Failure.<init>(Parsers.scala:159) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:499) ~[scala-library.jar:na]

Comment by Brian Maso [ 07/Oct/11 ]

I just ran in to this same NPE, at the same line in Parsers.NoSuccess. No that the NPE occurred when processing tens of thousands of JSON docs (Twitter API tweets, actually). The NPE seems to have occurred on a line of JSON input document that is well formed. The JSON parser is the Dispatch JSON parser, version 0.8.5.

The tack trace I received is the same as the original reporter'sfor the last few stack frames:

java.lang.NullPointerException: null
at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:132) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$Failure.<init>(Parsers.scala:159) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:499) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:497) ~[scala-library.jar:na]

The offending line is 132. AFAICT, The only way an NPE would happen at this point is if the "next", "lastNoSuccess", or "lastNoSuccess.next" members are null. "lastNoSuccess" is a var in the outer class, so while the not-null check at line 132 would seem to guarantee "lastNoSuccess" is not null, the highly infrequent and random occurrence of the NPE points to a race condition – perhaps the "lastNoSuccess" member is somehow being changed to null after the null check, but before the "< lastNoPosition.next" check?

Comment by Jason Zaugg [ 08/Oct/11 ]

The stdlib Parser combinators aren't threadsafe, and hence neither is JSON.parseRaw

You should construct a new scala.util.parsing.json.Parser for each request, and use a pool of them if performance counts.

This should be better documented in both the parser and JSON packages.


Comment by Alex Cruise [ 23/Nov/11 ]

Can we either change the title of this bug, or file a new one, about the lack of thread safety?

Comment by Daniel Sobral [ 08/May/12 ]

I'm troubled by the fix to this bug. It makes phrase and NoSuccess slower, by introducing a lazy val, and DynamicVariable _and_ an Option. Now, phrase shouldn't be much of a problem, but normal parsing process will create tons of NoSuccess, even if the parser succeeds (one for every option that was considered and failed).

Now, I'm all for correctness – it's easy to make something fast if it doesn't have to work. However, I can't think of any reason why parser should be shared between multiple threads. Why not simply instantiate it on demand? JSON itself could be turned into a class, and the object turned into a factory.

Here's the pull request.

Comment by Paul Phillips [ 08/May/12 ]

Well, I sat on this for months because I wasn't real excited about it, and finally I asked adriaan to make a determination and he said go with it, although it wasn't the most ringing endorsement either. But mostly, neither of us can afford to engage with peripheral library issues right now. If you want to send another pull request that'd be great.

Comment by Daniel Sobral [ 08/May/12 ]

Sure. Close the ticket and I'll look into possible performance issues. Happily, I have the benchmark setup I wrote for Range.foreach.

Comment by Stephen Judkins [ 08/May/12 ]

Thanks for looking into possible performance implications. A few points:

  • I did some benchmarking (of various JSON strings) and did not, to my recollection, see a major drop in performance. The cost of creating NoSuccess instances was small relative to everything else the parser does.
  • In our case (IRC protocol parser) constructing the parser instances was more expensive than the parsing itself. This is obviously a function of the very small size of messages, and relatively complexity of parser, but just creating a new non-threadsafe parser for every parse job would not be an acceptable solution for many people.
  • I know this solution is fugly. I played around with other solutions that don't use mutable state, but the only reasonable one I came up with (state monad containing lastNoSuccess) would involve breaking the existing parser API. If anyone has any superior suggestions I'd love to help implement them.
Comment by Daniel Sobral [ 08/May/12 ]

Have you tried without lazy? I don't see it gaining much.

Comment by Stephen Judkins [ 08/May/12 ]

I recall (it's been awhile) that `lazy val` is used instead of `val` to avoid initialization order problems, not as an optimization.

We should have actual benchmark data before we continue this discussion. Have you written one yet? If not, would you like me to do so?

Comment by Daniel Sobral [ 09/May/12 ]

I have a framework for the benchmark. http://github.com/dcsobral/scala-foreach-benchmark. What is nice about it is that you can put different versions of the compiler in a directory, and it will run the tests through caliper for each compiler version.

I haven't written the benchmark for this yet – I won't have time for that before Friday, and I didn't have any particular regex parser in mind.

However, I just recalled Scalate is based on Scala parsers, and I think they do have some benchmarks. If so, it would be a nice, non-trivial, real-world test case.

Comment by jsh [ 20/Jul/12 ]

Wondering when it will be resolved? Another issue is deep recursion inherent in the code, which affects performance (see stack trace):

The exception was noticed in

Comment by Daniel Sobral [ 20/Jul/12 ]

@jsh A solution has been provided. The only thing's missing is benchmark to see if there's no serious performance impact.

Comment by Seth Tisue [ 04/Jan/13 ]

the "cure" here was worse than the disease, at least for me. https://github.com/scala/scala/commit/dce6b34c38a6d774961ca6f9fd50b11300ecddd6 broke my app under 2.10. the commit introduces a ThreadLocal, without providing any way of calling `remove()` on it to clean up. so my no-longer-used Thread isn't eligible for garbage collection anymore like it was on 2.9.

I was able to work around it as follows by running this code when I'm done parsing:

val field = getClass.getDeclaredField("scala$util$parsing$combinator$Parsers$$lastNoSuccessVar")
val field2 = classOf[scala.util.DynamicVariable[_]].getDeclaredField("tl")
field.set(this, null)

but, yuck.

the good news is the lastNoSuccess thing is deprecated in 2.10, so it can be removed in 2.11.

Comment by Stephen Judkins [ 04/Jan/13 ]

Agreed: yuck. Given this report, I'm inclined to agree with skeptics that this shouldn't have been merged in.

I'm confused, though. From http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html:
Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).

This implies that a non-alive thread can be garbage-collected even if a `remove` hasn't been called. If ThreadLocal follows this contract, this implies that the NoSuccess instance contains a reference.

Further, I can't reproduce this: watching this contrived example https://gist.github.com/8df7bba520c7f3a56ac3 run on 2.10 in VisualVM indicates no leakage. I suppose there might be a circular reference back to the Thread somewhere in your particular parser?

As a form of penance I'd be happy to help you track down this particular issue.

Comment by Seth Tisue [ 09/Jan/13 ]

The situation here is simpler than I had imagined.

You don't have to have multiple threads for there to be a leak. In fact, the leak potential is greatest when you have only a single thread.

The issue is that Parsers.lastNoSuccessVar retains a reference to some of the input to the parser. That is the leak. (Your input might just be some Strings, but in my app, the input was Token objects that were connected to entire graphs of other objects.)

dce6b34c didn't introduce the leak; it already existed. But it made the leak worse, as follows.

Before dce6b34c, a reference to the input was retained, but assuming the Parsers object itself eventually became eligible for GC, then the reference will get GC'ed along with it.

After dce6b34c, it is no longer enough for the Parser object to become eligible for GC for the reference to the input be dropped. Because the reference is retained through a ThreadLocal, that means it won't go away until the thread on which parsing took place gets GC'ed. So if you do parsing on a long-lived thread, you're in trouble.

Furthermore, dce6b34c doesn't just use ThreadLocal, it used InheritableThreadLocal, so the issue propagates to any child threads of the original thread, even if no parsing takes place on the child thread. (This wrinkle is what led me down a garden path about multiple threads; in my code I was doing parsing on both a parent thread and its child thread.)

Comment by Seth Tisue [ 09/Jan/13 ]

For 2.10.1, simply reverting dce6b34c is an option, but the drawback is, it would reintroduce the original thread safety issue.

An alternative would be to change the InheritableThreadLocal to a regular ThreadLocal (this would require using ThreadLocal directly, instead of scala.util.DynamicVariable) and provide an explicit cleanup method that calls remove() and/or sets lastNoSuccessVar.value to None, so I don't have to do so via reflection.

Either solution would make me and my code happy. I don't have a strong opinion about the "best" solution. But I guess I lean towards simply reverting, since ThreadLocal is notoriously error-prone and leak-prone, and since the original thread safety issue is of long standing. This ticket was never closed, so the issue was never advertised as fixed in 2.10.

Comment by Adriaan Moors [ 09/Jan/13 ]

I'm happy to give decision power to whomever comes up with a convincing pull request.

Comment by Adriaan Moors [ 28/Jan/13 ]

unfortunately, a change to `private lazy val lastNoSuccess: DynamicVariable[Option[NoSuccess]]` in `trait Parsers`
affects binary compatible, as private vals defined in traits give rise to java-public accessors in the corresponding interface

deferring to 2.11

Comment by Adriaan Moors [ 15/Mar/13 ]

Un-assigning to foster work stealing, as announced in https://groups.google.com/forum/?fromgroups=#!topic/scala-internals/o8WG4plpNkw

Comment by Jens Halm [ 26/May/13 ]

Wouldn't the fact that lastNoSuccess is no longer public in 2.11. offer new opportunities to prevent the leak? The only guy left in town actually listening to this tracking variable is the phrase parser. As I see it this would open two new possibilities:

1) Don't track the value if no-one is listening to it (and no-one is cleaning up afterwards)
2) Stop tracking altogether

Regarding 1) there would be a fairly straightforward fix. Currently each NoSuccess instance writes to the variable, blindly. So when the top level parser is not a phrase parser a NoSuccess will be written and never cleaned up afterwards. Although using phrase as the top level parser is probably the most common use case, some have run into this issue apparently. The solution would be to add a layer of indirection inside the ThreadLocal and wrap the option in an instance that would know whether it is tracking or not:

private class LastNoSuccess (tracking: Boolean) {
  private var current: Option[NoSuccess] = None
  def set (value: Option[NoSuccess]) = if (tracking) current = value
  def get = current
private val lastNoSuccess = new DynamicVariable(new LastNoSuccess(false))

And then in the phrase parser:

def apply(in: Input) = lastNoSuccess.withValue(new LastNoSuccess(true)) {

This would retain the thread-safety while mitigating the effect of the leak as it would not leave a reference to some (potentially large) input in the ThreadLocal. (Note that I haven't tested this approach, as I'd like to gather feedback first).

Regarding 2) I'm not sure how much value the tracking variable actually adds and wether it is worth all the hassle it causes, now that it is no longer public API. The only scenario in which it is usually used is when the phrase parser succeeds and has some input left which has not been consumed. If the lastNoSuccess offset was behind the position of the final parser that succeeded it is returned to the caller. I don't have a good overview of how many realistic scenarios exist where this will give the best hint for the actual cause of the failure.

Any thoughts? If someone thinks 1) or 2) would be a promising route, I'd be happy to implement and test this and do a pull request (probably after ScalaDays though).

Comment by Seth Tisue [ 27/Jun/13 ]

Jens, I'm glad you're working on this.

I agree that suggestion #1 is strictly better than the current code.

However, of course ideally we'd have no ThreadLocal at all. (The perfect is always the enemy of the good...)

Let's try and construct a case where lastNoSuccessVar makes a difference. Consider the arithmetic expression parser in Programming in Scala, at http://booksites.artima.com/programming_in_scala_2ed/examples/html/ch33.html . It uses RegexParsers.parseAll which calls Parsers.phrase. Suppose we change the definition of factor a little to give a better error:

  def factor: Parser[Any] = floatingPointNumber | "("~expr~")" | failure("expected factor")

And then we parse "0+". With the lastNoSuccessVar logic in place, we get "expected factor". Without it we just get "end of input expected". (Yup, I ran it both ways to be sure.)

So then the question is, which error is better? There are two ways of viewing the input. One is that it consists of good input (0) followed by some garbage , thus, "end of input expected". The other is that all the input is good but it is merely incomplete, thus, "expected number".

The filterNot call in Parser.phrase is there so that the choice that consumed more input is taken. And, sorry to say, I think that makes perfect sense. If you agree, then I think we need to address this without altering the behavior.

Comment by Seth Tisue [ 27/Jun/13 ]

Here's an idea. I haven't tried coding it up. What if lastNoSuccess was a val inside Success, rather than a a member of Parsers? Then every time something parses successfully, I could also ask what the last NoSuccess was along the way. It could be an ordinary field, not a ThreadLocal, and the phrase method would use it to pick the better error. Is this idea workable...?

Comment by Seth Tisue [ 28/Jun/13 ]

(As an aside, in my own code I was able to remove my clumsy reflection-based workaround code, simply by instead inserting a call to phrase(), which I wasn't previously calling. Always using phrase() as your entry point for parsing ensures proper cleanup of lastNoSuccessVar.)

Comment by Jens Halm [ 28/Jun/13 ]

Thanks for providing the example. I agree that rules out no. 2 of my suggestions.

As for putting it inside Success, I'm afraid that won't work (or I don't understand the idea). It's a case class used in pattern matches everywhere so we cannot change the API. And in the hundreds of places where new Success instances get created, how would we set the value there, where would we get it from?

I feel the ideal solution (but that would require API changes, too), would be to route something through the parsing operations (e.g. piggy-backing on Reader or (semantically cleaner) in a separate, custom state object. Similarly like the parsec library for Haskell allows to route user state through the parsing process. But I don't know the considerations behind the original design that led to user state being left out.

Regarding my idea 1), what are your concerns, apart from not being 100% perfect? It would reduce the leak from a ThreadLocal potentially holding the full input string, to a ThreadLocal holding a None, so a huge improvement.

And yes, always using the phrase parser as an entry point avoids the issue. I do that in Laika, too, so I'm not really affected. I'm just volunteering as I noticed this ancient issue causes some "bad press" for the combinators.

Comment by Seth Tisue [ 28/Jun/13 ]

Jens: I think you're right — my idea (making lastNoSuccess part of Success) wouldn't work without sweeping changes.

I have nothing specific against the ThreadLocal thing other than general fear and distrust of ThreadLocal. I think you should go ahead.

Comment by Sarah Gerweck [ 09/Jul/13 ]

I have to say I'm very glad that the "we don't need thread safety" proposal seems to be off the table. I've built a few parsers using the combinator library, and in my tests creating a parser is a couple orders of magnitude more expensive than running a parser, making it far too expensive in the use cases I've shipped. An object pool is workable but it's kind of embarrassing to have to do that in a functional language.

I also think it would be a big mistake to get rid of the proper phrase support that noLastSuccess enables. Seth's example is a good one: no user would think that "end of input expected" meant that you forgot to include a second argument. That would be a classic example of an error message that gives no guidance on what's really wrong or how to fix it—and the parsing library would offer no way to improve the message.

As far as I can tell Jens's proposal would be good enough to enable any real-world use case. It's a nice bonus that you wouldn't have to move away from DynamicVariable since there's no risk of GC or data leaks when the container object is explicitly scoped like that.

I think any kind of thread-local storage means you should put a notice in the documentation that things won't work properly unless all the parsing happens on the same thread. One of the benefits of combinatorial parsing in Scala is that you can interact with your code & data structures from inside your parser. This can make it easy to start mixing in actors or parallel collections in certain situations. An inheritable thread-local variable won't be retained if you're dealing with actors or worker pools.

I don't think that it's at all unreasonable to say that all the parsing has to happen on the same thread. As long as people know it's a restriction, I don't think they'll have any trouble respecting it. If they don't know, somebody is going to be back here filing another bug.

Comment by Adriaan Moors [ 11/Jul/13 ]

Unassigning and rescheduling to M6 as previous deadline was missed.

Comment by Adriaan Moors [ 29/Jan/14 ]

FYI, we've made it much easier to contribute to the parser combinator library.
https://github.com/scala/scala-parser-combinators eagerly awaits your pull requests – it has a standard sbt build, test suite, travis CI

Comment by David Carlton [ 25/Nov/14 ]

I think I just ran into this - I've got a heap dump with hundreds of thousands of Parsers$Failure objects, and they seem to be held through inherited thread locals. Am I correct in thinking that, if you have a mix of long-lived threads with short-lived parsers, that this creates a memory leak? If so, that seems quite bad to me, worse than the thread safety issues that this bug was originally about.

Comment by Sarah Gerweck [ 25/Nov/14 ]

I'm not sure that it's worse, because before you effectively had to make a new parser every time or force all your code onto a single thread. If you make a new parser every time, garbage collection will remove the DynamicVariable that comes along with it. If you force all your code onto a single thread, you can pretty easily use a single-threaded executor so that only one instance of ThreadLocal is ever allocated.

A cursory look at the code does suggest that if you're reusing the same parser from a lot of different threads, you can wind up with stray references. If your parsers are actually short-lived though, those dynamic variables should be getting garbage collected.

Comment by David Carlton [ 25/Nov/14 ]

Yeah; I really don't understand why they're not being garbage collected in my case, because they should indeed be short-lived. But somehow these DynamicVariable objects and the associated parsers are staying around.

Examining the heap dump, it seems like the thread object has an array of inherited thread locals which contain the DynamicVariable objects, so just looking at it from a naive point of view, it seems like the dynamic variables and the associated parsers should remain alive in perpetuity. And that's the behavior that I'm currently seeing; but it seems to me like, if that were really the case, then DynamicVariable would be very hard to use safely, so there has to be something that I'm missing. (And this actually used to work fine for me, before making what should have been a relatively minor change in the code.)

I dunno; I'll try to come up with an isolated test case so I can understand when my hundreds of thousands of not garbage collected parsers are coming from...

Comment by Jens Halm [ 25/Nov/14 ]

The current version leaks into the ThreadLocal if the top-level parser is not phrase. Maybe that was the little change you made? Whether objects are short-lived or not should not make a difference, btw.

Comment by David Carlton [ 26/Nov/14 ]

Ah, thanks for the insight, that could be related.

Comment by David Carlton [ 27/Nov/14 ]

I managed to reproduce my problem with an isolated test case; given that my problem is different from the problem that this bug was filed about, I figured the best thing to do would be to file a new bug, so I created SI-9010 with reproduction steps.

Comment by Jens Halm [ 27/Nov/14 ]

I'm not sure it is a good idea to create a new ticket. Yes, the title of this ticket is misleading, but most of the discussion in this ticket already relates to the leak and not the original problem. The new ticket suggests that it is something that requires new investigations and might lead to someone overlooking this extensive discussion.

The problem is already fully understood. From the source code it is very obvious that the ThreadLocal will leak when you don't use the phrase parser as the top level parser.

I already suggested a workaround that would minimize the problem sometime ago (see older comments). It would reduce the leak from having the entire parser hierarchy including the full input string hanging in the ThreadLocal to only leaking a None per ThreadLocal which is comparably harmless.

I just never got around to implementing this improvement because I never got feedback from project maintainers whether they'd accept such a fix and then it somehow fell from my radar.

Comment by Seth Tisue [ 18/Jul/15 ]

The parser combinators library is now community-maintained. Issues with it are now tracked at https://github.com/scala/scala-parser-combinators/issues/61 instead of here in the Scala JIRA.

Interested community members: if you consider this issue significant, feel free to open a new issue for it on GitHub, with links in both directions.

Comment by Seth Tisue [ 18/Jul/15 ]

there is a PR for this, needing review, at https://github.com/scala/scala-parser-combinators/pull/59

Generated at Mon Mar 19 17:08:58 CET 2018 using JIRA 7.0.11#70121-sha1:19d24976997c1d95f06f3e327e087be0b71f28d4.