Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested extractors generate exponential-space bytecode #1133

Closed
scabug opened this issue Jul 25, 2008 · 22 comments
Closed

nested extractors generate exponential-space bytecode #1133

scabug opened this issue Jul 25, 2008 · 22 comments
Assignees
Milestone

Comments

@scabug
Copy link

scabug commented Jul 25, 2008

Pattern matching multiple cases against extractors triggers a combination explosion that quickly overwhelms the maximum method size. This has bitten me quite hard on my in-progress java to scala translator, as I use extractors against the JDT for parsing java source. I have attached a demonstration. You can make a minor tweak to the attached source and see the size of the bytecode drop in half (or double again, if FJBG wouldn't choke.)

leaf:dfascala paulp$$ scalac *.scala
Exception in thread "main" java.lang.Error: ch.epfl.lamp.fjbg.JCode$$OffsetTooBigException: offset too big to fit in 16 bits: 36580
at ch.epfl.lamp.fjbg.JFieldOrMethod.writeTo(JFieldOrMethod.java:114)
[...]

The disassembled bytecode reveals enormous nested if-else statements with massive duplicated code.

@scabug
Copy link
Author

scabug commented Jul 25, 2008

Imported From: https://issues.scala-lang.org/browse/SI-1133?orig=1
Reporter: @paulp
Attachments:

  • Match.scala (created on Jul 25, 2008 2:03:01 PM UTC, 1000 bytes)

@scabug
Copy link
Author

scabug commented Oct 26, 2008

@DRMacIver said:
This seems unavoidable without significant changes to the pattern matching scheme. I have in mind some significant changes that might do it, but I'm not going to be implementing those any time soon. Don't hold your breath on this one. :-)

@scabug
Copy link
Author

scabug commented Mar 7, 2009

@rjolly said:
I have stumbled on this problem. I am working on a computer algebra project, and I intended to do pattern matching on MathML expressions. Here is my code:

import scala.xml.Elem

def f(expr: Elem) = expr match {
    case <apply><plus/><apply><plus/>{x: Elem}{y: Elem}</apply>{z: Elem}</apply> => <apply><plus/>{x}<apply><plus/>{y}{z}</apply></apply>
    case <apply><plus/><apply><minus/>{x: Elem}{y: Elem}</apply>{z: Elem}</apply> => <apply><minus/>{x}<apply><minus/>{y}{z}</apply></apply>

    case <apply><plus/>{x: Elem}<apply><plus/>{y: Elem}{z: Elem}</apply></apply> if (x == y) => <apply><plus/><apply><times/><cn>2</cn>{x}</apply>{z}</apply>
    case <apply><plus/>{x: Elem}<apply><minus/>{y: Elem}{z: Elem}</apply></apply> if (x == y) => <apply><minus/><apply><times/><cn>2</cn>{x}</apply>{z}</apply>

    case <apply><plus/>{x: Elem}{y: Elem}</apply> if (x == y) => <apply><times/><cn>2</cn>{x}</apply>

    case <apply><minus/><apply><plus/>{x: Elem}{y: Elem}</apply>{z: Elem}</apply> => <apply><plus/>{x}<apply><minus/>{y}{z}</apply></apply>
    case <apply><minus/><apply><minus/>{x: Elem}{y: Elem}</apply>{z: Elem}</apply> => <apply><minus/>{x}<apply><plus/>{y}{z}</apply></apply>

    case <apply><minus/>{x: Elem}<apply><plus/>{y: Elem}{z: Elem}</apply></apply> if (x == y) => <apply><minus/>{z}</apply>
    case <apply><minus/>{x: Elem}<apply><minus/>{y: Elem}{z: Elem}</apply></apply> if (x == y) => z

    case <apply><minus/>{x: Elem}{y: Elem}</apply> if (x == y) => <cn>0</cn>

    case x => x
  }

I get the same error:

java.lang.Error: ch.epfl.lamp.fjbg.JCode$$OffsetTooBigException: offset too big to fit in 16 bits: 33190

This is more or less the minimal set of case statements that triggers the error. If I remove one or two arbitrary ones, it compiles again. Of course I need many more to achieve what I intend, so this defect is very blocking to me.

@scabug
Copy link
Author

scabug commented Mar 7, 2009

@paulp said:
When I reported this bug I was just a scala user, now I work on the scala compiler and have a good understanding of this bug. Unfortunately, as DRMacIver says above, remedying this will require a significant reworking of the code generation strategy. We have a reasonably good handle on the problems with the pattern matcher now, but they aren't easily tractable to bandaids.

This should not fundamentally be a blocker though, because you can work around it simply by breaking up the pattern matches into more units. For instance you could follow however many cases with a default case which passes the scrutinee a new method that continues the pattern match. I realize this is annoying, but at least it's straightforward.

@scabug
Copy link
Author

scabug commented May 22, 2009

@rjolly said:
This is right, I was able to work it around as you said. There is however an added brain effort to determine which cases are subcases of which other cases, so this is still worth fixing I think.

@scabug
Copy link
Author

scabug commented May 22, 2009

@DRMacIver said:
Well, yes. There's no question that this needs fixing: It obviously does. But it's a thorny issue, so I wouldn't recommend holding your breath.

@scabug
Copy link
Author

scabug commented Jan 27, 2010

@paulp said:
See also #2516.

@scabug
Copy link
Author

scabug commented Feb 10, 2010

Kieron Wilkinson (kieron) said:
Just to say that I've just run into this problem too on the Scala 2.8 beta 1, while implementing a (I thought simple) tokeniser of some input. I doubt it helps, but just in case it does, my code follows, with as much pulled out as possible in order to reproduce the problem:

import scala.util.matching.Regex

object Tokeniser {
  val prefix = """[0-9a-f]{8}:[\s]+"""
  val StatusRegex    = new Regex(prefix + """status=0""")
  val ResetRegex     = new Regex(prefix + """reset=0""")
  val DensityRegex   = new Regex(prefix + """density=(\d)""")
  val MinTrackRegex  = new Regex(prefix + """min_track=(\d+)""")
  val MaxTrackRegex  = new Regex(prefix + """max_track=(\d+)""")
  val MotorRegex     = new Regex(prefix + """motor=([0-1])""")
  val TrackRegex     = new Regex(prefix + """track=(\d+)""")
  val SideRegex      = new Regex(prefix + """side=([0-1])""")
  val StreamRegex    = new Regex(prefix + """stream=([0-1])""")

  def go(text: String): String = text.trim match {
    case StatusRegex() => ""
    case ResetRegex() => ""
    case DensityRegex(density) => ""
    case MinTrackRegex(track) => ""
    case MaxTrackRegex(track) => ""
    case MotorRegex(on) => ""
    case TrackRegex(track) => ""
    case SideRegex(side) => ""
    case StreamRegex(on) => ""
  }
}

And the exception I get is:

Exception in thread "main" java.lang.Error: ch.epfl.lamp.fjbg.JCode$$OffsetTooBigException: offset to
o big to fit in 16 bits: 55087
        at ch.epfl.lamp.fjbg.JFieldOrMethod.writeTo(JFieldOrMethod.java:114)
        at ch.epfl.lamp.fjbg.JClass.writeTo(JClass.java:315)
        at scala.tools.nsc.backend.jvm.GenJVM$$BytecodeGenerator.emitClass(GenJVM.scala:150)
        at scala.tools.nsc.backend.jvm.GenJVM$$BytecodeGenerator.genClass(GenJVM.scala:241)
        at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase.apply(GenJVM.scala:57)
        at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase$$$$anonfun$$run$$3.apply(GenJVM.scala:53)
        at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase$$$$anonfun$$run$$3.apply(GenJVM.scala:53)
        at scala.collection.Iterator$$class.foreach(Iterator.scala:582)
        at scala.collection.mutable.HashMap$$$$anon$$4.foreach(HashMap.scala:82)
        at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase.run(GenJVM.scala:53)
        at scala.tools.nsc.Global$$Run.compileSources(Global.scala:749)
        at scala.tools.nsc.Global$$Run.compile(Global.scala:839)
        at scala.tools.nsc.Main$$.process(Main.scala:110)
        at scala.tools.nsc.Main$$.main(Main.scala:124)
        at scala.tools.nsc.Main.main(Main.scala)
Caused by: ch.epfl.lamp.fjbg.JCode$$OffsetTooBigException: offset too big to fit in 16 bits: 55087
        at ch.epfl.lamp.fjbg.JCode.checkOffset16(JCode.java:896)
        at ch.epfl.lamp.fjbg.JCode.patchAllOffset(JCode.java:975)
        at ch.epfl.lamp.fjbg.JCode.freeze(JCode.java:95)
        at ch.epfl.lamp.fjbg.JMethod.freeze(JMethod.java:81)
        at ch.epfl.lamp.fjbg.JFieldOrMethod.writeTo(JFieldOrMethod.java:111)
        ... 14 more

If I remove any of the cases, it works fine, so I will try to break it up as suggested. I did have a lot more cases to add too :)

@scabug
Copy link
Author

scabug commented Aug 10, 2010

Alex Black (alexblack) said:
I've just run into this issue with Scala 2.8 final. I haven't yet been able to identify which code in my library is causing the problem - is there a way that scalac can let us know which code is causing the issue?

java.lang.Error: ch.epfl.lamp.fjbg.JCode$$OffsetTooBigException: offset too big to fit in 16 bits: 36197
at ch.epfl.lamp.fjbg.JFieldOrMethod.writeTo(JFieldOrMethod.java:114)
at ch.epfl.lamp.fjbg.JClass.writeTo(JClass.java:315)
at scala.tools.nsc.backend.jvm.GenJVM$$BytecodeGenerator.emitClass(GenJVM.scala:141)
at scala.tools.nsc.backend.jvm.GenJVM$$BytecodeGenerator.genClass(GenJVM.scala:262)
at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase.apply(GenJVM.scala:56)
at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase$$$$anonfun$$run$$3.apply(GenJVM.scala:52)
at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase$$$$anonfun$$run$$3.apply(GenJVM.scala:52)
at scala.collection.mutable.HashMap$$$$anon$$2$$$$anonfun$$foreach$$3.apply(HashMap.scala:89)
at scala.collection.mutable.HashMap$$$$anon$$2$$$$anonfun$$foreach$$3.apply(HashMap.scala:89)
at scala.collection.Iterator$$class.foreach(Iterator.scala:631)
at scala.collection.mutable.HashTable$$$$anon$$1.foreach(HashTable.scala:161)
at scala.collection.mutable.HashTable$$class.foreachEntry(HashTable.scala:194)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap$$$$anon$$2.foreach(HashMap.scala:89)
at scala.tools.nsc.backend.jvm.GenJVM$$JvmPhase.run(GenJVM.scala:52)
at scala.tools.nsc.Global$$Run.compileSources(Global.scala:733)
at scala.tools.nsc.Global$$Run.compileFiles(Global.scala:803)
at scala.tools.nsc.interactive.RefinedBuildManager.update0$$1(RefinedBuildManager.scala:123)
at scala.tools.nsc.interactive.RefinedBuildManager.update(RefinedBuildManager.scala:182)
at scala.tools.nsc.interactive.RefinedBuildManager.update(RefinedBuildManager.scala:92)
at scala.tools.eclipse.EclipseBuildManager.build(EclipseBuildManager.scala:120)
at scala.tools.eclipse.ScalaProject.build(ScalaProject.scala:421)
at scala.tools.eclipse.ScalaBuilder.build(ScalaBuilder.scala:87)
at org.eclipse.core.internal.events.BuildManager$$2.run(BuildManager.java:627)
at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:170)
at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:201)
at org.eclipse.core.internal.events.BuildManager$$1.run(BuildManager.java:253)
at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:256)
at org.eclipse.core.internal.events.BuildManager.basicBuildLoop(BuildManager.java:309)
at org.eclipse.core.internal.events.BuildManager.build(BuildManager.java:341)
at org.eclipse.core.internal.events.AutoBuildJob.doBuild(AutoBuildJob.java:140)
at org.eclipse.core.internal.events.AutoBuildJob.run(AutoBuildJob.java:238)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Caused by: ch.epfl.lamp.fjbg.JCode$$OffsetTooBigException: offset too big to fit in 16 bits: 36197
at ch.epfl.lamp.fjbg.JCode.checkOffset16(JCode.java:896)
at ch.epfl.lamp.fjbg.JCode.patchAllOffset(JCode.java:975)
at ch.epfl.lamp.fjbg.JCode.freeze(JCode.java:95)
at ch.epfl.lamp.fjbg.JMethod.freeze(JMethod.java:81)
at ch.epfl.lamp.fjbg.JFieldOrMethod.writeTo(JFieldOrMethod.java:111)
... 34 more

@scabug
Copy link
Author

scabug commented Aug 26, 2010

tomer.doron said:
when doing simple REST routing using pattern matching this issue come up very quickly, as such, hinders the usability of lift as a REST server in a serious manner, for example a RestHelper.serveJx can only hold up to 3 case statements

@scabug
Copy link
Author

scabug commented Dec 8, 2010

Willis Blackburn (willisblackburn) said:
I just noticed this one while researching various ways of pattern-matching URLs in Scala. It's a pretty serious issue to me: I will inevitably come across because pattern matching is a standout feature of Scala, and I use it a lot. I understand that there's a workaround, however the workaround is needed after only 8 or 9 cases, which may not be the typical situation but is also not an edge case. If the work around is to easy for the human programmer to implement, why can't the compiler just do the same thing?

@scabug
Copy link
Author

scabug commented Dec 8, 2010

@paulp said:
Replying to [comment:20 willisblackburn]:

I just noticed this one while researching various ways of pattern-matching URLs in Scala. It's a pretty serious issue to me: I will inevitably come across because pattern matching is a standout feature of Scala, and I use it a lot. I understand that there's a workaround, however the workaround is needed after only 8 or 9 cases, which may not be the typical situation but is also not an edge case. If the work around is to easy for the human programmer to implement, why can't the compiler just do the same thing?

What is missing are not ideas, but implementations.

@scabug
Copy link
Author

scabug commented Jan 2, 2011

Willis Blackburn (willisblackburn) said:
Replying to [comment:21 extempore]:

What is missing are not ideas, but implementations.

That's good to hear.

What would happen if the compiler silently translated any code of the form

x match {
  case p1 => r1;
  case p2 => r2;
  ...
}

into

{code}
x match {
case p1 => r1;
case _ => x match {
case p2 => r2;
case _ => x match {
...
}
}
}

?

Would that destroy the performance of the match operation? Maybe it would create lots of new function objects?

@scabug
Copy link
Author

scabug commented Jan 2, 2011

@paulp said:
Replying to [comment:22 willisblackburn]:

Replying to [comment:21 extempore]:

What is missing are not ideas, but implementations.

That's good to hear.

What would happen if the compiler silently translated any code of the form

You can try it out if you like, I implemented that a year ago. It's under -Ypmat-naive. It's not viable for general usage, but I don't remember every reason. Here's what the source comment says.

  /** For debugging only.  Desugar a match statement like so:
   *  val x = scrutinee
   *  x match {
   *    case case1 => ...
   *    case _ => x match {
   *       case case2 => ...
   *       case _ => x match ...
   *    }
   *  }
   * 
   *  This way there are never transitions between nontrivial casedefs.
   *  Of course many things break: exhaustiveness and unreachable checking
   *  do not work, no switches will be generated, etc.
   */

@scabug
Copy link
Author

scabug commented Apr 19, 2011

Dustin Whitney (dwhitney) said:
I'd love to see a resolution to this in the next release. I encounter this problem with some very simple looking code:

def intent = {
case GET(Path("/v1/lists")) => Ok
case PUT(Path("/v1/list")) => Ok
case POST(Path(Seg("v1" :: "list" :: name :: Nil))) => Ok
case DELETE(Path(Seg("v1" :: "list" :: name :: Nil))) => Ok
}

That code is not in need of refactoring or ugly at all. In fact this issue makes me take things that are otherwise understandable and break them apart into things that are less understandable.

@scabug
Copy link
Author

scabug commented Nov 1, 2011

Toni Menzel (tonit) said:
Is there any progress on this ? It hits me really hard currently (using 2.9.1)

@scabug
Copy link
Author

scabug commented Nov 1, 2011

Toni Menzel (tonit) said:
okay its also okay for me using the sub-match workaround.
I have limited scala experience (yet) but i wonder if we could pack the workaround solution (that Paul described) as a library so that users who hit this issue can still write compact scala code. once the issue can be resolved, the library solution - which might be inefficient to serve as a general solution - becomes obsolete.

@scabug
Copy link
Author

scabug commented Nov 1, 2011

@dcsobral said (edited on Nov 1, 2011 11:43:46 AM UTC):

dcs@ayanami:~/tmp$ scalac -Yvirtpatmat  Tokeniser.scala 
dcs@ayanami:~/tmp$ 

Yay!

@scabug
Copy link
Author

scabug commented Nov 1, 2011

@dcsobral said:
For the record, I can also compile the original example. Also, I have tested the virtual pattern matcher against the regex example with the code below, and it passes. So, not only it doesn't crash, but it matches correctly so far. Thanks, Adriaan!

object TokeniserTest extends Properties("Tokeniser") {                                                          
  val hexDigit = Gen oneOf (('0' to '9') ++ ('a' to 'f'))                                                       
  val identifier = Gen listOfN (8, hexDigit) map (_.mkString)                                                   
  val command0 = Gen oneOf ("status", "reset") map (_ + "=0")                                                   
  val command01 = for {                                                                                         
    cmd <- Gen oneOf ("motor", "side", "stream")                                                                
    digit <- Gen oneOf ("0", "1")                                                                               
  } yield cmd + "=" + digit                                                                                     
  val commandX = Gen oneOf ('0' to '9') map ("density="+_)                                                      
  val commandN = for {                                                                                          
    cmd <- Gen oneOf ("min_track", "max_track", "track")                                                        
    number <- Gen.numStr suchThat (_.size > 0)                                                                  
  } yield cmd + "=" + number                                                                                    
  val textGen = for {                                                                                           
    i <- id                                                                                                     
    cmd <- Gen oneOf (command0, command01, commandX, commandN)                                                  
  } yield i + ": " + cmd                                                                                        
                                                                                                                
                                                                                                                
  property("Parsing") = forAllNoShrink(textGen) ( text => Tokeniser.go(text)                                    
    matches """(status|reset|density \d+|(min |max )?track \d+|(motor|stream) (on|off)|side \d+)""" )           
}

@scabug
Copy link
Author

scabug commented Nov 23, 2011

@adriaanm said:
sweet! now let's see if we can speed this up a little...
I'm getting started with the next phase of project VPM: run-time performance
I really appreciate -Yvirtpatmat bug reports and benchmark/test cases!
please assign them directly to me or just send me an email

@scabug
Copy link
Author

scabug commented Feb 12, 2012

Matic Potočnik (hairyfotr) said:
Works fine for me with both 2.10.0-M1 -Yvirtpatmat, and 2.9.1. -Ypmat-naive.

The code I had trouble with is way too messy to post, but I also confirm it works with Kierons case.

@scabug
Copy link
Author

scabug commented May 12, 2012

@retronym said:
Test case pending merge:

scala/scala#542

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants