Scala Programming Language / SI-7710

RegexParsers.scala has O(input length) memory cost per parse step on Java >= 7u6

      Description

      From Java 1.7.0_06 onwards, String.substring() (and String.subSequence()) no longer re-uses the internal char[] data but makes a copy instead. Since RegexParsers.scala:109 calls subSequence() for every character parsed, it now effectively re-allocates the whole remaining parse content at every parse step.

      This shows up as very poor parse performance and heavy GC activity when parsing a 3 MB file using https://github.com/ngocdaothanh/scaposer, which would parse almost instantly on Java 6.
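
      To make the cost concrete, here is a minimal sketch in Scala. The object SubSequenceCost and its methods are hypothetical names for illustration, not code from RegexParsers.scala. It assumes one subSequence() call per character, as described above, and a String implementation that copies. Under those assumptions an input of n characters copies n(n+1)/2 characters in total, so a 3 MB input (taken here as roughly 3 million characters) would copy on the order of 4.5 x 10^12 characters.

```scala
// Hypothetical illustration, not taken from RegexParsers.scala.
object SubSequenceCost {

  // Closed form: step i copies the remaining (n - i) chars, for i = 0 until n,
  // which sums to n * (n + 1) / 2.
  def charsCopied(inputLength: Long): Long =
    inputLength * (inputLength + 1) / 2

  // A toy loop with the same shape as the reported problem:
  // one subSequence() call per character consumed.
  def consumeByCopy(source: String): Long = {
    var offset = 0
    var copied = 0L
    while (offset < source.length()) {
      // On Java >= 7u6 this allocates a fresh copy of the remaining input.
      val rest = source.subSequence(offset, source.length())
      copied += rest.length()
      offset += 1
    }
    copied
  }
}
```

      The loop and the closed form agree on small inputs, which is a quick way to check the assumption of one copy per character.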

      Details on the changes to java.lang.String are mentioned here:
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6924259
      http://java-performance.info/changes-to-string-java-1-7-0_06/
      http://grokbase.com/t/gg/scala-user/132v5z1678/performance-of-javatokenparsers-with-java7

      I guess one way around it would be to wrap the CharSequence in a simple buffer that re-uses the underlying CharSequence and adds skip/count fields to track the current position.
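
      The wrapper idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name CharSequenceView and its fields are hypothetical, and this is not presented as the fix adopted in the library. The "skip" and "count" fields from the description appear here as start and len.

```scala
// Hypothetical wrapper: a view over a CharSequence that shares the
// underlying data instead of copying it.
final class CharSequenceView(source: CharSequence, start: Int, len: Int)
    extends CharSequence {

  // Convenience constructor: view over the whole source.
  def this(source: CharSequence) = this(source, 0, source.length())

  def length(): Int = len

  def charAt(index: Int): Char = {
    if (index < 0 || index >= len)
      throw new IndexOutOfBoundsException(s"index: $index, length: $len")
    source.charAt(start + index)
  }

  // O(1): only the offset and length change; no characters are copied.
  def subSequence(from: Int, until: Int): CharSequenceView = {
    if (from < 0 || until > len || from > until)
      throw new IndexOutOfBoundsException(
        s"from: $from, until: $until, length: $len")
    new CharSequenceView(source, start + from, until - from)
  }

  // Characters are copied only when a String is actually requested.
  override def toString: String =
    source.subSequence(start, start + len).toString
}
```

      For example, new CharSequenceView("hello world").subSequence(6, 11) is a view of length 5 whose toString is "world". Because java.util.regex.Pattern.matcher accepts any CharSequence, such a view could in principle be handed to the regex matcher in place of a copied String.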

              People

              • Assignee: Antoine Gourlay (gourlaysama)
              • Reporter: Jan Ypma (jypma)
              • Votes: 4
              • Watchers: 9
