Scala Programming Language
  1. Scala Programming Language
  2. SI-5641

Predef.arrayToCharSequence.subSequence copies arrays around leading to bad performance in code relying on subSequence being fast

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: Scala 2.9.1
    • Fix Version/s: Scala 2.10.0
    • Component/s: Misc Library
    • Labels:
      None

      Description

      The concrete problem is that CharArrayReader's constructor implicitly uses Predef.arrayToCharSequence which implements CharSequence.subSequence by calling Array.slice on the underlying array which creates a new copy of the array containing the subsequence. Contrast this with String.subSequence which creates a shallow copy with only updated indices.

      Using a CharArrayReader in parser combinators can therefore lead to really bad performance issues which can't be figured out easily.

      The next question is if Predef.arrayToCharSequence really should implement subsequence by making a copy. The problem is that java.lang.CharSequence isn't explicit in what to expect from ``subSequence`` if the underlying data structure is mutable.

        Activity

        Hide
        Johannes Rudolph added a comment -

        Martin's comment on that:

        On Tue, Apr 3, 2012 at 2:25 AM, martin odersky <martin.odersky@epfl.ch> wrote:
        > On Mon, Apr 2, 2012 at 3:51 AM, Johannes Rudolph
        >> 1.) Create a shallow copy without copying in `subSequence`. This may
        >> break code which relies on the `CharSequence` returned from
        >> `subSequence` to be unchanged if the underlying array is mutated.
        >
        > I would vote for 1). Since the arrayToCharSequence does not copy the whole
        > array when converting to a CharSequence, any arguments based on mutability
        > are moot. So I would classify this as a (performance) bug, which needs to be
        > fixed.
        
        Show
        Johannes Rudolph added a comment - Martin's comment on that: On Tue, Apr 3, 2012 at 2:25 AM, martin odersky <martin.odersky@epfl.ch> wrote: > On Mon, Apr 2, 2012 at 3:51 AM, Johannes Rudolph >> 1.) Create a shallow copy without copying in `subSequence`. This may >> break code which relies on the `CharSequence` returned from >> `subSequence` to be unchanged if the underlying array is mutated. > > I would vote for 1). Since the arrayToCharSequence does not copy the whole > array when converting to a CharSequence, any arguments based on mutability > are moot. So I would classify this as a (performance) bug, which needs to be > fixed.
        Hide
        Paul Phillips added a comment -

        b5e9e4b950

        Show
        Paul Phillips added a comment - b5e9e4b950

          People

          • Assignee:
            Paul Phillips
            Reporter:
            Johannes Rudolph
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development