Scala Programming Language
  1. Scala Programming Language
  2. SI-4828

indexOfSlice has horrible performance for arrays and lists

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: Scala 2.9.0
    • Fix Version/s: None
    • Component/s: Misc Library
    • Labels:
      None
    • Environment:

      Description

      indexOfSlice is implemented using operations that have horrible performance characteristics for mutable data structures, including Array, and for slow-to-index data structures, including List.

      One problem (I'm not sure if it's the only problem) is that SeqLike's indexOf method (in the SeqLike companion) is used, which blindly uses drop and slice even if you're not actually slicing anything, then launches into an index-based searcher that shouldn't need the sliced data (and is slow as molasses for List anyway). So, for arrays you make a bunch of extra copies when none are needed, and for lists, you traverse the list a bunch of times when this is not needed. (lastIndexOfSlice is even worse, because it reverses the entire data structure instead of searching backwards.)

      The changes needed are:

      (1) Move the search logic from SeqLike.KMP into SeqLike.indexOf, so you can use indexing without copying, or at least use views.

      (2) Move the SeqLike stuff into IndexedSeqOptimized and implement the routines differently for SeqLike (probably with an iterator on the main collection).

      (3) Write a backwards-KMP or at least use views so you don't need to copy huge data structures to see something at the end.

      I will write a patch if given a specific go-ahead and/or directions on what is considered acceptable and what is not. I'm actually too busy right now to write a patch, but will make an exception (i.e. get that much less sleep) if I am reasonably confident it will be included and benefit me and others in a reasonable timeframe.

        Activity

        Hide
        Paul Phillips added a comment -

        Anything which retains interface compatibility and is "better" is a win. It doesn't sound like what you want to do is controversial; if there is a big other side of the coin, let me know, otherwise you should write the patch and it will be included and benefit you and others in a reasonable time frame.

        Show
        Paul Phillips added a comment - Anything which retains interface compatibility and is "better" is a win. It doesn't sound like what you want to do is controversial; if there is a big other side of the coin, let me know, otherwise you should write the patch and it will be included and benefit you and others in a reasonable time frame.
        Hide
        Paul Phillips added a comment -

        I would assign it to you, but jira says I can't do that, so I assigned it to myself, but I'm considering it yours until you say otherwise.

        Show
        Paul Phillips added a comment - I would assign it to you, but jira says I can't do that, so I assigned it to myself, but I'm considering it yours until you say otherwise.
        Hide
        Paul Phillips added a comment -

        I added you to the developer group. Now you are an assignation target. Hide!

        Show
        Paul Phillips added a comment - I added you to the developer group. Now you are an assignation target. Hide!
        Hide
        Rex Kerr added a comment -

        All right. I assume I'm supposed to fix trunk, and we'll leave 2.[89].x to be slow (or someone else can backport?).

        The only downside I'm aware of is more lines of code.

        Show
        Rex Kerr added a comment - All right. I assume I'm supposed to fix trunk, and we'll leave 2. [89] .x to be slow (or someone else can backport?). The only downside I'm aware of is more lines of code.
        Hide
        Rex Kerr added a comment -

        Argh, ninjas already!

        ArrayStack should be IndexedSeqOptimized but is not even IndexedSeq. I guess I'll leave this alone for now given that I doubt a lot of people want really fast indexOfSlice on ArrayStack.

        Show
        Rex Kerr added a comment - Argh, ninjas already! ArrayStack should be IndexedSeqOptimized but is not even IndexedSeq. I guess I'll leave this alone for now given that I doubt a lot of people want really fast indexOfSlice on ArrayStack.
        Hide
        Rex Kerr added a comment -

        This is a patch against r25464 that fixes the issue. List is no longer used for indexing, and indexing is used instead of slicing to get sub-sequences. The behavior of indexOfSlice and lastIndexOfSlice is unchanged; I have not tested every corner case for SeqLike.indexOf, but some of those will likely be different (since they didn't seem to make sense anyway--if you look for an empty slice, you can return an element completely out of range of the collection?).

        Show
        Rex Kerr added a comment - This is a patch against r25464 that fixes the issue. List is no longer used for indexing, and indexing is used instead of slicing to get sub-sequences. The behavior of indexOfSlice and lastIndexOfSlice is unchanged; I have not tested every corner case for SeqLike.indexOf, but some of those will likely be different (since they didn't seem to make sense anyway--if you look for an empty slice, you can return an element completely out of range of the collection?).
        Hide
        Paul Phillips added a comment -

        Do you mean like the fact that "(0 to 20).indexOfSlice(Nil, 100)" returns 100? Do you have a different suggestion? It could be -1, but the case would have to be spectacular to change it to failure.

        Show
        Paul Phillips added a comment - Do you mean like the fact that "(0 to 20).indexOfSlice(Nil, 100)" returns 100? Do you have a different suggestion? It could be -1, but the case would have to be spectacular to change it to failure.
        Hide
        Commit Message Bot added a comment -

        (extempore in r25483) Optimizations for Seq's implementations of sequence search algorithms.
        Contributed by Rex Kerr. Closes SI-4828, no review.

        Show
        Commit Message Bot added a comment - (extempore in r25483 ) Optimizations for Seq's implementations of sequence search algorithms. Contributed by Rex Kerr. Closes SI-4828 , no review.

          People

          • Assignee:
            Rex Kerr
            Reporter:
            Rex Kerr
          • Votes:
            2 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development