Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: Scala 2.10.0-M6
    • Fix Version/s: Scala 2.11.0-M8
    • Component/s: Collections
    • Environment:

      windows 7, x64. But should happen in all environments.

      Description

      HashSet does not implement filter directly but defers to the generic builder-based implementation of TraversableLike. That leads to filter always creating a completely new set instead of reusing the old set where possible.

      This does not only affect filter, but also methods like intersection (&) that use filter. People have a reasonable expectation that set-specific methods like intersection are as efficient as possible, but for the current implementation this is not the case.

      A few examples where this matters:

      val x=Set(1,2,3,4)
      val y = x.filter(_ => true) 
      /* this builds a completely new set, allocating a zillion temporary objects in the process. And the result does not share anything with the input even though it could share everything */
      val z = x.filterNot(_ => false)
      val w = x.intersect(x)
      
      println(x eq y) // false, should be true
      println(x eq z) // false, should be true
      println(x eq w) // false, should be true
      

      A solution for this problem would be to implement filter for HashSet and Set. I attached a patch that does that.

      (Sorry for setting this as blocking, but the JIRA user interface on my browser (chrome on windows) did not allow setting it to anything else. Feel free to change this to something more appropriate.)

      1. FilterBenchmark.scala
        1 kB
        Rüdiger Klaehn

        Issue Links

          Activity

          Hide
          Rüdiger Klaehn added a comment -

          Pull request against 2.10.x:

          https://github.com/scala/scala/pull/1096

          Show
          Rüdiger Klaehn added a comment - Pull request against 2.10.x: https://github.com/scala/scala/pull/1096
          Hide
          Rüdiger Klaehn added a comment -

          I implemented a benchmark using google caliper for the filter method.

          Run with

          java -cp scala-library.jar:caliper.jar:. FilterBenchmark
          

          The results on my development machine are as follows:

          Builder-based implementation:

             size          benchmark           ns linear runtime
                1         FilterTrue        19.68 =
                1 FilterThreeQuarter         7.17 =
                1         FilterHalf         6.99 =
                1      FilterQuarter         7.03 =
                1        FilterFalse         6.99 =
              100         FilterTrue      9405.75 =
              100 FilterThreeQuarter      7145.98 =
              100         FilterHalf      4406.39 =
              100      FilterQuarter      1721.97 =
              100        FilterFalse       407.88 =
            10000         FilterTrue   1341440.25 =
            10000 FilterThreeQuarter   1025837.20 =
            10000         FilterHalf    725372.57 =
            10000      FilterQuarter    407338.86 =
            10000        FilterFalse     64437.12 =
          1000000         FilterTrue 205794487.50 ==============================
          1000000 FilterThreeQuarter 166166961.67 ========================
          1000000         FilterHalf 108899815.71 ===============
          1000000      FilterQuarter  61353241.38 ========
          1000000        FilterFalse  15906104.10 ==
          

          Specialized implementation of filter:

             size          benchmark          ns linear runtime
                1         FilterTrue       19.13 =
                1 FilterThreeQuarter        7.04 =
                1         FilterHalf        7.01 =
                1      FilterQuarter        6.87 =
                1        FilterFalse        6.95 =
              100         FilterTrue     1800.93 =
              100 FilterThreeQuarter     2929.80 =
              100         FilterHalf     3228.58 =
              100      FilterQuarter     1982.25 =
              100        FilterFalse     2179.38 =
            10000         FilterTrue   238035.87 =
            10000 FilterThreeQuarter   375221.19 =
            10000         FilterHalf   396188.11 =
            10000      FilterQuarter   367140.59 =
            10000        FilterFalse   196032.70 =
          1000000         FilterTrue 37699210.38 =====================
          1000000 FilterThreeQuarter 49403056.20 ============================
          1000000         FilterHalf 51554796.54 ==============================
          1000000      FilterQuarter 44450213.28 =========================
          1000000        FilterFalse 34322142.71 ===================
          

          As you can see, the specialized implementation is up to a factor of 5.5 faster, not to mention the lack of temporary object creation and more structural sharing.

          The only case where the builder-based approach is faster is where the result is empty. But both implementations are quite fast in that case anyway.

          Show
          Rüdiger Klaehn added a comment - I implemented a benchmark using google caliper for the filter method. Run with java -cp scala-library.jar:caliper.jar:. FilterBenchmark The results on my development machine are as follows: Builder-based implementation: size benchmark ns linear runtime 1 FilterTrue 19.68 = 1 FilterThreeQuarter 7.17 = 1 FilterHalf 6.99 = 1 FilterQuarter 7.03 = 1 FilterFalse 6.99 = 100 FilterTrue 9405.75 = 100 FilterThreeQuarter 7145.98 = 100 FilterHalf 4406.39 = 100 FilterQuarter 1721.97 = 100 FilterFalse 407.88 = 10000 FilterTrue 1341440.25 = 10000 FilterThreeQuarter 1025837.20 = 10000 FilterHalf 725372.57 = 10000 FilterQuarter 407338.86 = 10000 FilterFalse 64437.12 = 1000000 FilterTrue 205794487.50 ============================== 1000000 FilterThreeQuarter 166166961.67 ======================== 1000000 FilterHalf 108899815.71 =============== 1000000 FilterQuarter 61353241.38 ======== 1000000 FilterFalse 15906104.10 == Specialized implementation of filter: size benchmark ns linear runtime 1 FilterTrue 19.13 = 1 FilterThreeQuarter 7.04 = 1 FilterHalf 7.01 = 1 FilterQuarter 6.87 = 1 FilterFalse 6.95 = 100 FilterTrue 1800.93 = 100 FilterThreeQuarter 2929.80 = 100 FilterHalf 3228.58 = 100 FilterQuarter 1982.25 = 100 FilterFalse 2179.38 = 10000 FilterTrue 238035.87 = 10000 FilterThreeQuarter 375221.19 = 10000 FilterHalf 396188.11 = 10000 FilterQuarter 367140.59 = 10000 FilterFalse 196032.70 = 1000000 FilterTrue 37699210.38 ===================== 1000000 FilterThreeQuarter 49403056.20 ============================ 1000000 FilterHalf 51554796.54 ============================== 1000000 FilterQuarter 44450213.28 ========================= 1000000 FilterFalse 34322142.71 =================== As you can see, the specialized implementation is up to a factor of 5.5 faster, not to mention the lack of temporary object creation and more structural sharing. The only case where the builder-based approach is faster is where the result is empty. But both implementations are quite fast in that case anyway.
          Hide
          Josh Suereth added a comment -

          One thing this benchmark does not test is what happens when the call-site becomes megamorphic. You should run the test after warming up with other Set implementations to see what final performance is. My guess is we loose a bit when the JIT can't inline the call site.

          Show
          Josh Suereth added a comment - One thing this benchmark does not test is what happens when the call-site becomes megamorphic. You should run the test after warming up with other Set implementations to see what final performance is. My guess is we loose a bit when the JIT can't inline the call site.
          Hide
          Adriaan Moors added a comment -

          Unassigning and rescheduling to M6 as previous deadline was missed.

          Show
          Adriaan Moors added a comment - Unassigning and rescheduling to M6 as previous deadline was missed.
          Show
          Rex Kerr added a comment - https://github.com/scala/scala/pull/3318

            People

            • Assignee:
              Rex Kerr
              Reporter:
              Rüdiger Klaehn
            • Votes:
              2 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development