Scala Programming Language
  1. Scala Programming Language
  2. SI-5879

scala.collection.immutable.HashMap.merge does not handle collisions correctly

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: Scala 2.9.2
    • Fix Version/s: Scala 2.10.0-M3
    • Component/s: Collections
    • Labels:
      None
    • Environment:

      Windows 7, 64bit

      Description

      The merge function of scala.collections.immutable.HashMap can result in an inconsistent HashMap when there is a hash code collision. The easiest way to get this behavior is to merge two maps and defining a collision function.

      But this also occasionally happens when no collision function is provided!

      
          import collection.immutable.HashMap
          val a = HashMap(1 -> "1")
          val b = HashMap(1 -> "2")
          def collision(a: (Int, String), b: (Int, String)) = {
            // shouldn't this be called with reversed order?
            println(a)
            a
          }
          val r = a.merge(b, collision)
          println(r)
          println(r(1))
      
      

      Output:

          (1,2)
          Map(1 -> 2)
          1
      

      Explanation:

      HashMap1 caches both the value and a key/value pair. As a result of the merge the value field is inconsistent to the kv field. When printing the map, the kv field is used, so the result is correct. When calling apply, the value field is used, and the result is wrong.

      Expected behavior:

      HashMap1.value and HashMap1.kv should never be able to become inconsistent. To fix this in HashMap1.updated0:

            if (hash == this.hash && key == this.key ) {
              val newKv = if (merger eq null) kv else merger(this.kv, kv)
              new HashMap1(key, hash, newKv ._2, newKv)
            } else {
              ...
      

      But there might be other places where this issue (inconsistency between kv and value) occurs.

        Issue Links

          Activity

          Hide
          Rüdiger Klaehn added a comment -

          There is a bug in the fix. Or at least the behavior of the merged method if the merger is null is pretty nondeterministic and thus useless.

          Given two maps a and b, a.merged(b)(null) should only take elements from b if there is no collision, right? So it should be equivalent with b ++ a, where all elements in a replace elements in b. But it is not. See test case.

          The reason is the following: if a merger is given, the number of times invert has been called on it keeps track of the number of times the order of the arguments has been reversed. It essentially acts as a flag that is negated every time the arguments are swapped, like e.g. in

              protected override def merge0[B1 >: B](that: HashMap[A, B1], level: Int, merger: Merger[A, B1]): HashMap[A, B1] = {
                that.updated0(key, hash, level, value, kv, if (merger ne null) merger.invert else null)
              }
          

          But when the merger is null there is no way to keep track of the number of flips, so whether you get something from the first or the second map depends on the structure of the map (depth, number of collisions, etc).

          Here is some code to reproduce the bug:

            val n = 10
            val keys = (0 until n)
            val sa = keys.map(key => key -> "a")
            val sb = keys.map(key => key -> "b")
            for (i <- 0 until n) {
              val a = HashMap(sa: _*).take(i)
              val b = HashMap(sb: _*).take(n-i)
          
              val r1 = a ++ b
              val r2 = b.merged(a)(null)
          
              if (r1 != r2) {
                println(a)
                println(b)
                println(r1)
                println(r2)
              }
              println(r1 == r2)
            }
          

          I would suggest dropping the merger stuff and just having an additional boolean argument "flipped" that just keeps track of whether the order of the arguments was flipped. That way it works even if the merger is null.

          I think that would be faster as well since it would not require lifting the function to some helper object.

          Show
          Rüdiger Klaehn added a comment - There is a bug in the fix. Or at least the behavior of the merged method if the merger is null is pretty nondeterministic and thus useless. Given two maps a and b, a.merged(b)(null) should only take elements from b if there is no collision, right? So it should be equivalent with b ++ a, where all elements in a replace elements in b. But it is not. See test case. The reason is the following: if a merger is given, the number of times invert has been called on it keeps track of the number of times the order of the arguments has been reversed. It essentially acts as a flag that is negated every time the arguments are swapped, like e.g. in protected override def merge0[B1 >: B](that: HashMap[A, B1], level: Int, merger: Merger[A, B1]): HashMap[A, B1] = { that.updated0(key, hash, level, value, kv, if (merger ne null) merger.invert else null) } But when the merger is null there is no way to keep track of the number of flips, so whether you get something from the first or the second map depends on the structure of the map (depth, number of collisions, etc). Here is some code to reproduce the bug: val n = 10 val keys = (0 until n) val sa = keys.map(key => key -> "a") val sb = keys.map(key => key -> "b") for (i <- 0 until n) { val a = HashMap(sa: _*).take(i) val b = HashMap(sb: _*).take(n-i) val r1 = a ++ b val r2 = b.merged(a)(null) if (r1 != r2) { println(a) println(b) println(r1) println(r2) } println(r1 == r2) } I would suggest dropping the merger stuff and just having an additional boolean argument "flipped" that just keeps track of whether the order of the arguments was flipped. That way it works even if the merger is null. I think that would be faster as well since it would not require lifting the function to some helper object.
          Hide
          Rüdiger Klaehn added a comment -

          See comment

          Show
          Rüdiger Klaehn added a comment - See comment
          Hide
          Rüdiger Klaehn added a comment -

          Added a pull request. Not sure if this is the best solution, but it should be reasonably efficient. When the merge function is null, it gets lifted to a default merger that just takes the first argument.

          https://github.com/scala/scala/pull/1108

          Show
          Rüdiger Klaehn added a comment - Added a pull request. Not sure if this is the best solution, but it should be reasonably efficient. When the merge function is null, it gets lifted to a default merger that just takes the first argument. https://github.com/scala/scala/pull/1108

            People

            • Assignee:
              Aleksandar Prokopec
              Reporter:
              Rüdiger Klaehn
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development