
Vector concatenation is unreasonably slow (O(n^2), should be O(n)) #7725

Closed
scabug opened this issue Aug 7, 2013 · 20 comments

scabug commented Aug 7, 2013

Repeated ++ of a single-item collection onto a Vector is outrageously slow compared to repeatedly adding one item with :+ :

Welcome to Scala version 2.10.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0-ea).

scala> import System.{currentTimeMillis => millis}
import System.{currentTimeMillis=>millis}

scala> def time[T](f: => T): Long = { val start = millis; f; millis - start }
time: [T](f: => T)Long

scala> def fast(n: Int) = (1 to n).foldLeft(Vector[Int]())(_ :+ _)
fast: (n: Int)scala.collection.immutable.Vector[Int]

scala> def slow(n: Int) = (1 to n).foldLeft(Vector[Int]())((acc, a) => acc ++ List(a))
slow: (n: Int)scala.collection.immutable.Vector[Int]

scala> Stream.iterate(1)(_ * 10).map(n => time(fast(n))).take(6).force
res0: scala.collection.immutable.Stream[Long] = Stream(1, 0, 1, 4, 48, 77)

scala> Stream.iterate(1)(_ * 10).map(n => time(slow(n))).take(6).force
res1: scala.collection.immutable.Stream[Long] = Stream(1, 0, 1, 30, 974, 88617)

Paul Chiusano pointed this out over at #4442, where he comments: "Since Vector has a constant time snoc operation, it seems like the default implementation of ++ should be repeated calls to :+, which doesn't require any internal changes to Vector."
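
A minimal sketch of that proposal, assuming only the public Vector API (the helper name is illustrative, not what the library actually does):

// Illustrative only: appending the right-hand elements one at a time costs
// O(m) effectively-constant-time :+ calls for m elements, independent of lhs.size.
def concatViaAppend[A](lhs: Vector[A], rhs: TraversableOnce[A]): Vector[A] =
  rhs.foldLeft(lhs)(_ :+ _)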

scabug commented Aug 7, 2013

Imported From: https://issues.scala-lang.org/browse/SI-7725?orig=1
Reporter: @SethTisue
Affected Versions: 2.10.2
See #4442

scabug commented Aug 7, 2013

@SethTisue said:
This seems like a critical bug to me, and I think a fix would merit backporting to 2.10.x.

scabug commented Aug 13, 2013

@adriaanm said:
Agreed. Let's start by fixing it in 2.11. Backport should be easy.

scabug commented Aug 13, 2013

@non said:
I wonder, is there a length above which the current ++ implementation makes sense? If not then just repeating calls to :+ makes sense. I could imagine that for concatenating two very large vectors the current implementation is faster though.

This should be easy to test... I'll do it myself if I have time to get to it.

scabug commented Aug 13, 2013

@SethTisue said (edited on Aug 13, 2013 7:09:06 PM UTC):
If the first vector is very short and the second vector is very long, both old code and proposed new code will run in O(n) time, but yes, the new code will have worse constant factors:

import System.{currentTimeMillis => millis}
def time[T](f: => T): Long = { val start = millis; f; millis - start }
def repeat[T](n: Int)(body: => T): Unit = for(_ <- 0 until n) body
def fix[T](v1: Vector[T], v2: Vector[T]): Vector[T] = v2.foldLeft(v1)(_ :+ _)
val big = Vector.range(1,100000)
val small = Vector(1)
def test() {
  println("before fix")
  println(time(repeat(100)(big ++ big)))
  println(time(repeat(100)(big ++ small)))
  println(time(repeat(100)(small ++ big)))
  println("after fix")
  println(time(repeat(100)(fix(big, big))))
  println(time(repeat(100)(fix(big, small))))
  println(time(repeat(100)(fix(small, big))))
}

Welcome to Scala version 2.10.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)

...
scala> test()
before fix
529
213
217
after fix
586
0
560

But I don't think this should bother us more than slightly. I think getting asymptotic speedup on big ++ small is clearly worth a constant slowdown in small ++ big.

scabug commented Aug 13, 2013

@non said:
Initial testing shows that x ++ small does indeed get worse the larger x is, while the fold stays constant. However, x ++ x is 3-4x faster than the fold, and x ++ big is 8x faster than the fold (where big.length == 16384).

I'm going to try to get more data and find the crossover points here. Ideally we could dispatch to the right implementation based on the size of LHS/RHS. Thoughts?

scabug commented Aug 13, 2013

@SethTisue said:
Well, hmm. I'd be wary of heuristics here as the results could vary a lot depending on the environment. And, the more moving parts, the more potential for error. I'd suggest the simplest, most robust possible fix for 2.10.x, namely the fold, and then leave the door open for later refinement in 2.11, on a separate ticket. But I can see it's debatable.

I would argue that Vector.++ is only likely to be a performance bottleneck if it is called repeatedly on the same Vector; that concatenation typically proceeds left-to-right; and that under those conditions big ++ small is the main case and small ++ big is an edge case, so it's the former that will dominate performance in typical use. Finally, if anyone is really harmed by the constant-factor slowdown in the small ++ big case, they can always use VectorBuilder themselves and do it their way.
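
For anyone who does hit that edge case, a minimal sketch of the VectorBuilder route mentioned above (the helper name is illustrative, not library code):

import scala.collection.immutable.VectorBuilder

// Illustrative only: build the result in one pass, so small ++ big pays two
// bulk copies into the builder rather than one prepend per element.
def concatWithBuilder[A](small: Vector[A], big: Vector[A]): Vector[A] = {
  val b = new VectorBuilder[A]
  b ++= small
  b ++= big
  b.result()
}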

scabug commented Aug 13, 2013

@non said (edited on Aug 13, 2013 9:57:05 PM UTC):
I agree that we need to be careful. But do you not agree that lhs.length < rhs.length is probably a fair heuristic?

In practice, the crossover point is closer to 4:1. That is, the existing builder-based concatenation is faster even when the LHS is 4x longer than the RHS. But to be pessimistic we could use a 1:1 heuristic, which would err on the side of doing folds.

If you want a big-O justification, I'd argue that if |LHS| < |RHS|, then both concat and the fold are O(n): the fold does |RHS| append steps, and the builder does fewer than 2·|RHS| steps. (Of course, this argument works equally well for any constant in place of 2, such as 5 for the 4:1 heuristic.)
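
A sketch of the dispatch under discussion, assuming both sides are already Vectors and using the pessimistic 1:1 threshold (the helper name is illustrative, not the actual library code):

// Illustrative only: fold element by element when the RHS is no larger than
// the LHS, otherwise fall back to the existing builder-based ++.
def concatWithHeuristic[A](lhs: Vector[A], rhs: Vector[A]): Vector[A] =
  if (rhs.size <= lhs.size) rhs.foldLeft(lhs)(_ :+ _)  // big ++ small (or equal sizes)
  else lhs ++ rhs                                      // small ++ big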

scabug commented Aug 13, 2013

@Sciss said:
What happened to the research by Tiark Rompf / Phil Bagwell with the logarithmic cost concatenation?

scabug commented Aug 14, 2013

@SethTisue said:
That's #4442. This is #7725. FOCUS

scabug commented Aug 14, 2013

@SethTisue said:
Erik: Yes, having had an additional day to mull this over, I'd say your heuristic seems fair.

scabug commented Sep 11, 2013

@Ichoran said:
I find it even worse than 4:1; Vector has a huge constant term for append. In my tests it comes out closer to 16:1 (also consistent with other measurements I've made for Vector's :+). 1:1 is therefore not a good heuristic at all in my view.

Really, append is awful: you remake an entire path along a tree with 32 fanout per step in order to add a single element! I'm amazed it can perform as well as it does. There's also the issue that ++ takes a GenTraversableOnce, which means that you cannot in general know how big it's going to be.

The "right" way to do this would be to package up the GenTraversableOnce into leaf nodes and then glue the trees together quickly (logarithmically) if you knew how. I'm happy to fix it the "wrong" way to get O(n+m) back to O(m) for small m.

scabug commented Sep 11, 2013

@SethTisue said:
"if you knew how", that's the rub — and that's #4442.

scabug commented Sep 11, 2013

@SethTisue said:
delighted you're taking this on, Rex.

scabug commented Sep 11, 2013

@non said:
Rex, I totally agree. I was just pushing back on the idea we should use append all the time.

I can review your fix, or collaborate on it, or whatever.

scabug commented Sep 11, 2013

@Ichoran said:
This outperforms ++ in cases where there is a significant size mismatch, but it is about 20-30% slower for merging big vectors (on my machine), probably because the JIT compiler doesn't optimize quite as much since it has more work to do. (Specialized on String only for testing, but the results should hold for the generic case.)

I'll probably go with something like this barring radically superior ideas.

(Note: I'm converting to Vector in the non-re-Traversable case in the hope that we'll get better-than-O(n) concat in Vector someday. Maybe from me, but don't count on it.)

// Specialized on String for testing (see note above). `>> 5` divides by 32,
// Vector's branching factor, so element-wise append/prepend is used only when
// one side is at least ~32x larger than the other.
(a: Vector[String], b: GenTraversableOnce[String]) => {
  if (b.isEmpty) a
  else if (!b.isTraversableAgain) {
    val c = b.toVector  // b can only be traversed once, so materialize it first
    if (c.size < (a.size >> 5)) { var v = a; for (x <- c) v = v :+ x; v }       // big ++ small: append
    else if (a.size < (c.size >> 5)) { var v = c; for (x <- a.reverseIterator) v = x +: v; v }  // small ++ big: prepend
    else a ++ c                                                                 // comparable sizes: existing ++
  }
  else b.size match {
    case n if n < 3 || n < (a.size >> 5) => var v = a; for (x <- b) v = v :+ x; v
    case n if a.size < (n >> 5) && b.isInstanceOf[Vector[_]] =>
      var v = b.asInstanceOf[Vector[String]]; for (x <- a.reverseIterator) v = x +: v; v
    case _ => a ++ b
  }
}

scabug commented Sep 12, 2013

@Ichoran said:
Fix is contained in pull request scala/scala#2938 -- Erik, want to review?

scabug commented Sep 12, 2013

@non said:
Yes!

I've been reviewing it. Your characterization seems right although I'm really trying to tweak the patch to see if I can preserve the previous performance for concatenating two large collections.

If I don't make any progress I'll sign off later tonight.

scabug commented Sep 23, 2013

@retronym said:
I've merged the PR as submitted. @erik, we'd still appreciate your review on the choice of thresholds.

scabug closed this as completed Sep 23, 2013

scabug commented Oct 30, 2013

@pchiusano said:
I'd love if this were backported to 2.10.x, any chance of that happening?
