
Vector concatenation is unreasonably slow (O(n^2), should be O(n)) #7725

Closed
scabug opened this issue Aug 7, 2013 · 20 comments

scabug commented Aug 7, 2013

Repeated ++ of a single-item collection onto a Vector is outrageously slow compared to repeatedly adding one item with :+ :

Welcome to Scala version 2.10.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0-ea).

scala> import System.{currentTimeMillis => millis}
import System.{currentTimeMillis=>millis}

scala> def time[T](f: => T): Long = { val start = millis; f; millis - start }
time: [T](f: => T)Long

scala> def fast(n: Int) = (1 to n).foldLeft(Vector[Int]())(_ :+ _)
fast: (n: Int)scala.collection.immutable.Vector[Int]

scala> def slow(n: Int) = (1 to n).foldLeft(Vector[Int]())((acc, a) => acc ++ List(a))
slow: (n: Int)scala.collection.immutable.Vector[Int]

scala> Stream.iterate(1)(_ * 10).map(n => time(fast(n))).take(6).force
res0: scala.collection.immutable.Stream[Long] = Stream(1, 0, 1, 4, 48, 77)

scala> Stream.iterate(1)(_ * 10).map(n => time(slow(n))).take(6).force
res1: scala.collection.immutable.Stream[Long] = Stream(1, 0, 1, 30, 974, 88617)

Paul Chiusano pointed this out over at #4442, where he comments: "Since Vector has a constant time snoc operation, it seems like the default implementation of ++ should be repeated calls to :+, which doesn't require any internal changes to Vector."
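
A minimal sketch of that proposal, assuming only the public Vector API (the helper name is illustrative, not what the library actually does):

// Illustrative only: appending the right-hand elements one at a time costs
// O(m) effectively-constant-time :+ calls for m elements, independent of lhs.size.
def concatViaAppend[A](lhs: Vector[A], rhs: TraversableOnce[A]): Vector[A] =
  rhs.foldLeft(lhs)(_ :+ _)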

scabug commented Aug 7, 2013

Imported From: https://issues.scala-lang.org/browse/SI-7725?orig=1
Reporter: @SethTisue
Affected Versions: 2.10.2
See #4442

scabug commented Aug 7, 2013

@SethTisue said:
This seems like a critical bug to me, and I think a fix would merit backporting to 2.10.x.

scabug commented Aug 13, 2013

@adriaanm said:
Agreed. Let's start by fixing it in 2.11. Backport should be easy.

scabug commented Aug 13, 2013

@non said:
I wonder, is there a length above which the current ++ implementation makes sense? If not then just repeating calls to :+ makes sense. I could imagine that for concatenating two very large vectors the current implementation is faster though.

This should be easy to test... I'll do it myself if I have time to get to it.

scabug commented Aug 13, 2013

@SethTisue said (edited on Aug 13, 2013 7:09:06 PM UTC):
If the first vector is very short and the second vector is very long, both old code and proposed new code will run in O(n) time, but yes, the new code will have worse constant factors:

import System.{currentTimeMillis => millis}
def time[T](f: => T): Long = { val start = millis; f; millis - start }
def repeat[T](n: Int)(body: => T): Unit = for(_ <- 0 until n) body
def fix[T](v1: Vector[T], v2: Vector[T]): Vector[T] = v2.foldLeft(v1)(_ :+ _)
val big = Vector.range(1,100000)
val small = Vector(1)
def test() {
  println("before fix")
  println(time(repeat(100)(big ++ big)))
  println(time(repeat(100)(big ++ small)))
  println(time(repeat(100)(small ++ big)))
  println("after fix")
  println(time(repeat(100)(fix(big, big))))
  println(time(repeat(100)(fix(big, small))))
  println(time(repeat(100)(fix(small, big))))
}

Welcome to Scala version 2.10.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)

...
scala> test()
before fix
529
213
217
after fix
586
0
560

But I don't think this should bother us more than slightly. I think getting asymptotic speedup on big ++ small is clearly worth a constant slowdown in small ++ big.

scabug commented Aug 13, 2013

@non said:
Initial testing shows that x ++ small does indeed get worse the larger x is, while the fold stays constant. However, x ++ x is 3-4x faster than the fold, and x ++ big is 8x faster than the fold (where big.length == 16384).

I'm going to try to get more data and find the crossover points here. Ideally we could dispatch to the right implementation based on the size of LHS/RHS. Thoughts?

scabug commented Aug 13, 2013

@SethTisue said:
Well, hmm. I'd be wary of heuristics here as the results could vary a lot depending on the environment. And, the more moving parts, the more potential for error. I'd suggest the simplest, most robust possible fix for 2.10.x, namely the fold, and then leave the door open for later refinement in 2.11, on a separate ticket. But I can see it's debatable.

I would argue that Vector.++ is only likely to be a performance bottleneck if it is called repeatedly on the same Vector; that concatenation typically proceeds left-to-right; and that under those conditions big ++ small is the main case and small ++ big is an edge case, so it's the former that will dominate performance in typical use. Finally, if anyone is really harmed by the constant-factor slowdown in the small ++ big case, they can always use VectorBuilder themselves and do it their way.
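
For anyone who does hit that edge case, a minimal sketch of the VectorBuilder route mentioned above (the helper name is illustrative, not library code):

import scala.collection.immutable.VectorBuilder

// Illustrative only: build the result in one pass, so small ++ big pays two
// bulk copies into the builder rather than one prepend per element.
def concatWithBuilder[A](small: Vector[A], big: Vector[A]): Vector[A] = {
  val b = new VectorBuilder[A]
  b ++= small
  b ++= big
  b.result()
}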

scabug commented Aug 13, 2013

@non said (edited on Aug 13, 2013 9:57:05 PM UTC):
I agree that we need to be careful. But do you not agree that lhs.length < rhs.length is probably a fair heuristic?

In practice, the crossover point is closer to 4:1. That is, the existing builder-based concatenation is faster even when the LHS is 4x longer than the RHS. But to be pessimistic we could use a 1:1 heuristic, which would err on the side of doing folds.

If you want a big-O justification, I'd argue that if |LHS| < |RHS|, then both concat and the fold are O(n): the fold does |RHS| append steps, and the builder does fewer than 2·|RHS| steps. (Of course, this argument works equally well for any constant in place of 2, such as 5 for the 4:1 heuristic.)
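
A sketch of the dispatch under discussion, assuming both sides are already Vectors and using the pessimistic 1:1 threshold (the helper name is illustrative, not the actual library code):

// Illustrative only: fold element by element when the RHS is no larger than
// the LHS, otherwise fall back to the existing builder-based ++.
def concatWithHeuristic[A](lhs: Vector[A], rhs: Vector[A]): Vector[A] =
  if (rhs.size <= lhs.size) rhs.foldLeft(lhs)(_ :+ _)  // big ++ small (or equal sizes)
  else lhs ++ rhs                                      // small ++ big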

scabug commented Aug 13, 2013

@Sciss said:
What happened to the research by Tiark Rompf / Phil Bagwell with the logarithmic cost concatenation?

scabug commented Aug 14, 2013

@SethTisue said:
That's #4442. This is #7725. FOCUS

scabug commented Aug 14, 2013

@SethTisue said:
Erik: Yes, having had an additional day to mull this over, I'd say your heuristic seems fair.

scabug commented Sep 11, 2013

@Ichoran said:
I find it even worse than 4:1; Vector has a huge constant term for append. In my tests it comes out closer to 16:1 (also consistent with other measurements I've made for Vector's :+). 1:1 is therefore not a good heuristic at all in my view.

Really, append is awful: you remake an entire path along a tree with 32 fanout per step in order to add a single element! I'm amazed it can perform as well as it does. There's also the issue that ++ takes a GenTraversableOnce, which means that you cannot in general know how big it's going to be.

The "right" way to do this would be to package up the GenTraversableOnce into leaf nodes and then glue the trees together quickly (logarithmically) if you knew how. I'm happy to fix it the "wrong" way to get O(n+m) back to O(m) for small m.

scabug commented Sep 11, 2013

@SethTisue said:
"if you knew how", that's the rub — and that's #4442.

scabug commented Sep 11, 2013

@SethTisue said:
delighted you're taking this on, Rex.

scabug commented Sep 11, 2013

@non said:
Rex, I totally agree. I was just pushing back on the idea we should use append all the time.

I can review your fix, or collaborate on it, or whatever.

scabug commented Sep 11, 2013

@Ichoran said:
This outperforms ++ in cases where there is a significant size mismatch, but it is about 20-30% slower for merging big vectors (on my machine), probably because the JIT compiler doesn't optimize quite as much since it has more work to do. (Specialized on String only for testing, but the results should hold for the generic case.)

I'll probably go with something like this barring radically superior ideas.

(Note: I'm converting to Vector in the non-re-Traversable case in the hope that we'll get better-than-O(n) concat in Vector someday. Maybe from me, but don't count on it.)

// Specialized on String for testing (see note above). `>> 5` divides by 32,
// Vector's branching factor, so element-wise append/prepend is used only when
// one side is at least ~32x larger than the other.
(a: Vector[String], b: GenTraversableOnce[String]) => {
  if (b.isEmpty) a
  else if (!b.isTraversableAgain) {
    val c = b.toVector  // b can only be traversed once, so materialize it first
    if (c.size < (a.size >> 5)) { var v = a; for (x <- c) v = v :+ x; v }       // big ++ small: append
    else if (a.size < (c.size >> 5)) { var v = c; for (x <- a.reverseIterator) v = x +: v; v }  // small ++ big: prepend
    else a ++ c                                                                 // comparable sizes: existing ++
  }
  else b.size match {
    case n if n < 3 || n < (a.size >> 5) => var v = a; for (x <- b) v = v :+ x; v
    case n if a.size < (n >> 5) && b.isInstanceOf[Vector[_]] =>
      var v = b.asInstanceOf[Vector[String]]; for (x <- a.reverseIterator) v = x +: v; v
    case _ => a ++ b
  }
}

scabug commented Sep 12, 2013

@Ichoran said:
Fix is contained in pull request scala/scala#2938 -- Erik, want to review?

scabug commented Sep 12, 2013

@non said:
Yes!

I've been reviewing it. Your characterization seems right although I'm really trying to tweak the patch to see if I can preserve the previous performance for concatenating two large collections.

If I don't make any progress I'll sign off later tonight.

scabug commented Sep 23, 2013

@retronym said:
I've merged the PR as submitted. @erik, we'd still appreciate your review on the choice of thresholds.

scabug closed this as completed Sep 23, 2013

scabug commented Oct 30, 2013

@pchiusano said:
I'd love if this were backported to 2.10.x, any chance of that happening?
