Major performance bottleneck in scala.collection.mutable.Builder #9823

scabug · 2016-06-17T21:51:38Z

Background:

While performance profiling a genome sequencing pipeline over a Spark cluster we noticed that the hot methods for the pipeline came from Scala collection APIs; specifically the sizeHint methods in scala.collection.mutable.Builder trait:

  // [https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/mutable/Builder.scala#L77]
  def sizeHint(coll: TraversableLike[_, _]) {
    if (coll.isInstanceOf[collection.IndexedSeqLike[_,_]]) {
      sizeHint(coll.size)
    }
  }
  …

A further analysis of the JITed code shows that a considerably high number of CPU cycles being spent on the instance of check; thereby introducing a bottleneck. The problem also gets exacerbated for single executors with multiple threads than for multiple executors with less number of threads. For example, running the critical stage of our pipeline with 4 executors (9 cores each) is 3x faster than running it with 1 executor (with 36 cores.) Also, important to mention here that the problem gets worse on muti-socket systems with many cores.

Root cause:

The actual root cause of the problem is a slowdown in Java’s instanceof operator. More specifically, a cache trashing of the one-element secondary super type cache that Hotspot uses to cache the last observed secondary super type (interfaces and classes with a class hierarchy depth greater than 7)

// [http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/oops/klass.hpp]
class Klass : public Metadata { 
  …
  // Cache of last observed secondary supertype
  Klass*      _secondary_super_cache;
  
  // Array of all secondary supertypes
  Array<Klass*>* _secondary_supers; 
  …
}

Resolution-1:

Since the root cause of the problem is Hotspot’s implementation of secondary super type check, a permanent solution would require modifying the one-element caching. In our case, we took a rather heavy-handed approach and disabled this one-element cache and let the secondary sub type check relay exclusively on scanning _secondary_supers. This hack, which does not require any modification from Scala, removed the coll.isInstanceOf[collection.IndexedSeqLike[_,_]] bottleneck and speeded up the execution of the critical stage of our pipeline by a factor of 3x (for a Spark configuration with 1 executor and 36 cores – resulting in the same performance we got previously by launching multiple executors with less number of threads).

This resolution requires modifying Hotspot source files and would take more time to upstream to JDK.

Resolution-2:

The second resolution is to completely avoid the instance of check in the sizeHint methods all together and use polymorphism instead. The simplest check we tried is to comment out the instance of check and just let every collection, even collections with “expensive” size method, use the hint. This removed the bottleneck and speeded up the critical stage of our pipeline by 2.8x factor.

The other option we tried, which actually uses polymorphism, is to override the sizeHint methods in collections of type IndexedSeqLike and avoid the instance of check (and a call to the Builder.sizeHint methods). This also resulted in 2.8x performance gain. You may also think of other elegant solutions.

While browsing the Scala source, we observed other instances where you use the same code pattern – if_instance_of_this_do_this_else_do_this – which may end up having the same problem. We recommend you to review those as well.

How to repeat the behavior:

The problem is not limited to our genome workload. For example, if you test out the example code below with multiple threads, it will bottleneck on the same instance of check at scala.collection.mutable.Builder. In our sample ran with 36 threads, applying either one of the fixes gave ~15x speed up in execution time (compared to the default.)

// Running environment is critical; on 2 socket system with 18 cores/socket, HT disabled
// Use -XX:-TieredCompilation
class Task extends Runnable {
  val c1 = Array(-1, 1)
  val c2 = Array(-2, 2)
  def run() {
    for (i <- 1 to 200000000) {
      c1.zip(c2).map(_._1).toSeq.zip(1 until 3).map(_._1)
    }
  }
}

object ScalaInstanceof {
  def main(args: Array[String]) {
    val numThreads = args(0).toInt
    val es = java.util.concurrent.Executors.newFixedThreadPool(numThreads)
    for (i <- 1 to numThreads) es.submit(new Task)
    es.shutdown()
  }
}

If you need any additional information or further details, feel free to contact me [mailto:mulugeta.mammo@intel.com]. Also, let me know if you want me to do a PR on this.

Thanks,
Mulugeta

The text was updated successfully, but these errors were encountered:

scabug · 2016-06-17T21:51:38Z

Imported From: https://issues.scala-lang.org/browse/SI-9823?orig=1
Reporter: Mulugeta Mammo (mulugeta)
Affected Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.1

scabug · 2016-06-18T06:34:26Z

@retronym said:
Thank you for the amazingly detailed analysis! We'll digest this and try to reproduce your results and report back.

scabug · 2016-06-18T08:21:34Z

@retronym said (edited on Jun 18, 2016 8:21:45 AM UTC):
Here's another possible change. Not benchmarked as of yet.

diff --git a/src/library/scala/collection/IndexedSeqLike.scala b/src/library/scala/collection/IndexedSeqLike.scala
index f4bf58f..a9445a4 100644
--- a/src/library/scala/collection/IndexedSeqLike.scala
+++ b/src/library/scala/collection/IndexedSeqLike.scala
@@ -8,6 +8,7 @@
 
 package scala
 package collection
+import scala.collection.generic.CanBuildFrom
 
 /** A template trait for indexed sequences of type `IndexedSeq[A]`.
  *
@@ -86,6 +87,19 @@ trait IndexedSeqLike[+A, +Repr] extends Any with SeqLike[A, Repr] {
   override /*IterableLike*/
   def iterator: Iterator[A] = new Elements(0, length)
 
+
+  override def map[B, That](f: (A) => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
+    def builder = { // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
+      val b = bf(repr)
+      b.sizeHint(size)
+      b
+    }
+    val b = builder
+    for (x <- this) b += f(x)
+    b.result
+  }
+
+
   /* Overridden for efficiency */
   override def toBuffer[A1 >: A]: mutable.Buffer[A1] = {
     val result = new mutable.ArrayBuffer[A1](size)
diff --git a/src/library/scala/collection/TraversableLike.scala b/src/library/scala/collection/TraversableLike.scala
index d914f2e..8449e87 100644
--- a/src/library/scala/collection/TraversableLike.scala
+++ b/src/library/scala/collection/TraversableLike.scala
@@ -225,12 +225,7 @@ trait TraversableLike[+A, +Repr] extends Any
     (that ++ seq)(breakOut)
 
   def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
-    def builder = { // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
-      val b = bf(repr)
-      b.sizeHint(this)
-      b
-    }
-    val b = builder
+    val b = bf(repr)
     for (x <- this) b += f(x)
     b.result
   }

scabug · 2016-06-24T01:39:50Z

Mulugeta Mammo (mulugeta) said:
Yes that works as well. For the above example run it would get the CPI down to 0.56 from 9.97 for the original implementation.

scabug · 2016-07-27T06:15:13Z

Mulugeta Mammo (mulugeta) said (edited on Jul 27, 2016 3:19:39 PM UTC):
For the sample example, use the -XX:-TieredCompilation flag to avoid C1 and use C2 instead. In the genome pipeline case, the flag is enabled but Hotspot compiles a similar piece of code at C2/Level 4. Also, this is not due to code cache exhaustion.

scabug · 2016-08-19T06:09:49Z

@retronym said:
I'm trying to get a fix for this into 2.12.0: scala/scala#5351

scabug · 2016-08-30T06:26:46Z

@adriaanm said:
scala/scala#5364

scabug · 2016-09-01T20:24:54Z

Mulugeta Mammo (mulugeta) said:
So that was basically preferring polymorphism over instanceof (resolution 2). Thanks.

retronym · 2018-06-14T06:49:51Z

The underlying performance problem in the JVM is tracked as JDK-8180450

He-Pin · 2022-09-09T04:48:58Z

optimization in netty:netty/netty#12709

franz1981 · 2022-09-16T09:21:17Z

@He-Pin Another one: netty/netty#12806 more stealthy but could impact many other use cases

This could impact Cassandra as well, mentioned in the comments of the JDK issue, but cannot find any other reference really

franz1981 · 2022-10-12T05:31:20Z

@scabug @retronym

FYI I have built a tool to help diagnosis this issue:
https://github.com/franz1981/type-pollution-agent

Although it's just at bytecode level it guide the developers to spot code pattern and usage that can make the issue to happen (assuming inlining or other JIT optimizations on TypeProfile re monomorphism won't constant fold the instanceof/checkcast ops).
The idea is to use such tool to spot patterns that looks legit and resurrect the JDK issue (and get it fixed).

scabug closed this as completed Aug 30, 2016

scabug added enhancement has PR library performance labels Apr 7, 2017

scabug added this to the 2.12.0-RC1 milestone Apr 7, 2017

scabug assigned retronym Apr 7, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major performance bottleneck in scala.collection.mutable.Builder #9823

Major performance bottleneck in scala.collection.mutable.Builder #9823

scabug commented Jun 17, 2016 •

edited by retronym

scabug commented Jun 17, 2016

scabug commented Jun 18, 2016

scabug commented Jun 18, 2016

scabug commented Jun 24, 2016

scabug commented Jul 27, 2016

scabug commented Aug 19, 2016

scabug commented Aug 30, 2016

scabug commented Sep 1, 2016

retronym commented Jun 14, 2018

He-Pin commented Sep 9, 2022

franz1981 commented Sep 16, 2022 •

edited

franz1981 commented Oct 12, 2022 •

edited

Major performance bottleneck in scala.collection.mutable.Builder #9823

Major performance bottleneck in scala.collection.mutable.Builder #9823

Comments

scabug commented Jun 17, 2016 • edited by retronym

scabug commented Jun 17, 2016

scabug commented Jun 18, 2016

scabug commented Jun 18, 2016

scabug commented Jun 24, 2016

scabug commented Jul 27, 2016

scabug commented Aug 19, 2016

scabug commented Aug 30, 2016

scabug commented Sep 1, 2016

retronym commented Jun 14, 2018

He-Pin commented Sep 9, 2022

franz1981 commented Sep 16, 2022 • edited

franz1981 commented Oct 12, 2022 • edited

scabug commented Jun 17, 2016 •

edited by retronym

franz1981 commented Sep 16, 2022 •

edited

franz1981 commented Oct 12, 2022 •

edited