Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid duplicating more than once a closure body when inlining a closure #5286

Closed
scabug opened this issue Dec 7, 2011 · 13 comments
Closed

Comments

@scabug
Copy link

scabug commented Dec 7, 2011

immutable.Range.foreach() invokes its argument's apply() twice. When compiling with -optimize, that leads to inlining twice "the loop body" (if it gets inlined at all, for the max size threshold might be surpassed due to the duplicate inlining).

A formulation of immutable.Range.foreach() invoking its argument's apply() just once is:

if (length > 0) {
  val sentinel = last
  var closuVar = start
  var loopCond = true
  while ( loopCond ) {
    f(closuVar)
    if(closuVar == sentinel) loopCond = false
    else closuVar += step
  }
}

There can be other places in the collections library that exhibit a similar pattern.

@scabug
Copy link
Author

scabug commented Dec 7, 2011

Imported From: https://issues.scala-lang.org/browse/SI-5286?orig=1
Reporter: @magarciaEPFL
See #5950

@scabug
Copy link
Author

scabug commented Dec 7, 2011

@ijuma said:
Interesting.

It would be good to have some performance tests for Range.foreach (with functions of varying complexities) to ensure that we are actually improving it. There have been some changes to Range.foreach over the last few releases to improve performance so maybe such tests already exist.

Does anyone know?

@scabug
Copy link
Author

scabug commented Dec 7, 2011

@dcsobral said:
This double call isn't the main issue with inlining foreach. Worse are the IntRefs (and similar classes) created to store variables captured by the closure. If the inlining gets done, you are still left with heap-allocated data that would, otherwise, be used directly from the stack.

But, ignoring that, the present problem is rather difficult to solve while avoiding int overflow problems. I mention that just as an alert to people who endeavor to fix it, so that they do not break something else.

@scabug
Copy link
Author

scabug commented Dec 7, 2011

@magarciaEPFL said:

Daniel,

You're right. What I'm suggesting is an incremental improvement. However, without this incremental improvement, ClosureElimination and DeadCodeElimination won't even try getting rid of the indirection you describe. Those optimizations can only happen provided the apply() callsite on the foreach's closure argument is inlined to start with.

@scabug
Copy link
Author

scabug commented Dec 7, 2011

@dcsobral said:
But how common is the case that code does not get inlined because of this? I worry because a very common and simple and often micro-benchmarked case is suffering precisely because of the issues I mentioned. Consider this code:

class XXS {
  private val array = Array.range(0, 100)                                                                              
  def tst = { var sum = 0; for (i <- 0 until array.length) sum += array(i); sum }       
}

The method tst is compiled like this:

public int tst();
  Code:
   0:	new	#15; //class scala/runtime/IntRef
   3:	dup
   4:	iconst_0
   5:	invokespecial	#19; //Method scala/runtime/IntRef."<init>":(I)V
   8:	astore	5
   10:	new	#21; //class scala/runtime/RichInt
   13:	dup
   14:	iconst_0
   15:	invokespecial	#22; //Method scala/runtime/RichInt."<init>":(I)V
   18:	aload_0
   19:	getfield	#11; //Field XXS$$array:[I
   22:	arraylength
   23:	istore_2
   24:	astore_1
   25:	getstatic	#28; //Field scala/collection/immutable/Range$.MODULE$:Lscala/collection/immutable/Range$;
   28:	aload_1
   29:	invokevirtual	#31; //Method scala/runtime/RichInt.self:()I
   32:	iload_2
   33:	invokevirtual	#35; //Method scala/collection/immutable/Range$.apply:(II)Lscala/collection/immutable/Range;
   36:	dup
   37:	astore	7
   39:	invokevirtual	#40; //Method scala/collection/immutable/Range.length:()I
   42:	iconst_0
   43:	if_icmple	87
   46:	aload	7
   48:	invokevirtual	#43; //Method scala/collection/immutable/Range.last:()I
   51:	istore_3
   52:	aload	7
   54:	invokevirtual	#46; //Method scala/collection/immutable/Range.start:()I
   57:	istore	8
   59:	iload	8
   61:	iload_3
   62:	if_icmpne	93
   65:	iload	8
   67:	istore	4
   69:	aload	5
   71:	aload	5
   73:	getfield	#50; //Field scala/runtime/IntRef.elem:I
   76:	aload_0
   77:	getfield	#11; //Field XXS$$array:[I
   80:	iload	4
   82:	iaload
   83:	iadd
   84:	putfield	#50; //Field scala/runtime/IntRef.elem:I
   87:	aload	5
   89:	getfield	#50; //Field scala/runtime/IntRef.elem:I
   92:	ireturn
   93:	iload	8
   95:	istore	6
   97:	aload	5
   99:	aload	5
   101:	getfield	#50; //Field scala/runtime/IntRef.elem:I
   104:	aload_0
   105:	getfield	#11; //Field XXS$$array:[I
   108:	iload	6
   110:	iaload
   111:	iadd
   112:	putfield	#50; //Field scala/runtime/IntRef.elem:I
   115:	iload	8
   117:	aload	7
   119:	invokevirtual	#53; //Method scala/collection/immutable/Range.step:()I
   122:	iadd
   123:	istore	8
   125:	goto	59

The overheads here are:

0-8: var sum is kept on an IntRef because it would have been used in the closure that got inlined. This field is then accessed 5 times, with at least one of them being redundant (stack -> intref followed by intref -> stack).

10-15, 24, 28, 29: a RichInt is created because of the "until" method, which actually got inlined.

25, 33: a Range is created because of foreach, which is actually inlined. A bunch of fields are read from it. With the current level of optimization, I don't see how this could be avoided, unless the whole Range class could actually be inlined on the stack, if its reference didn't escape the method.

65-92: the duplicated apply that you have mentioned.

Now, eliminating Range seems very hard. Eliminating RichInt seems difficult. But IntRef... it was created because of the Closure that got eliminated. Can't the process that created it be reversed during or after ClosureElimination?

@scabug
Copy link
Author

scabug commented Dec 7, 2011

@paulp said:
As a guy who has tried to make these things faster a number of times, I can tell you there is even more to consider, among which is that IntRef is vastly less expensive than one might suppose. At least it was when I measured. I'm sorry I don't have good records of the things I have tried.

@scabug
Copy link
Author

scabug commented Dec 7, 2011

@jrudolph said:
Why don't we provide a most simplistic version for the common use cases like

case class MyRange(from: Int, to: Int) {
  def foreach(f: Int => Unit): Unit = {
    var i = from
    while (i < to) {
      f(i)
      i+=1
    }
  }
}
 def timeMyRange(reps:Int) = {

   var i=0
   var sum = 0L
   while(i<reps) {
     for(j<-MyRange(0, size))
       sum+=j
     i+=1
   }
   sum
 }

Rerunning the caliper benchmark from the mailing list (on my machine, without -optimize):

[info]  0% Scenario{vm=java, trial=0, benchmark=ForLoop} 19535.70 ns; σ=78.04 ns @ 3 trials
[info] 33% Scenario{vm=java, trial=0, benchmark=MyRange} 5091.66 ns; σ=19.43 ns @ 3 trials
[info] 67% Scenario{vm=java, trial=0, benchmark=WhileLoop} 5099.48 ns; σ=93.54 ns @ 10 trials
[info] 
[info] benchmark    us linear runtime
[info]   ForLoop 19.54 ==============================
[info]   MyRange  5.09 =======
[info] WhileLoop  5.10 =======
[info] 
[info] vm: java
[info] trial: 0

I don't think -optimize will win the fight against hotspot when it comes to inlining. Without looking at the output of -XX:+PrintAssembly, you will never be sure what is possible on HotSpot.

For example for the for loop example, here's what hotspot produces (but only after 10 secs and having tried other optimizations first):


 # {method} 'timeForLoop' '(I)J' in 'Test$'
  0x00007f8d79168100: callq  0x00007f8d7d91c1d0  ;   {runtime_call}
  0x00007f8d79168105: nopw   0x0(%rax,%rax,1)
  0x00007f8d79168110: mov    %eax,-0x6000(%rsp)
  0x00007f8d79168117: push   %rbp
  0x00007f8d79168118: sub    $0x40,%rsp
  0x00007f8d7916811c: rex mov    0x8(%rsi),%ebp
  0x00007f8d79168120: mov    (%rsi),%rbx
  0x00007f8d79168123: mov    0x10(%rsi),%r13d
  0x00007f8d79168127: mov    %rsi,%rdi
  0x00007f8d7916812a: mov    $0x7f8d7d99fdc0,%r10
  0x00007f8d79168134: callq  *%r10
  0x00007f8d79168137: test   %rbx,%rbx
  0x00007f8d7916813a: je     0x00007f8d79168313
  0x00007f8d79168140: mov    0x8(%rbx),%r10d
  0x00007f8d79168144: cmp    $0x82540d58,%r10d  ;   {oop('scala/runtime/LongRef')}
  0x00007f8d7916814b: jne    0x00007f8d7916831a  ;*iload_2
                                                ; - Test$::timeForLoop@11 (line 17)
  0x00007f8d79168151: cmp    %r13d,%ebp
  0x00007f8d79168154: jge    0x00007f8d791682fb  ;*if_icmpge
                                                ; - Test$::timeForLoop@13 (line 17)
  0x00007f8d7916815a: mov    %rbx,%r10          ;*putfield sum$2
                                                ; - Test$$anonfun$timeForLoop$1::<init>@2 (line 18)
                                                ; - Test$::timeForLoop@33 (line 18)
  0x00007f8d7916815d: jmp    0x00007f8d791681a6
  0x00007f8d7916815f: nop    
  0x00007f8d79168160: mov    %edx,%r11d         ;*aload_1
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@23 (line 75)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168163: mov    %r11d,%edx
  0x00007f8d79168166: add    0x14(%r10),%edx    ;*iadd
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@35 (line 76)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d7916816a: movslq %r11d,%r11
  0x00007f8d7916816d: add    %r11,%rcx          ;*ladd
                                                ; - Test$$anonfun$timeForLoop$1::apply$mcVI$sp@13 (line 19)
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@25 (line 75)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168170: mov    %rcx,0x10(%r9)     ; OopMap{r10=Oop r9=NarrowOop rbp=Oop [8]=Oop [16]=NarrowOop off=116}
                                                ;*goto
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@37 (line 76)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168174: test   %eax,0x5691e86(%rip)        # 0x00007f8d7e7fa000
                                                ;*goto
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@37 (line 76)
                                                ; - Test$::timeForLoop@36 (line 18)
                                                ;   {poll}
  0x00007f8d7916817a: cmp    %eax,%edx
  0x00007f8d7916817c: jne    0x00007f8d79168160  ;*if_icmpeq
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@20 (line 74)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d7916817e: mov    %rbp,%rsi
  0x00007f8d79168181: xchg   %ax,%ax
  0x00007f8d79168183: callq  0x00007f8d7912e820  ; OopMap{[8]=Oop [16]=NarrowOop off=136}
                                                ;*invokeinterface apply$mcVI$sp
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@42 (line 78)
                                                ; - Test$::timeForLoop@36 (line 18)
                                                ;   {optimized virtual_call}
  0x00007f8d79168188: mov    (%rsp),%ebp
  0x00007f8d7916818b: inc    %ebp               ;*iadd
                                                ; - Test$::timeForLoop@41 (line 20)
  0x00007f8d7916818d: cmp    0x4(%rsp),%ebp
  0x00007f8d79168191: jge    0x00007f8d791682f6
  0x00007f8d79168197: mov    0x8(%rsp),%rbx
  0x00007f8d7916819c: mov    0x4(%rsp),%r13d
  0x00007f8d791681a1: mov    0x10(%rsp),%r10d   ;*synchronization entry
                                                ; - scala.LowPriorityImplicits::intWrapper@-1 (line 32)
                                                ; - Test$::timeForLoop@20 (line 18)
  0x00007f8d791681a6: mov    0x60(%r15),%rax
  0x00007f8d791681aa: mov    %rax,%r11
  0x00007f8d791681ad: add    $0x28,%r11
  0x00007f8d791681b1: cmp    0x70(%r15),%r11
  0x00007f8d791681b5: jae    0x00007f8d79168292
  0x00007f8d791681bb: mov    %r11,0x60(%r15)
  0x00007f8d791681bf: prefetchnta 0xc0(%r11)
  0x00007f8d791681c7: mov    $0x825713c0,%r11d  ;   {oop('scala/collection/immutable/Range')}
  0x00007f8d791681cd: mov    0xb0(%r11),%r11
  0x00007f8d791681d4: mov    %r11,(%rax)
  0x00007f8d791681d7: movl   $0x825713c0,0x8(%rax)  ;   {oop('scala/collection/immutable/Range')}
  0x00007f8d791681de: mov    %r12d,0xc(%rax)
  0x00007f8d791681e2: mov    %r12,0x18(%rax)
  0x00007f8d791681e6: mov    %r12,0x20(%rax)    ;*new  ; - scala.collection.immutable.Range$::apply@0 (line 267)
                                                ; - scala.runtime.RichInt::until@8 (line 21)
                                                ; - Test$::timeForLoop@25 (line 18)
  0x00007f8d791681ea: mov    $0x1000f4240,%r11
  0x00007f8d791681f4: mov    %r11,0x10(%rax)
  0x00007f8d791681f8: mov    %rax,%r8           ;*synchronization entry
                                                ; - scala.collection.immutable.Range$::apply@-1 (line 267)
                                                ; - scala.runtime.RichInt::until@8 (line 21)
                                                ; - Test$::timeForLoop@25 (line 18)
  0x00007f8d791681fb: mov    0x60(%r15),%rax
  0x00007f8d791681ff: mov    %rax,%r11
  0x00007f8d79168202: add    $0x10,%r11
  0x00007f8d79168206: cmp    0x70(%r15),%r11
  0x00007f8d7916820a: jae    0x00007f8d791682c3
  0x00007f8d79168210: mov    %r11,0x60(%r15)
  0x00007f8d79168214: prefetchnta 0xc0(%r11)
  0x00007f8d7916821c: mov    $0x8257a1c8,%r9d   ;   {oop('Test$$anonfun$timeForLoop$1')}
  0x00007f8d79168222: mov    0xb0(%r9),%r11
  0x00007f8d79168229: mov    %r11,(%rax)
  0x00007f8d7916822c: movl   $0x8257a1c8,0x8(%rax)  ;   {oop('Test$$anonfun$timeForLoop$1')}
  0x00007f8d79168233: mov    %ebp,(%rsp)
  0x00007f8d79168236: mov    %rbx,0x8(%rsp)
  0x00007f8d7916823b: mov    %r13d,0x4(%rsp)
  0x00007f8d79168240: mov    %r8,0x18(%rsp)     ;*new  ; - Test$::timeForLoop@28 (line 18)
  0x00007f8d79168245: mov    %r10d,0xc(%rax)    ;*putfield sum$2
                                                ; - Test$$anonfun$timeForLoop$1::<init>@2 (line 18)
                                                ; - Test$::timeForLoop@33 (line 18)
  0x00007f8d79168249: mov    %r10d,0x10(%rsp)
  0x00007f8d7916824e: mov    %rax,%rbp          ;*synchronization entry
                                                ; - Test$::timeForLoop@-1 (line 15)
  0x00007f8d79168251: mov    0x18(%rsp),%rsi
  0x00007f8d79168256: nop    
  0x00007f8d79168257: callq  0x00007f8d7912e820  ; OopMap{rbp=Oop [8]=Oop [16]=NarrowOop [24]=Oop off=348}
                                                ;*invokevirtual length
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@1 (line 71)
                                                ; - Test$::timeForLoop@36 (line 18)
                                                ;   {optimized virtual_call}
  0x00007f8d7916825c: test   %eax,%eax
  0x00007f8d7916825e: jle    0x00007f8d79168188  ;*if_icmple
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@5 (line 71)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168264: mov    0x18(%rsp),%rsi
  0x00007f8d79168269: xchg   %ax,%ax
  0x00007f8d7916826b: callq  0x00007f8d7912e820  ; OopMap{rbp=Oop [8]=Oop [16]=NarrowOop [24]=Oop off=368}
                                                ;*invokevirtual last
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@9 (line 72)
                                                ; - Test$::timeForLoop@36 (line 18)
                                                ;   {optimized virtual_call}
  0x00007f8d79168270: mov    0x18(%rsp),%r10
  0x00007f8d79168275: mov    0xc(%r10),%r11d    ;*getfield start
                                                ; - scala.collection.immutable.Range::start@1 (line 43)
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@14 (line 73)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168279: cmp    %eax,%r11d
  0x00007f8d7916827c: je     0x00007f8d7916830b  ;*if_icmpeq
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@20 (line 74)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168282: mov    %rbp,%r8
  0x00007f8d79168285: mov    0xc(%rbp),%r9d     ;*getfield sum$2
                                                ; - Test$$anonfun$timeForLoop$1::apply$mcVI$sp@1 (line 19)
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@25 (line 75)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168289: mov    0x10(%r9),%rcx     ;*getfield elem
                                                ; - Test$$anonfun$timeForLoop$1::apply$mcVI$sp@8 (line 19)
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@25 (line 75)
                                                ; - Test$::timeForLoop@36 (line 18)
                                                ; implicit exception: dispatches to 0x00007f8d79168335
  0x00007f8d7916828d: jmpq   0x00007f8d79168163
  0x00007f8d79168292: mov    %r10d,0xc(%rsp)
  0x00007f8d79168297: mov    %r13d,0x8(%rsp)
  0x00007f8d7916829c: mov    %rbx,(%rsp)
  0x00007f8d791682a0: mov    $0x825713c0,%rsi   ;   {oop('scala/collection/immutable/Range')}
  0x00007f8d791682aa: nop    
  0x00007f8d791682ab: callq  0x00007f8d79154da0  ; OopMap{[0]=Oop [12]=NarrowOop off=432}
                                                ;*new  ; - scala.collection.immutable.Range$::apply@0 (line 267)
                                                ; - scala.runtime.RichInt::until@8 (line 21)
                                                ; - Test$::timeForLoop@25 (line 18)
                                                ;   {runtime_call}
  0x00007f8d791682b0: mov    (%rsp),%rbx
  0x00007f8d791682b4: mov    0x8(%rsp),%r13d
  0x00007f8d791682b9: mov    0xc(%rsp),%r10d
  0x00007f8d791682be: jmpq   0x00007f8d791681ea
  0x00007f8d791682c3: mov    %r8,0x18(%rsp)
  0x00007f8d791682c8: mov    %r10d,0x10(%rsp)
  0x00007f8d791682cd: mov    %r13d,0x4(%rsp)
  0x00007f8d791682d2: mov    %rbx,0x8(%rsp)
  0x00007f8d791682d7: mov    %ebp,(%rsp)
  0x00007f8d791682da: mov    $0x8257a1c8,%rsi   ;   {oop('Test$$anonfun$timeForLoop$1')}
  0x00007f8d791682e4: xchg   %ax,%ax
  0x00007f8d791682e7: callq  0x00007f8d79154da0  ; OopMap{[8]=Oop [16]=NarrowOop [24]=Oop off=492}
                                                ;*new  ; - Test$::timeForLoop@28 (line 18)
                                                ;   {runtime_call}
  0x00007f8d791682ec: mov    0x10(%rsp),%r10d
  0x00007f8d791682f1: jmpq   0x00007f8d79168245
  0x00007f8d791682f6: mov    0x8(%rsp),%rbx     ;*if_icmpge
                                                ; - Test$::timeForLoop@13 (line 17)
  0x00007f8d791682fb: mov    0x10(%rbx),%rax    ; implicit exception: dispatches to 0x00007f8d79168382
  0x00007f8d791682ff: add    $0x40,%rsp
  0x00007f8d79168303: pop    %rbp
  0x00007f8d79168304: test   %eax,0x5691cf6(%rip)        # 0x00007f8d7e7fa000
                                                ;   {poll_return}
  0x00007f8d7916830a: retq   
  0x00007f8d7916830b: mov    %r11d,%edx
  0x00007f8d7916830e: jmpq   0x00007f8d7916817e
  0x00007f8d79168313: xor    %ebx,%ebx
  0x00007f8d79168315: jmpq   0x00007f8d79168151
  0x00007f8d7916831a: mov    $0xffffffad,%esi
  0x00007f8d7916831f: mov    %r13d,(%rsp)
  0x00007f8d79168323: mov    %rbx,0x8(%rsp)
  0x00007f8d79168328: xchg   %ax,%ax
  0x00007f8d7916832b: callq  0x00007f8d7912dea0  ; OopMap{[8]=Oop off=560}
                                                ;*iload_2
                                                ; - Test$::timeForLoop@11 (line 17)
                                                ;   {runtime_call}
  0x00007f8d79168330: callq  0x00007f8d7d91c1d0  ;*iload_2
                                                ; - Test$::timeForLoop@11 (line 17)
                                                ;   {runtime_call}
  0x00007f8d79168335: mov    $0xffffff86,%esi
  0x00007f8d7916833a: mov    0x4(%rsp),%ebp
  0x00007f8d7916833e: pushq  0x18(%rsp)
  0x00007f8d79168342: popq   0x10(%rsp)
  0x00007f8d79168346: mov    %r8,0x18(%rsp)
  0x00007f8d7916834b: mov    %eax,0x4(%rsp)
  0x00007f8d7916834f: mov    %r11d,0x20(%rsp)
  0x00007f8d79168354: xchg   %ax,%ax
  0x00007f8d79168357: callq  0x00007f8d7912dea0  ; OopMap{[8]=Oop [16]=Oop [24]=Oop off=604}
                                                ;*aload_1
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@23 (line 75)
                                                ; - Test$::timeForLoop@36 (line 18)
                                                ;   {runtime_call}
  0x00007f8d7916835c: callq  0x00007f8d7d91c1d0  ;*new
                                                ; - scala.collection.immutable.Range$::apply@0 (line 267)
                                                ; - scala.runtime.RichInt::until@8 (line 21)
                                                ; - Test$::timeForLoop@25 (line 18)
                                                ;   {runtime_call}
  0x00007f8d79168361: mov    %rax,%rsi
  0x00007f8d79168364: jmp    0x00007f8d79168378  ;*new
                                                ; - Test$::timeForLoop@28 (line 18)
  0x00007f8d79168366: mov    %rax,%rsi
  0x00007f8d79168369: jmp    0x00007f8d79168378  ;*invokevirtual length
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@1 (line 71)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d7916836b: mov    %rax,%rsi
  0x00007f8d7916836e: jmp    0x00007f8d79168378  ;*invokevirtual last
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@9 (line 72)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168370: mov    %rax,%rsi
  0x00007f8d79168373: jmp    0x00007f8d79168378  ;*invokeinterface apply$mcVI$sp
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@42 (line 78)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168375: mov    %rax,%rsi          ;*invokevirtual length
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@1 (line 71)
                                                ; - Test$::timeForLoop@36 (line 18)
  0x00007f8d79168378: add    $0x40,%rsp
  0x00007f8d7916837c: pop    %rbp
  0x00007f8d7916837d: jmpq   0x00007f8d79154660  ;*getfield elem
                                                ; - Test$::timeForLoop@47 (line 22)
                                                ;   {runtime_call}
  0x00007f8d79168382: mov    $0xfffffff6,%esi
  0x00007f8d79168387: callq  0x00007f8d7912dea0  ; OopMap{off=652}
                                                ;*getfield elem
                                                ; - Test$::timeForLoop@47 (line 22)
                                                ;   {runtime_call}
  0x00007f8d7916838c: callq  0x00007f8d7d91c1d0  ;*aload_1
                                                ; - scala.collection.immutable.Range::foreach$mVc$sp@23 (line 75)
                                                ; - Test$::timeForLoop@36 (line 18)
                                                ;   {runtime_call}

There's so much stuff in there, mainly that there are still calls remaining to last, length, etc. What it boils down to IMO is that Range is much too complicated for the common cases.

@scabug
Copy link
Author

scabug commented Dec 8, 2011

@dcsobral said:
Such optimizations don't offer much over a while loop, and are useless the instant you need something slightly different.

I have an alternative version that removes the second apply call. I'll submit it tomorrow after benchmarking it, unless someone submits something else first.

@scabug
Copy link
Author

scabug commented Dec 8, 2011

@dcsobral said:
Submitted pull request: scala/scala#48

On the other hand, I notice that while loops have their bodies placed after the rest of the body of the method, at least on these small tests. This seems suboptimal to me.

@scabug
Copy link
Author

scabug commented Mar 15, 2013

@adriaanm said:
Un-assigning to foster work stealing, as announced in https://groups.google.com/forum/?fromgroups=#!topic/scala-internals/o8WG4plpNkw

@scabug
Copy link
Author

scabug commented Jul 10, 2013

@adriaanm said:
Unassigning and rescheduling to M6 as previous deadline was missed.

@scabug
Copy link
Author

scabug commented Jul 11, 2013

@magarciaEPFL said:
The code duplication described in this ticket (applying to the old optimizer) is nowadays avoided in Range.foreach, because it was refactored to avoid that quirk of the old optimizer. However the problem remains in general. By design, the new optimizer http://magarciaepfl.github.io/scala/ doesn't duplicate more than once a closure body when inlining a closure.

@SethTisue
Copy link
Member

closing old back end tickets which are likely no longer applicable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants