Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For loops with -optimize have much worse performance than while loops #4633

Closed
scabug opened this issue May 22, 2011 · 13 comments
Closed

For loops with -optimize have much worse performance than while loops #4633

scabug opened this issue May 22, 2011 · 13 comments

Comments

@scabug
Copy link

scabug commented May 22, 2011

According to Tiark Rompf, "With -optimize, scalac should compile simple for comprehensions to the equivalent of a while loop. If it does not it's a bug (either in the library or in the compiler) and it needs fixing. So if there is indeed a bug, gather solid performance numbers [and] submit a ticket"

Here are performance numbers for loops nested 1, 3, 5, or a variable number of levels deep; in every case, the innermost operation occurs one billion times. The leftmost number is a checksum (the first three should be the same; the fourth is different):

// These are while loops
-1243309312 While1 Elapsed: 0.314 s
-1243309312 While3 Elapsed: 0.301 s
-1243309312 While5 Elapsed: 0.316 s
-17609343 While? Elapsed: 1.151 s

// These are for loops with ranges
-1243309312 Limit1 Elapsed: 0.923 s
-1243309312 Limit3 Elapsed: 0.885 s
-1243309312 Limit5 Elapsed: 2.527 s
-17609343 Limit? Elapsed: 3.308 s

// These use a C-for-like-method that takes an anonymous function
// Note that the JVM gets confused about how to optimize repeated uses
-1243309312 Cfor1 Elapsed: 9.794 s
-1243309312 Cfor3 Elapsed: 0.324 s
-1243309312 Cfor5 Elapsed: 0.450 s
-17609343 Cfor?? Elapsed: 5.662 s

Code is attached.

@scabug
Copy link
Author

scabug commented May 22, 2011

Imported From: https://issues.scala-lang.org/browse/SI-4633?orig=1
Reporter: @Ichoran
Affected Versions: 2.8.1, 2.9.0
Attachments:

  • Cruel.scala (created on May 22, 2011 1:53:34 PM UTC, 4273 bytes)
  • ForWhile.scala (created on May 23, 2011 7:38:38 PM UTC, 1533 bytes)

@scabug
Copy link
Author

scabug commented May 23, 2011

Matthew Pocock (drdozer) said:
The simple case of a for loop over a range is so common and so slow that it deserves optimizing, ideally in a standard scalac run, but as a last resort, when -optimize is turned on. The current situation makes scala look bad.

@scabug
Copy link
Author

scabug commented May 23, 2011

@Ichoran said:
Attached is another benchmark demonstrating the difference, including array access also.

With -optimise, I get timings:

while range 1 : -536870912 took 0.339 s
while range 2 : -536870912 took 0.328 s
while array 1024: -536870912 took 0.497 s
for range 1 : -536870912 took 0.658 s
for range 2 : -536870912 took 1.335 s
for array 1024: -536870912 took 4.533 s

demonstrating a 2x-9x slowdown using for instead of while. Without -optimise:

while range 1 : -536870912 took 0.337 s
while range 2 : -536870912 took 0.321 s
while array 1024: -536870912 took 0.494 s
for range 1 : -536870912 took 0.997 s
for range 2 : -536870912 took 1.061 s
for array 1024: -536870912 took 3.375 s

gives a 3x-7x slowdown.

(These timings are 2.9.0, again Ubuntu 10.10 with Sun Java 1.6.0_24-b07; build 1.7.0-ea-b143 gives similar results.)

Incidentally, JRockit (Oracle JRockit(R) (build R28.0.1-21-133393-1.6.0_20-20100512-2126-linux-x86_64, compiled mode) makes it look like the for loop is often as good as the while loop:

while range 1 : -536870912 took 0.928 s
while range 2 : -536870912 took 0.903 s
while array 1024: -536870912 took 1.199 s
for range 1 : -536870912 took 0.898 s
for range 2 : -536870912 took 2.132 s
for array 1024: -536870912 took 0.953 s

but of course the dirty little secret here is that even the while loops are horribly slow--everything is 3x worse than it needs to be.

@scabug
Copy link
Author

scabug commented Jun 2, 2011

@ijuma said:
An option is to simply include the part of the ScalaCL plugin that tranforms certain operations to while loops with the Scala distribution. This way the compiler doesn't become more complex and the fear that the plugin would not evolve together with Scala would go away.

@scabug
Copy link
Author

scabug commented Jun 2, 2011

@ijuma said:
I forgot the link: http://code.google.com/p/scalacl/wiki/ScalaCLPlugin

@scabug
Copy link
Author

scabug commented Mar 7, 2012

Ryan Schmitt (ryanls) said:
I agree with Ismael. The ScalaCL plugin is a useful start is a useful modification to the compiler.

@scabug
Copy link
Author

scabug commented Mar 7, 2012

Ryan Schmitt (ryanls) said:
Accidental click

@scabug
Copy link
Author

scabug commented Mar 7, 2012

Ryan Schmitt (ryanls) said:
fixed the mistake

@scabug
Copy link
Author

scabug commented Jul 10, 2013

@adriaanm said:
Unassigning as milestone deadline was reached.

@scabug
Copy link
Author

scabug commented May 12, 2016

@SethTisue said:
has anyone benchmarked this in 2.12.0-M4?

@scabug
Copy link
Author

scabug commented May 12, 2016

@rklaehn said:
for range 1 and for range 2 are pretty good, array range 1024 still sucks.

https://github.com/rklaehn/si4633


[info] Running ForWhile
version 2.11.6
while range 1 : -536870912 took 0.369 s
while range 2 : -536870912 took 0.383 s
while array 1024: -536870912 took 0.416 s
for range 1 : -536870912 took 0.756 s
for range 2 : -536870912 took 2.359 s
for array 1024: -536870912 took 4.968 s

while range 1 : -536870912 took 0.387 s
while range 2 : -536870912 took 0.403 s
while array 1024: -536870912 took 0.402 s
for range 1 : -536870912 took 0.736 s
for range 2 : -536870912 took 2.254 s
for array 1024: -536870912 took 4.838 s

while range 1 : -536870912 took 0.379 s
while range 2 : -536870912 took 0.381 s
while array 1024: -536870912 took 0.364 s
for range 1 : -536870912 took 2.571 s
for range 2 : -536870912 took 0.364 s
for array 1024: -536870912 took 3.970 s

while range 1 : -536870912 took 0.355 s
while range 2 : -536870912 took 0.371 s
while array 1024: -536870912 took 0.364 s
for range 1 : -536870912 took 0.371 s
for range 2 : -536870912 took 0.365 s
for array 1024: -536870912 took 3.687 s


[info] Running ForWhile
version 2.12.0-M4
while range 1 : -536870912 took 0.372 s
while range 2 : -536870912 took 0.381 s
while array 1024: -536870912 took 0.369 s
for range 1 : -536870912 took 0.554 s
for range 2 : -536870912 took 0.386 s
for array 1024: -536870912 took 4.101 s

while range 1 : -536870912 took 0.369 s
while range 2 : -536870912 took 0.371 s
while array 1024: -536870912 took 0.362 s
for range 1 : -536870912 took 0.591 s
for range 2 : -536870912 took 0.379 s
for array 1024: -536870912 took 4.073 s

while range 1 : -536870912 took 0.369 s
while range 2 : -536870912 took 0.361 s
while array 1024: -536870912 took 0.364 s
for range 1 : -536870912 took 0.369 s
for range 2 : -536870912 took 0.378 s
for array 1024: -536870912 took 3.397 s

while range 1 : -536870912 took 0.371 s
while range 2 : -536870912 took 0.386 s
while array 1024: -536870912 took 0.362 s
for range 1 : -536870912 took 0.372 s
for range 2 : -536870912 took 0.376 s
for array 1024: -536870912 took 3.457 s

@scabug scabug added this to the Backlog milestone Apr 7, 2017
@SethTisue
Copy link
Member

why is this a separate ticket from #1338? (no reason?)

@som-snytt
Copy link

This ticket was for the Tiark Challenge, you had to create a ticket with solid numbers in order to get a t-shirt. On the back it said, "Rompf in the Swampf".

Previous to that was the Paulp-off.

The conversations (including the links here) follow similar twists and turns from quick optimizations to plugins to areas of research.

With -optimize or equivalent:

[info] Running ForWhile 
version 2.12.8
[snip]
while range 1   : -536870912 took  0.547 s
while range 2   : -536870912 took  0.549 s
while array 1024: -536870912 took  0.593 s
for   range 1   : -536870912 took  0.547 s
for   range 2   : -536870912 took  0.553 s
for   array 1024: -536870912 took  4.422 s

[info] Running ForWhile 
version 2.13.0
[snip]
while range 1   : -536870912 took  0.546 s
while range 2   : -536870912 took  0.548 s
while array 1024: -536870912 took  0.591 s
for   range 1   : -536870912 took  0.546 s
for   range 2   : -536870912 took  0.548 s
for   array 1024: -536870912 took  0.595 s

Probably additional issues for 2.13 would require a more narrowly-scoped ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants