Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstracting over sequential and parallel sequences broken #4843

Closed
scabug opened this issue Jul 27, 2011 · 14 comments
Closed

Abstracting over sequential and parallel sequences broken #4843

scabug opened this issue Jul 27, 2011 · 14 comments

Comments

@scabug
Copy link

scabug commented Jul 27, 2011

The below REPL examples shows the observed behavior under Scala 2.9.0.1 and 2.9.1.RC1. Under 2.9.0.1 there seem to be two bugs, one regarding type inference and the other regarding abstracting over sequential and parallel Seqs (that might also apply to other collections). Under 2.9.1.RC1 only the later occurs, yet the type inferencer seems to use a structural type instead of GenSeq.

The real issue from a user's perspective is, that even if a parallel Seq is returned by the numbers-method, map won't be executed in parallel, i.e. abstracting over sequential and parallel collections is broken.

===============================================================================
Scala 2.9.0.1

Welcome to Scala version 2.9.0.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_26).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def numbers(isPar: Boolean) =
| if (isPar) (1 to 8).par else 1 to 8
:7: error: kinds of the type arguments (scala.collection.GenSeq[Any]{def seq: scala.collection.immutable.Seq[Any]}) do not conform to the expected kinds of the type parameters (type CC) in class GenericCompanion.
scala.collection.GenSeq[Any]{def seq: scala.collection.immutable.Seq[Any]}'s type parameters do not match type CC's expected parameters: has no type parameters, but type CC has one
def numbers(isPar: Boolean) =
^

scala> def numbers(isPar: Boolean): scala.collection.GenSeq[Int] =
| if (isPar) (1 to 8).par else 1 to 8
numbers: (isPar: Boolean)scala.collection.GenSeq[Int]

scala> numbers(true) map { _ => Thread.currentThread.getName }
res0: scala.collection.GenSeq[java.lang.String] = ParVector(Thread-5, Thread-5, Thread-5, Thread-5, Thread-5, Thread-5, Thread-5, Thread-5)

===============================================================================
Scala 2.9.1.RC1

Welcome to Scala version 2.9.1.RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_26).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def numbers(isPar: Boolean) =
| if (isPar) (1 to 8).par else 1 to 8
numbers: (isPar: Boolean)scala.collection.GenSeq[Int] with Serializable{def companion: scala.collection.generic.GenericCompanion[scala.collection.GenSeq{def seq: scala.collection.immutable.Seq[Any]}]; def par: scala.collection.parallel.immutable.ParSeq[Int]; def seq: scala.collection.immutable.Seq[Int]; def distinct: scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; def distinct: scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; protected def parCombiner: scala.collection.parallel.Combiner[Int,scala.collection.parallel.immutable.ParSeq[Int]]}; def intersect[U >: Int](that: scala.collection.GenSeq[U]): scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; protected def parCombiner: scala.collection.parallel.Combiner[I...
scala> numbers(true) map { _ => Thread.currentThread.getName }
res0: scala.collection.GenSeq[java.lang.String] = ParVector(Thread-5, Thread-5, Thread-5, Thread-5, Thread-5, Thread-5, Thread-5, Thread-5)

scala> def numbers(isPar: Boolean): scala.collection.GenSeq[Int] =
| if (isPar) (1 to 8).par else 1 to 8
numbers: (isPar: Boolean)scala.collection.GenSeq[Int]

scala> numbers(true) map { _ => Thread.currentThread.getName }
res1: scala.collection.GenSeq[java.lang.String] = ParVector(Thread-7, Thread-7, Thread-7, Thread-7, Thread-7, Thread-7, Thread-7, Thread-7)

@scabug
Copy link
Author

scabug commented Jul 27, 2011

Imported From: https://issues.scala-lang.org/browse/SI-4843?orig=1
Reporter: @hseeberger
Affected Versions: 2.9.0, 2.9.1

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@paulp said:
The type it is inferring is GenSeq[Int] { refinement }. This is by definition also a GenSeq[Int]. You're seeing the the same thread in all 8 because 8 is such a low number it doesn't bother spreading the load.


scala> def numbers(isPar: Boolean): scala.collection.GenSeq[Int] = if (isPar) (1 to 256).par else 1 to 256
numbers: (isPar: Boolean)scala.collection.GenSeq[Int]

scala> numbers(true) map { _ => Thread.currentThread.getName } groupBy (x => x) mapValues (_.length) foreach println
(ForkJoinPool-1-worker-2,64)
(ForkJoinPool-1-worker-5,48)
(ForkJoinPool-1-worker-4,40)
(ForkJoinPool-1-worker-3,64)
(ForkJoinPool-1-worker-0,8)
(ForkJoinPool-1-worker-1,32)

scala> numbers(false) map { _ => Thread.currentThread.getName } groupBy (x => x) mapValues (_.length) foreach println
(Thread-17,256)

scala> def numbers(isPar: Boolean) = if (isPar) (1 to 256).par else 1 to 256
numbers: (isPar: Boolean)scala.collection.GenSeq[Int] with Serializable{def companion: scala.collection.generic.GenericCompanion[scala.collection.GenSeq{def seq: scala.collection.immutable.Seq[Any]}]; def par: scala.collection.parallel.immutable.ParSeq[Int]; def seq: scala.collection.immutable.Seq[Int]; def distinct: scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; def distinct: scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; protected def parCombiner: scala.collection.parallel.Combiner[Int,scala.collection.parallel.immutable.ParSeq[Int]]}; def intersect[U >: Int](that: scala.collection.GenSeq[U]): scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; protected def parCombiner: scala.collection.parallel.Combiner[I...

scala> numbers(true) map { _ => Thread.currentThread.getName } groupBy (x => x) mapValues (_.length) foreach println
(ForkJoinPool-1-worker-2,32)
(ForkJoinPool-1-worker-3,64)
(ForkJoinPool-1-worker-4,24)
(ForkJoinPool-1-worker-5,24)
(ForkJoinPool-1-worker-0,80)
(ForkJoinPool-1-worker-1,32)

scala> numbers(false) map { _ => Thread.currentThread.getName } groupBy (x => x) mapValues (_.length) foreach println
(Thread-20,256)

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@hseeberger said:
First (the more important comes below), it doesn't work for 1000 elements, neither:

scala> def numbers(isPar: Boolean) = if (isPar) (1 to 1000).par else 1 to 1000
numbers: (isPar: Boolean)scala.collection.GenSeq[Int] with Serializable{def companion: scala.collection.generic.GenericCompanion[scala.collection.GenSeq{def seq: scala.collection.immutable.Seq[Any]}]; def par: scala.collection.parallel.immutable.ParSeq[Int]; def seq: scala.collection.immutable.Seq[Int]; def distinct: scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; def distinct: scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; protected def parCombiner: scala.collection.parallel.Combiner[Int,scala.collection.parallel.immutable.ParSeq[Int]]}; def intersect[U >: Int](that: scala.collection.GenSeq[U]): scala.collection.GenSeq[Int]{def seq: scala.collection.immutable.Seq[Int]; protected def parCombiner: scala.collection.parallel.Combiner[I...
scala> numbers(true) map { _ => Thread.currentThread.getName }
res8: scala.collection.GenSeq[java.lang.String] = ParVector(Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, Thread-17, ...

Second: How can the collections know whether the load is high or not? Even for a small number like 8 or even 3 it could make sense to perform operations in parallel. Example using a collection of functions making web service requests:

object Weather {

def main(args: Array[String]) {
val locations = Seq(
...
"Berlin-Germany",
"San Francisco-USA")
val isParallel = args.headOption map {_ == "par"} getOrElse false
val time0 = System.currentTimeMillis
val currentTemperatures =
if (isParallel) (locations.par map currentTemperature).flatten.seq
else (locations map currentTemperature).flatten
val time1 = System.currentTimeMillis
val averageTemperature = currentTemperatures.sum / currentTemperatures.size
println("Calculated average temperature of about %s °C in %s ms.".format(averageTemperature, time1 - time0))
}

def currentTemperature(location: String): Option[Int] = {
def extractTemperature(weatherResponse: Elem) =
(weatherResponse \ "current_conditions" \ "temp_c").headOption flatMap { temp =>
catching(classOf[NumberFormatException]) opt { (temp \ "@DaTa").text.toInt }
}
val weatherRequest = :/("www.google.com") / "ig" / "api" <<? Map("weather" -> location, "hl" -> "en")
Http(weatherRequest <> extractTemperature)
}
}

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@hseeberger said:
Sorry, the above code snippet already works around this issue! But the intention should be obvious ...

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@paulp said:
a) It does work, for significantly fewer than 1000 elements, as I already demonstrated.

b) "How can it know whether the load is high or not" is a question about the design, not a bug. Please give me a chance to have something else in my life besides this bug database by reserving the bug database for bugs.

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@jrudolph said:
This isn't fixed in 2.9.1.RC1. So it is still a bug.

Welcome to Scala version 2.9.1.RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_26).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def numbers(isPar: Boolean): scala.collection.GenSeq[Int] = if (isPar) (1 to 256).par else 1 to 256
numbers: (isPar: Boolean)scala.collection.GenSeq[Int]

scala> numbers(true) map { _ => Thread.currentThread.getName } groupBy (x => x) map (x => (x._1, x._2.length)) foreach println
(Thread-9,256)

Aleksandar fixed that in r25235 but it was apparently not yet chosen for 2.9.1 probably also because I didn't report the bug earlier.

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@hseeberger said:
Paul,

(a) Sorry, but the code you posted doesn't even compile under 2.9.1.RC1. And as Johannes showed, making it compile still doesn't make it work.

(b) From a user perspective, if I call someCollection.par I want it to be parallel. Therefore I consider it a bug, but I am willing to take it to the scala-internal list.

Thanks,

Heiko

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@paulp said:
Oh, it's 2.9.1 we're worried about. You derailed me with the structural type talk. Have to see about anything else for 2.9.1.

@scabug
Copy link
Author

scabug commented Jul 27, 2011

@jrudolph said:
Here's a workaround for a few of those cases where you have control over the call-sites:

Put a definition like at a place where it can be seen from the call-sites of map etc.

implicit def possiblyParallelCanBuildFrom[Elem]: CanBuildFrom[GenSeq.Coll, Elem, GenSeq[Elem]] =
  if (isPar)
    ParSeq.canBuildFrom[Elem].asInstanceOf[CanBuildFrom[GenSeq.Coll, Elem, GenSeq[Elem]]]
  else
    GenSeq.canBuildFrom[Elem]

You can, of course, only use it if you manage to pass isPar around, as well. Overall not very usable in the end but it did work for us in one place because we configured isPar globally anyways.

@scabug
Copy link
Author

scabug commented Sep 8, 2011

@hseeberger said:
This looks fixed in 2.10 (scala-2.10.0.r25517-b20110816023619), i.e. could probably be closed.
Thanks!

@scabug scabug closed this as completed Jun 5, 2012
@scabug
Copy link
Author

scabug commented Jun 6, 2012

@jrudolph said:
Just a note for someone who stumbles on this later on: This is fixed for Scala >= 2.10 but isn't and won't be fixed for Scala 2.9.x because it can't be fixed without breaking binary compatibility.

@scabug
Copy link
Author

scabug commented Jan 18, 2013

Omid Aladini (omid) said:
Could you please shed some light on why this was happening on 2.9.x? Any pointers to discussions about this or the fix on 2.10?

@scabug
Copy link
Author

scabug commented Jan 19, 2013

@jrudolph said:
In 2.9.x the behavior depends on the right implicit CanBuildFrom being chosen. But the decision which CanBuildFrom to take is based on the static type of the collection, so that when you had GenSeq the parallel CanBuildFrom was never chosen.

In 2.10 the behavior changed to not rely on static type information.

I don't know the exact commit of the fix but here's a former discussion on the mailing list from the time the issue was fixed:

https://groups.google.com/d/topic/scala-language/5KZAZXnWS1I/discussion

@scabug
Copy link
Author

scabug commented Jan 21, 2013

Omid Aladini (omid) said:
@jrudolph thanks!

@scabug scabug added the critical label Apr 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant