Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map#mapValues is not serializable #7005

Closed
scabug opened this issue Jan 22, 2013 · 19 comments
Closed

Map#mapValues is not serializable #7005

scabug opened this issue Jan 22, 2013 · 19 comments
Assignees
Milestone

Comments

@scabug
Copy link

scabug commented Jan 22, 2013

scala> import java.io._
import java.io._                  ^

scala> val buffer = new ByteArrayOutputStream
buffer: java.io.ByteArrayOutputStream = 

scala> val out = new ObjectOutputStream(buffer)
out: java.io.ObjectOutputStream = java.io.ObjectOutputStream@5596d480

scala> Map(1 -> 1).mapValues(_ * 3)
res1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)

scala> out writeObject res1
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$2
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
	at .<init>(<console>:14)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:731)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:980)
	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:570)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:601)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
	at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:745)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:790)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:702)
	at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:566)
	at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:573)
	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:576)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:867)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:822)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:822)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:822)
	at scala.tools.nsc.interpreter.ILoop.main(ILoop.scala:889)
	at xsbt.ConsoleInterface.run(ConsoleInterface.scala:57)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:73)
	at sbt.compiler.AnalyzingCompiler.console(AnalyzingCompiler.scala:64)
	at sbt.Console.console0$1(Console.scala:23)
	at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply$mcV$sp(Console.scala:24)
	at sbt.TrapExit$.executeMain$1(TrapExit.scala:33)
	at sbt.TrapExit$$anon$1.run(TrapExit.scala:42)
@scabug
Copy link
Author

scabug commented Jan 22, 2013

Imported From: https://issues.scala-lang.org/browse/SI-7005?orig=1
Reporter: Artūras Šlajus (arturaz)
Affected Versions: 2.10.0
See #6654, #5018

@scabug
Copy link
Author

scabug commented Jan 22, 2013

Artūras Šlajus (arturaz) said (edited by @Blaisorblade on Mar 25, 2013 8:13:30 AM UTC):
Similar to #5018.

Use {code}.mapValues( ... ).map(identity){code} as a workaround

@scabug
Copy link
Author

scabug commented Mar 25, 2013

@scabug
Copy link
Author

scabug commented Nov 29, 2013

Heikki Vesalainen (hvesalai) said:
I also run into this. I have many maps that I'm serializing so the workaround (.map(identity)) starts to feel very stupid.

@scabug
Copy link
Author

scabug commented Nov 29, 2013

Heikki Vesalainen (hvesalai) said:
Furthermore this is a mine since normally Maps will Serialize. There should be some type-level solution that would allow the developer to say that here I need a serializable Map.

@scabug
Copy link
Author

scabug commented Apr 7, 2014

Allan Douglas R. de Oliveira (douglaz) said:
Got this issue while working with Spark (which requires serializable objects to be passed to jobs). It's very annoying to get such errors at runtime.

@scabug
Copy link
Author

scabug commented Jul 30, 2014

Jeffrey Aguilera (jeffrey.aguilera) said:
This just bit me on 2.11.2. Needs to be fixed.

@scabug
Copy link
Author

scabug commented Aug 1, 2014

Art Peel (foundart) said:
I also got hit with this while working with Spark.

@scabug
Copy link
Author

scabug commented Aug 5, 2014

@Blaisorblade said:
Latest discussion is one further link away: scala/scala#2308. But basically, the current implementation is a view and can't be made Serializable safely, because closures as a rule can't be serialized or might be hugely expensive.

See also #4776.

@scabug
Copy link
Author

scabug commented Aug 5, 2014

Heikki Vesalainen (hvesalai) said:
Yes, the basic problem is exactly that mapValues returns a Map, which cannot be distinguished from a concrete map. (compare and contrast to withFilter which returns a FilterMonadic or view which returns IterableView).

@scabug
Copy link
Author

scabug commented Nov 1, 2014

Michael Pfaffenberger (glitch253) said:
Hit this issue using MapDB.

@scabug
Copy link
Author

scabug commented Feb 1, 2015

@Ichoran said:
I think the solution, for now, is to make a ForceableTo trait that we mix in to the non-eager collections like this one. Then it's your responsibility to .force them upon serialization, or do something else if eager evaulation is not acceptable.

@scabug
Copy link
Author

scabug commented Feb 5, 2015

Jeffrey Aguilera (jeffrey.aguilera) said:
@rex, as was pointed out two years ago, mapValues(...).map(identity) will force conversion to a serializable instance. Renaming this to force is not really a solution.

@scabug
Copy link
Author

scabug commented Mar 29, 2015

@Ichoran said:
[~jeffrey.aguilera], my proposal was only to address whether or not you could tell from the types whether the identity map was needed. It's a partial solution. The other possibility is just stating that serialization forces evaluation, period. I like this one best. If people accidentally serialize huge maps with expensive computations that they don't actually want to perform, I think one can squarely point the finger back at them and say, "Well, that's what you asked for! If that's not what you want, serialize the source map and the expensive transformation separately."

That does raise a problem of having arbitrary numbers of the same map being duplicated, but I can't see a reliable path to correctness otherwise.

@scabug
Copy link
Author

scabug commented Mar 29, 2015

@Blaisorblade said (edited on Mar 29, 2015 2:33:27 PM UTC):
Indeed, there aren't perfect solution with this interface. A sensible interface for serialization when closures are involved is a hard problem — in fact, it's one of the problems that @heathermiller [~hmiller] (what's the right account?) has been working on with spores. Spores would solve the problem I described earlier:

But basically, the current implementation is a view and can't be made Serializable safely, because closures as a rule can't be serialized or might be hugely expensive.

by giving the programmer control over what gets in a closure. Yes, that's annoying, but things can't be perfect.
Hence, I suspect one should just pick some (imperfect) solution here, and try to design a proper spore-based interface as a proper fix.
But now that I pointed out the connection, maybe Heather can weigh in with her opinion?

Here's the current implementation of spores.
(Sorry I can't be bothered to relearn JIRA's link syntax).

In a perfect world, the type system could maybe distinguish between serializable and not serializable map values, depending on whether all involved closures are serializable. But this seems a follow-up research project, if it's actually plausible at all.

@scabug
Copy link
Author

scabug commented Jun 23, 2016

Meir Maor (meirmaor) said:
Is this differed from 2.12 because of lack of bandwidth or because there is no agreement on solution?
As I constantly suffer from this issue, I would gladly assist. If I make a pull request to make the result of mapValues materialize on serialization is there a reasonable chance of it being merged?

@scabug
Copy link
Author

scabug commented Jun 27, 2016

@lrytz said:
I don't think that making mapValues strict in 2.12 would pass reviews. The real fix is a more comprehensive rework, basically fixing #4776. A rework of collections is on the roadmap for 2.13, and fixing issues with views is one of the main program points. CC @szeiger.

@scabug
Copy link
Author

scabug commented Jun 27, 2016

Meir Maor (meirmaor) said:
I wasn't suggesting making mapValues strict. I was suggesting to write the result when serializing (Getting something not identical, but equal when deserializing).
I think serialization of non strict collection is a fundamental issue, exasperated in this case when the type system gives us no hint to the problem.
I think materializing when serializing is not a bad behavior, the only alternative I can think of is trying to serialize the function optimistically (many serialize just fine).
Materializing on serialization might be less clean, but it is far more useful, especially for novices who want things to "just work".

@scabug
Copy link
Author

scabug commented Nov 16, 2016

Sarah Gerweck (gerweck) said:
It seems very strange that mapValues and filterKeys are not strict: Scala's collections are usually strict unless you request a view. This unexpected behavior seems like a recipe for subtle, unexpected bugs. However, [~meirmaor]'s suggestion to at least have a strict serialization mechanism seems like a reasonable workaround if fixing mapValues isn't in the cards.

@scabug scabug added this to the 2.13.0-M1 milestone Apr 7, 2017
@adriaanm adriaanm modified the milestones: 2.13.0-M1, 2.13.0-M2 Apr 14, 2017
@szeiger szeiger modified the milestones: 2.13.0-M3, 2.13.0-M2 Jul 20, 2017
@adriaanm adriaanm modified the milestones: 2.13.0-M3, 2.13.0-M4 Jan 30, 2018
@lrytz lrytz modified the milestones: 2.13.0-M4, 2.13.0-M5 Apr 19, 2018
dongjoon-hyun pushed a commit to apache/spark that referenced this issue Jan 16, 2021
…d CatalogTable

### What changes were proposed in this pull request?
Replace `toMap` by `map(identity).toMap` while getting canonicalized representation of `CatalogTable`. `CatalogTable` became not serializable after #31112 due to usage of `filterKeys`. The workaround was taken from scala/bug#7005.

### Why are the changes needed?
This prevents the errors like:
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
[info]   Cause: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
```

### Does this PR introduce _any_ user-facing change?
Should not.

### How was this patch tested?
By running the test suite affected by #31112:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #31197 from MaxGekk/fix-caching-hive-table-2-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit to apache/spark that referenced this issue Jan 16, 2021
…d CatalogTable

### What changes were proposed in this pull request?
Replace `toMap` by `map(identity).toMap` while getting canonicalized representation of `CatalogTable`. `CatalogTable` became not serializable after #31112 due to usage of `filterKeys`. The workaround was taken from scala/bug#7005.

### Why are the changes needed?
This prevents the errors like:
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
[info]   Cause: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
```

### Does this PR introduce _any_ user-facing change?
Should not.

### How was this patch tested?
By running the test suite affected by #31112:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #31197 from MaxGekk/fix-caching-hive-table-2-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit c3d81fb)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit to apache/spark that referenced this issue Jan 16, 2021
…d CatalogTable

### What changes were proposed in this pull request?
Replace `toMap` by `map(identity).toMap` while getting canonicalized representation of `CatalogTable`. `CatalogTable` became not serializable after #31112 due to usage of `filterKeys`. The workaround was taken from scala/bug#7005.

### Why are the changes needed?
This prevents the errors like:
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
[info]   Cause: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
```

### Does this PR introduce _any_ user-facing change?
Should not.

### How was this patch tested?
By running the test suite affected by #31112:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #31197 from MaxGekk/fix-caching-hive-table-2-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit c3d81fb)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants