Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readResolve for nested objects causes a new object to be created instead of using the de-serialized one #9375

Closed
scabug opened this issue Jul 2, 2015 · 15 comments
Assignees
Milestone

Comments

@scabug
Copy link

scabug commented Jul 2, 2015

import java.io._

class Outer extends Serializable {

  // generated accesssor:
  // public attr()LOuter$attr$ = {
  //   if (attr$module == null) attr$lzycompute()
  //   else attr$module
  // }

  // generated method:
  // private attr$lzycompute()LOuter$attr$ = {
  //   this.synchronized {
  //      if (attr$module == null) attr$module = new Outer$attr$(this)
  //   }
  //   attr$module
  // }

  object attr extends Serializable {
    println("<attr constructor>")
    var x = "[initial]"
    private def readObject(in: java.io.ObjectInputStream): Unit = {
      in.defaultReadObject
      println("after read: " + this)
    }

    // generated method:
    // private readResolve()Ljava/lang/Object = { this.$outer.attr() }

    override def toString = s"$x -- ${System.identityHashCode(this)}"
  }
  override def toString() = attr.toString
}

object tl extends Serializable {
  println("<tl constructor>")
}

object Test {
  def main(args: Array[String]) = {
    val outer = new Outer
    outer.attr.x = "foobar"

    assert(tl eq serializeDeserialize(tl)) // ok, ensured by tl$.readResolve which loads the static module instance field

    // serialize and deserialize the nesed object outer.attr
    // this behaves correctly
    //   - start de-serializing the object
    //   - de-serialize the Outer object and store it in the $outer field
    //     => the `attr$module` field of the outer object is being assigned
    //   - the readResolve method of Outer$attr$ class outer$.attr(), the object (already stored in `attr$module`) is returned

    println("attr before:" + outer.attr)
    val attrSD = serializeDeserialize(outer.attr)
    println("attr after: " + attrSD)
    assert(readPrivateField(outer.attr, "$outer") ne readPrivateField(attrSD, "$outer")) // ok
    println(outer.attr ne attrSD) // ok
    println()


    // serialize and deserialize the outer instance
    // there's an ordering problem:
    //   - start de-serializing the outer object
    //   - start de-serializing the Outer$attr$ object (to be stored in the outer object's attr$module field)
    //   - finish de-serializing the Outer$attr$ object, call its readResolve
    //   - readResolve calls `attr()` on the outer object. at this point the `attr$module` field is still null
    //   - therefore `attr$lzycompute` is invoked and a new module instance is created

    println("before:     " + outer)
    val outerSD = serializeDeserialize(outer)
    println("after:      " + outerSD)
  }

  def serializeDeserialize[T <: AnyRef](obj: T) = {
    val buffer = new ByteArrayOutputStream
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    in.readObject.asInstanceOf[T]
  }

  def readPrivateField(o: Object, field: String): Object = {
    val f = o.getClass().getDeclaredField(field);
    f.setAccessible(true)
    f.get(o)
  }
}
➜  sandbox git:(jfun) scalac Test.scala && scala Test
<attr constructor>
<tl constructor>
attr before:foobar -- 1349393271
after read: foobar -- 214126413
attr after: foobar -- 214126413
true

before:     foobar -- 1349393271
after read: foobar -- 396873410
<attr constructor>
after:      [initial] -- 1706234378

@scabug
Copy link
Author

scabug commented Jul 2, 2015

Imported From: https://issues.scala-lang.org/browse/SI-9375?orig=1
Reporter: @lrytz
Affected Versions: 2.11.7

@scabug
Copy link
Author

scabug commented Jul 3, 2015

@lrytz said (edited on Jul 28, 2015 7:43:12 PM UTC):
I think the correct fix is to change the readResolve method of the nested object to the following:

def readResolve() = {
  if ($outer.attr$module == null) this
  else $outer.attr$module
}

Since the field (attr$module) is private, we might create another name-mangled accessor for it (instead of making the field public)

Q: should the method be synchronized?

CC @axel22

@scabug
Copy link
Author

scabug commented Jul 7, 2015

@lrytz said:
CC @DarkDimius

@scabug
Copy link
Author

scabug commented Jul 8, 2015

@ijuma said:
Is this a new issue in 2.11.7 or something that has been there for a while?

@scabug
Copy link
Author

scabug commented Jul 8, 2015

@lrytz said:
It's an old bug. I just tested with 2.10.3, same there.

@scabug
Copy link
Author

scabug commented Jul 8, 2015

@ijuma said:
OK, thanks. :)

@scabug
Copy link
Author

scabug commented Jul 19, 2015

@axel22 said:
Lukas, what you propose fixes this problem, but we have to additionally make sure that all the objects that see Outer, agree on the same instance of the nested object attr.
If we are deserializing only an Outer instance along with its attr object, then this will indeed be the case.

However, there might be multiple instances of the attr object created per an Outer class instance if some other code invokes the attr() method between the time when the proposed attr.readResolve returns this and the time when Outer class is fully deserialized. The only situation where this could happen (that I can think of) is if the nested object has a custom readObject method:

class Outer extends Serializable {
  def foo = attr
  object attr extends Serializable {
    println("Ctor!")
    var x = "default"
    override def toString = s"attr($x)"
    private def readObject(in: ObjectInputStream) {
      println("dro begin")
      in.defaultReadObject
      println("dro done")
      foo
    }
  }
  override def toString = "Outer - " + attr
}

// Exiting paste mode, now interpreting.

defined class Outer

scala> val outer = new Outer
Ctor!
outer: Outer = Outer - attr(default)

scala> outer.attr.x = "custom"
outer.attr.x: String = custom

scala> serializeDeserialize(outer)
dro begin
dro done
Ctor!
res11: Outer = Outer - attr(default)

To avoid this situation, the readResolve method should set the attr$module field the first time that the singleton instance is used:

def readResolve() = {
  if ($outer.attr$module == null) {
    $outer.attr$module = this
    this
  } else $outer.attr$module
}

I would avoid synchronization here, since it could result in subtle deadlocks, but does not buy you much.
In fact, if the field checked by the attr() accessor is really not a volatile, then the initialization of the nested object is not thread-safe already.

@scabug
Copy link
Author

scabug commented Jul 20, 2015

@lrytz said (edited on Jul 20, 2015 9:37:06 AM UTC):
Thanks for you comment, Alex.

I'm not sure your suggestion would help in the situation you describe though. It seems readResolve is called only after readObject returns (example below). So we'd get this sequence of events:

  • start de-serializing the Outer instance
  • start de-serializing the attr$ instance for the attr$module field
  • invoke attr$.readObject
  • invokes the outer$.attr getter, which finds the attr$module field still being null, so a new module instance is created (constructor side-effects!)
  • invoke attr$.readResolve
  • the outer$.attr$module field is non-null, so its value is returned

I don't see a way to prevent invoking the module constructor if the module is accessed by a custom readObject.

case class C(var s: String) {
  private def readObject(in: ObjectInputStream): Unit = {
    println("Before Default")
    println(this)
    in.defaultReadObject
    println("After Default")
    println(this)
    this.s = "readObject"
    println(this)
  }

  private def readResolve(): Object = {
    println("readResolve")
    C("readResolve")
  }
}
scala> val c = C("orig")
c: C = C(orig)

scala> serializeDeserialize(c)
Before Default
C(null)
After Default
C(orig)
C(readObject)
readResolve
res2: C = C(readResolve)

@scabug
Copy link
Author

scabug commented Jul 21, 2015

Adam Pocock (craigacp) said (edited on Jul 21, 2015 1:03:26 PM UTC):
Our workaround for this bug is to call defaultReadObject in attr, then call Outer.attr, which does freshly instantiate the module and we then copy the values over. We wouldn't need this when the inner class de-serialises correctly, but it would have very counterintuitive behaviour if we needed to do computation at de-serialization time (for example reconstructing computed fields) and accidentally accessed attr instead of the this pointer.

@scabug
Copy link
Author

scabug commented Jul 24, 2015

@scabug
Copy link
Author

scabug commented Jul 28, 2015

@axel22 said:
Quick answer to whether it should be synchronized: no, as far as I can see.

Long answer:
I didn't check what gets generated, but if the following accessor is generated:

  // generated accesssor:
  // public attr()LOuter$attr$ = {
  //   if (attr$module == null) attr$lzycompute()
  //   else attr$module
  // }

and at the same time, the attr$module field is not volatile, then the lazy accessors for nested objects are not thread-safe -- it is possible that some thread reads a non-null attr$module reference to a nested object whose constructor did not yet complete (because the statements inside the synchronized block can be reordered with the constructor body of singleton object).

This bug could manifest itself:
a) during concurrent access to Outer's attr field
b) possibly during serialization if the reference to the Outer object somehow escapes to another thread

So, to fix this bug, the accessor method does not need to be synchronized.
However, the long-term fix should be to make attr$module volatile:

  • this, along with the existing synchronized in attr$lzycompute fix the problem (a)
  • if you also care about Outer object reference escaping to another thread, this should also fix (b), as far as I can see

@scabug
Copy link
Author

scabug commented Jul 29, 2015

@lrytz said:
after some thought and i-don't-tell-how-many-hours of working on the prototype for my initial idea, i'm quite convinced now that the right solution is simply to NOT emit a readResolve method. Submitted in scala/scala#4671.

@scabug
Copy link
Author

scabug commented Jul 30, 2015

@axel22 said:
Wow, nice. That sounds about right. It should not be necessary to ensure singletoness on the nested object, since both the nested object and its surrounding object are fresh and unique after deserialization anyway.

Still, I believe that my comment about the attr$module being volatile stands - that would make nested singleton objects more consistent with the top-level singletons.

@scabug scabug closed this as completed Sep 8, 2015
@scabug
Copy link
Author

scabug commented Sep 14, 2015

Adam Pocock (craigacp) said:
Is it possible to get the fix backported into a 2.11 release?

@scabug
Copy link
Author

scabug commented Sep 21, 2015

@lrytz said:
Backport PR: scala/scala#4757. I went for 2.12.x because my original attempt would not have been binary compatible, but the way it was finally solved is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants