Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undolog seems to leak memory #8302

Open
scabug opened this issue Feb 17, 2014 · 10 comments
Open

Undolog seems to leak memory #8302

scabug opened this issue Feb 17, 2014 · 10 comments
Labels
Milestone

Comments

@scabug
Copy link

scabug commented Feb 17, 2014

scalatest has a problem with high memory consumption again:
scalatest/scalatest#235 (comment)

It requires about 2GB to compile its tests. I was digging into it a little bit by capturing memory snapshot at the end of typer. Yourkit suggests that we are leaking memory through undoLog. Specifically, it shows two things:

  1. UndoLog is a class at the top of the list of classes with the highest memory retention (right after Global, ::, etc.)
  2. Object with biggest distance from GC root is TypeConstraint and its distance from GC root is 3 897 019. That is through series of :: objects that contain Tuple2.

I can't imagine that we would go almost 4m deep in undoLog stack for any single compilation unit so it seems like we are leaking memory somehow.

Related: scala/scala#1238

@scabug
Copy link
Author

scabug commented Feb 17, 2014

Imported From: https://issues.scala-lang.org/browse/SI-8302?orig=1
Reporter: @gkossakowski
Affected Versions: 2.11.0-M8
Attachments:

@scabug
Copy link
Author

scabug commented Feb 20, 2014

@retronym said:
@grzeg: i think that you miscounted the size of Scalacheck tests:

After running sbt gentests/test

cloc /Users/jason/code/scalatest/gentests/target/scala-2.11.0-M8/src_managed
     313 text files.
     313 unique files.
       0 files ignored.

http://cloc.sourceforge.net v 1.60  T=5.74 s (54.5 files/s, 81915.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Scala                          313          25743           6968         437602
-------------------------------------------------------------------------------
SUM:                           313          25743           6968         437602
-------------------------------------------------------------------------------

In addition to:

scala-2.11-M8 /code/scalatest cloc /Users/jason/code/scalatest/src/test
     667 text files.
     667 unique files.
       3 files ignored.

http://cloc.sourceforge.net v 1.60  T=3.76 s (176.6 files/s, 68295.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Scala                          650          35448          18728         202263
Java                            13             38            195             99
XML                              1              0              0              8
-------------------------------------------------------------------------------
SUM:                           664          35486          18923         202370
-------------------------------------------------------------------------------

@scabug
Copy link
Author

scabug commented Feb 20, 2014

@gkossakowski said:
Good catch. I forgot about generated tests.

Still, it doesn't explain what I saw about the undoLog, right?

@scabug
Copy link
Author

scabug commented Feb 20, 2014

@retronym said:
No. I compiled those scalatest tests with -Ydebug -Ylog:all and grepped for undolog. That turns on logging in the clear() method those shows the length of the log when it is cleared. That is done after each compilation unit.

It never shows more than 17k entries.

...
[info] [log typer] Clearing 0 entries from the undoLog.
[info] [log typer] Clearing 0 entries from the undoLog.
[info] [log typer] Clearing 0 entries from the undoLog.
[info] [log typer] Clearing 17129 entries from the undoLog.
[info] [log typer] Clearing 15197 entries from the undoLog.
[info] [log typer] Clearing 13458 entries from the undoLog.
[info] [log typer] Clearing 11912 entries from the undoLog.
[info] [log typer] Clearing 10559 entries from the undoLog.
[info] [log typer] Clearing 9399 entries from the undoLog.
[info] [log typer] Clearing 8432 entries from the undoLog.
[info] [log typer] Clearing 7658 entries from the undoLog.
[info] [log typer] Clearing 40 entries from the undoLog.
[info] [log typer] Clearing 2959 entries from the undoLog.
[info] [log typer] Clearing 0 entries from the undoLog.
[info] [log typer] Clearing 1480 entries from the undoLog.
[info] [log typer] Clearing 0 entries from the undoLog.
[info] [log typer] Clearing 10822 entries from the undoLog.
[info] [log typer] Clearing 506 entries from the undoLog.

So its a bit of a mystery how you saw such a long path the GC root. Did it pass through undoLog itself?

Looking around through comments, it seems we are really confused as to when we can clear the log. It is cleared eagerly when we actually have something to undo (when a type relation <:< or =:= returns false). But otherwise, the comments tell us:

} finally {
    subsametypeRecursions -= 1
    // XXX AM TODO: figure out when it is safe and needed to clear the log -- the commented approach below is too eager (it breaks #3281, #3866)
    // it doesn't help to keep separate recursion counts for the three methods that now share it
    // if (subsametypeRecursions == 0) undoLog.clear()
  }

To me, the natural place to clear the log is when we finish a type inference session. But I reckon any such attempt might run into the same underlying problem as #7782 / bce786f070afe383d203c4d809ef69803330b340, which is that we've got a single undolog per global, which is shared by interleaved typechecking that happens with source-completers. It might be interesting to see if you could move undolog to Context. That's were we store another, similar sort of undo-log:

  class Context {
    ...
    /** Saved type bounds for type parameters which are narrowed in a GADT. */
    var savedTypeBounds: List[(Symbol, Type)] = List()

@scabug
Copy link
Author

scabug commented Feb 20, 2014

@retronym said:
/cc Adriaan for insights, war stories.

@scabug
Copy link
Author

scabug commented Feb 20, 2014

@gkossakowski said (edited on Feb 20, 2014 12:44:17 PM UTC):
I just selected log field of UndoLog and asked yourkit to show me all objects retained by it. I get:
(I tried to include the image inline but JIRA is hopeless, it doesn't respect DPI information in PNG file :-/)
https://issues.scala-lang.org/secure/attachment/15810/yourkit-undolog-retained.png

I think this does suggest we leak memory. I know that this is a transitive closure, so maybe it's not undolog that retain that object directly but indirectly through some TypeRefs. I find this to be very unlikely especially if you look at how many Tuple2 are being retained. Maybe we blow up within single compilation run when there no clearing of undolog performed? I took memory snapshot in the middle of type checking.

@scabug
Copy link
Author

scabug commented Feb 24, 2014

Chua Chee Seng (cheeseng) said:
Just to add, in ScalaTest, 2GB is needed to compile the 'regular' tests (does not include generated tests), to compile generated tests, it'll need about 5GB currently to compile.

@scabug
Copy link
Author

scabug commented Nov 10, 2014

@retronym said:
I took another look for this problem but was unable to reproduce retention of undolog entries at the completion of the compiler. Perhaps I was measuring something different from "capturing memory snapshot at the end of typer"; @gkossakowski what mechanism did you use to control when the snapshot was taken?

Limiting SBT to one compilation at a time helps to reduce overall memory requirements and make the results of profiling easier to digest:

sbt> set concurrentRestrictions in Global := Seq(Tags.limit(Tags.CPU, 1))

@zsxwing
Copy link

zsxwing commented Oct 26, 2017

The memory leak also happens in an application using scala.reflect.api.Types.TypeApi#<:<. Here is a reproducer:

scala> :paste
// Entering paste mode (ctrl-D to finish)

    import scala.reflect.runtime.{universe => ru}
    println("before: " + ru.asInstanceOf[scala.reflect.runtime.JavaUniverse].undoLog.log.size)
    for (_ <- 0 until 1000) {
      ru.typeOf[Option[String]] <:< ru.typeOf[Option[_]]
    }
    println("after: " + ru.asInstanceOf[scala.reflect.runtime.JavaUniverse].undoLog.log.size)

// Exiting paste mode, now interpreting.

before: 0
after: 2000
import scala.reflect.runtime.{universe=>ru}

If increasing the loop length to a large value, it will OOM.

asfgit pushed a commit to apache/spark that referenced this issue Nov 10, 2017
…ncoder

## What changes were proposed in this pull request?

Because of the memory leak issue in `scala.reflect.api.Types.TypeApi.<:<` (scala/bug#8302), creating an encoder may leak memory.

This PR adds `cleanUpReflectionObjects` to clean up these leaking objects for methods calling `scala.reflect.api.Types.TypeApi.<:<`.

## How was this patch tested?

The updated unit tests.

Author: Shixiong Zhu <zsxwing@gmail.com>

Closes #19687 from zsxwing/SPARK-19644.
zsxwing added a commit to zsxwing/spark that referenced this issue Nov 10, 2017
…ncoder

Because of the memory leak issue in `scala.reflect.api.Types.TypeApi.<:<` (scala/bug#8302), creating an encoder may leak memory.

This PR adds `cleanUpReflectionObjects` to clean up these leaking objects for methods calling `scala.reflect.api.Types.TypeApi.<:<`.

The updated unit tests.

Author: Shixiong Zhu <zsxwing@gmail.com>

Closes apache#19687 from zsxwing/SPARK-19644.
@ivashin
Copy link

ivashin commented Jan 30, 2018

Experienced the same issue. Heap dump contained lots of the following objects:

 num     #instances         #bytes  class name
----------------------------------------------
   1:      33516613      804398712  scala.collection.immutable.$colon$colon
   2:      16756584      670263360  scala.reflect.internal.tpe.TypeConstraints$TypeConstraint
   3:       4190144      469296128  scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$10
   4:       4189252      301626144  scala.reflect.internal.Types$AbstractNoArgsTypeRef
   5:      12567433      301618392  scala.reflect.internal.tpe.TypeConstraints$UndoPair
   6:       4195420      167816800  java.util.WeakHashMap$Entry
   7:       4198527      134352864  scala.reflect.internal.Symbols$TypeHistory
   8:       4193368      134187776  java.lang.ref.WeakReference
   9:       4189151      134052832  scala.reflect.internal.Types$TypeVar$$anon$4

Workaround suggested in apache/spark#19687 (wrapping TypeApi.<:< invocations into JavaUniverse.undoLog.undo()) seems to have fixed the leak.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants