Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigDecimal#isWhole implementation is very heap intensive #6173

Closed
scabug opened this issue Aug 2, 2012 · 16 comments
Closed

BigDecimal#isWhole implementation is very heap intensive #6173

scabug opened this issue Aug 2, 2012 · 16 comments
Assignees
Milestone

Comments

@scabug
Copy link

scabug commented Aug 2, 2012

{{{
BigDecimal("-4.6010652208491219E+1556245283").hashCode
}}}

Explodes with an OutOfMemoryError even with a 4GB heap. (tested using PaulP's sbt-extras script with SBT_MEM=4096) The issue is that the BigDecimal#isWhole function is extremely heap-intensive for large numbers. This is actually a bit silly, given that numbers like the one I pasted are never, ever going to be convertible to either Int or Double, which is what the isWhole check is intended to accomplish.

My proposal would be to change the implementation of BigDecimal#hashCode to do a quick sanity check on the size of the value to see if it's even possible to fall into the range of Int/Double. If not, then check the sign and return the value that Double's ## would have returned anyway (Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY). As things stand, this is literally a security whole in Scala for people who parse user-specified text as BigDecimal, even with non-ARBITRARY contexts.

@scabug
Copy link
Author

scabug commented Aug 2, 2012

Imported From: https://issues.scala-lang.org/browse/SI-6173?orig=1
Reporter: @djspiewak
Affected Versions: 2.9.2

@scabug
Copy link
Author

scabug commented Aug 2, 2012

@paulp said:
Actually it is also attempting to preserve transitivity of equality (no doubt unsuccessfully) because if BigInts and BigDecimals have to be equal to an Int, then they have to be equal to one another. Which means there's no upper bound.

I don't dispute that there's all kinds of silliness in there, but most of it is there for some plausible reason.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@djspiewak said:
While it's true that Scala's BigDecimal does need a custom hashCode, it doesn't need to be quite as aggressive as it is trying to be. The problem is stemming not from hashCode, but from isWhole. There is absolutely no need to call isWhole from hashCode if the value is greater than Double.MaxValue or less than Double.MinValue.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@paulp said:
I don't know why isWhole is so expensive, but it has to know if it may be equal to a BigInt, and the size is irrelevant. If BigDecimal(1) == (1: Int) && (1: Int) == BigInt(1), then BigDecimal(1) == BigInt(1) and BigDecimal(1).hashCode == BigInt(1).hashCode.

So unless you like the idea that somewhere BigInt(X) == BigDecimal(X) but BigInt(X + 1) != BigDecimal(X + 1), the hashCodes of BigDecimals must be coordinated with those of BigInts. Regardless of size. To do this, the BigDecimal must check whether it can conceivably be equal to a BigInt, or the hashcode scheme has to be such that BigDecimals/BigInts provide the same one when they are equal. If there is a better test than "isWhole", which does not have the ring of a heap-exhausting method, then please suggest it.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@djspiewak said (edited on Aug 3, 2012 5:22:05 PM UTC):
It looks like BigDecimal(a).hashCode is not constant for whole values of a > Long.MaxValue, which is contrary to what I had thought. It turns out that BigDecimal(a).hashCode == 0 where !BigDecimal(a).isWhole && (a > Double.MaxValue || a < Double.MinValue). But, determining that we need to get into that case still requires the isWhole check.

I would argue that BigInt and BigDecimal are being too aggressive with their large-value hashCodes. Honestly, if someone has a number that large, they've pretty much given up on efficiency. Let them deal with the hash collisions. This is especially valid (imo) in light of the fact that hashes will always collide for non-whole values greater than Double.MaxValue. Given that this avoids the call to isWhole altogether for sufficiently large-magnitude BigDecimal/BigInt values, it bypasses the performance issues with isWhole.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@paulp said:
I don't mean to defend any of the present code, it's just that either one accepts the native equals and hashcode methods of BigInt and BigDecimal -- and one should probably consider doing exactly this, and using extension methods for any other functionality -- or one messes with them, and all kinds of consequences arise.

To be clear, and despite the fact that (see below) the goal is not achieved, the hashcodes for these classes are not modified from the native ones for anything to do with efficiency. There is only one reason: "equal objects must have equal hashcodes." So collision rates are beside the point.

I checked, and we're not even doing the thing I talked about, so the thing I thought was a bad idea is already the way that it is, by which I mean:

scala> BigDecimal(Long.MaxValue) == BigInt(Long.MaxValue)
res9: Boolean = true

scala> (BigDecimal(Long.MaxValue) pow 2) == (BigInt(Long.MaxValue) pow 2)
res10: Boolean = false

So it's not like there's any wonderful behavior to preserve here. I'm not going to touch any of this, I'm only trying to illuminate it a little for whoever may attempt something.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@paulp said:
Well this is nice, somehow even with the String constructor BigDecimal loses precision through scaling.

scala> BigDecimal(new java.math.BigDecimal("85070591730234615847396907784232501249"))
res22: scala.math.BigDecimal = 85070591730234615847396907784232501249

scala> BigDecimal("85070591730234615847396907784232501249")
res23: scala.math.BigDecimal = 8.507059173023461584739690778423250E+37

That's why they come out unequal in my previous comment - the BigDecimal version gets rounded.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@paulp said:
So these hashCodes have to be equal. Of course, it doesn't WORK or anything - that would be too much to ask - but this is why it can't keep its paws off of hashCode.

scala> BigDecimal(new java.math.BigDecimal("85070591730234615847396907784232501249")) == BigInt("85070591730234615847396907784232501249")
res24: Boolean = true

scala> BigDecimal(new java.math.BigDecimal("85070591730234615847396907784232501249")).## == BigInt("85070591730234615847396907784232501249").##
res25: Boolean = false

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@djspiewak said:
So in conclusion: it's broken, and non-trivially so.

All of these things are problems, though some beyond our purview (I'm pretty sure the rounding issue is Java's fault). Maybe we should just focus on trying to make isWhole a little more efficient. I can't think of any reason why that shouldn't be a constant-heap operation.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@djspiewak said:
Actually, I'm just being silly. The OOM error isn't coming out of isWhole, it's coming out of unifiedPrimitiveHashCode:

{{{
java.lang.OutOfMemoryError: Java heap space
at java.math.BigDecimal.bigTenToThe(BigDecimal.java:3376)
at java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3508)
at java.math.BigDecimal.setScale(BigDecimal.java:2394)
at java.math.BigDecimal.toBigInteger(BigDecimal.java:2925)
at java.math.BigDecimal.longValue(BigDecimal.java:2960)
at scala.math.BigDecimal.longValue(BigDecimal.scala:344)
at scala.math.ScalaNumericConversions$class.toLong(ScalaNumericConversions.scala:21)
at scala.math.BigDecimal.toLong(BigDecimal.scala:160)
at scala.math.ScalaNumericConversions$class.unifiedPrimitiveHashcode(ScalaNumericConversions.scala:31)
at scala.math.BigDecimal.unifiedPrimitiveHashcode(BigDecimal.scala:160)
at scala.math.BigDecimal.hashCode(BigDecimal.scala:177)
}}}

So, the only solution then would be to return a constant for sufficiently large values, both in BigInt and in BigDecimal.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@non said:
It's not just isWhole... try calling toBigInt and see a similar explosion in java.math.BigDecimal.bigTenToThe().

Unfortunately I think this class is just busted.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

@paulp said:
The rounding issue is our fault - that's why they become equal when I pass a java.math.BigDecimal to scala.BigDecimal, but not if I create a scala.BigDecimal from the same String.

@scabug
Copy link
Author

scabug commented Aug 3, 2012

Jeff Skaistis (jeffska) said:
Comment from the Java BigDecimal source for bigTenToThe():

// BigInteger.pow is slow, so make 10**n by constructing a
// BigInteger from a character string (still not very fast)

Unfortunately, there's not a limit check in there, so it's creating a 3 GB array and dumping 1.5 billion zeros into it. Of course the actual pow() method does have a sanity check, so it would have been better just to call it.

I think the lesson here is that BigDecimal shouldn't be treated like it was an actual floating-point number. Personally, I'd like to see support for the new IEEE decimal floating points.

@scabug
Copy link
Author

scabug commented Oct 15, 2013

@soc said:
Java 8 ships with a considerably improved version of BigInteger. Might make sense to check whether and how some issues might have been solved on their side.

@scabug
Copy link
Author

scabug commented Oct 15, 2013

@Ichoran said:
We won't support Java 8 exclusively for quite some time, so an easy fix shouldn't be delayed that long. There is an "easy" fix that I will supply.

@scabug
Copy link
Author

scabug commented Jan 15, 2014

@adriaanm said:
scala/scala#3316

@scabug scabug closed this as completed Jan 15, 2014
@scabug scabug added this to the 2.11.0-M8 milestone Apr 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants