Handling of strings with a NUL byte #9915
Comments
Imported From: https://issues.scala-lang.org/browse/SI-9915?orig=1 |
@som-snytt said (edited on Sep 7, 2016 7:41:00 PM UTC): There are two differences between this format and the "standard" UTF-8 format. So NUL (U+0000) and supplementary characters in constants come out as U+FFFD.
scala> val x = new str.Test
x: str.Test = str.Test@c808207
scala> x.str
res0: String = imagine JIRA allowed posting unicode comments instead of reporting "communications failure"
scala> str.Test.STR
res1: String = ����������������
|
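To make the two differences concrete, here is a minimal sketch that compares standard UTF-8 with the JVM's modified UTF-8. It relies on `DataOutputStream.writeUTF`, which is documented to write modified UTF-8 after a 2-byte length prefix. The class name `ModifiedUtf8Demo` and its helper methods are hypothetical, introduced only for this illustration; they are not taken from the compiler or from the pull request.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of scalac or the JDK.
class ModifiedUtf8Demo {
    // Encode with modified UTF-8 via DataOutputStream.writeUTF,
    // then drop the 2-byte length prefix that writeUTF emits first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = "\0ABC";
        // One code point outside the Basic Multilingual Plane (two UTF-16 chars).
        String supplementary = new String(Character.toChars(0x1F600));

        // Difference 1: NUL is one byte in standard UTF-8, two bytes in modified UTF-8.
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8))); // 00 41 42 43
        System.out.println(hex(modifiedUtf8(nul)));                    // c0 80 41 42 43

        // Difference 2: a supplementary character is 4 bytes in standard UTF-8,
        // but 6 bytes in modified UTF-8 (each surrogate is encoded separately).
        System.out.println(supplementary.getBytes(StandardCharsets.UTF_8).length); // 4
        System.out.println(modifiedUtf8(supplementary).length);                    // 6
    }
}
```

A reader that treats these bytes as standard UTF-8 will reject `c0 80` and the encoded surrogates, which would explain the replacement characters in the REPL output above.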
jagadish (jagadish1989) said (edited on Sep 8, 2016 4:04:39 AM UTC): |
@som-snytt said: |
@som-snytt said: Unfortunately, there is no JIRA label for not-entirely-fruitless. It was fun adding "java-interop", which is only a slight stretch. The problem was that the constants were ingested incorrectly from the other class file. Now the copies are not differing, modulo separate recompilation. |
jagadish (jagadish1989) said (edited on Sep 8, 2016 3:42:36 PM UTC): Thanks for the pull request. It would be super helpful if you added a comment in the code along the lines of "Java class file constants are encoded using modified UTF-8; see section 4.4.7 of the JVM spec." That way the reader has an idea of the reason behind doing |
@som-snytt said: |
jagadish (jagadish1989) said: |
Scala appears to encode strings containing a NUL character ("\u0000") differently than Java does. To reproduce this, define a compile-time constant in Java.
Create a Java file (Test.java):
Create a sample Scala main program and access TEST from there.
When the string TEST is accessed from Scala, the NUL character appears to be encoded as 2 bytes. So even simple equality tests like
fail and return false.
However, when TEST is made a private field of class Test and returned by a static getter, getTestValue(), the equality check {code:java} "\0ABC".equals(Test.getTestValue()) {code} passes and returns true.
I took a look at the generated bytecode, and I suspect that defining TEST as a compile-time constant is what makes the difference (as opposed to accessing it via a getter).
Can someone please explain if I'm missing something obvious related to encoding of Strings? Some insight on the problem will be helpful.
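The symptom described here is consistent with a class file reader decoding the constant's bytes as standard UTF-8 instead of modified UTF-8. The sketch below illustrates that mismatch under that assumption. The class `ConstantDecodeDemo`, its `STORED` byte array, and both decode helpers are hypothetical names for this illustration; this is not the compiler's actual code path.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of scalac or the JDK.
class ConstantDecodeDemo {
    // "\0ABC" in modified UTF-8, as it would appear in a constant pool entry:
    // NUL is stored as the two bytes c0 80.
    static final byte[] STORED = {(byte) 0xC0, (byte) 0x80, 'A', 'B', 'C'};

    // Assumed faulty path: interpret the bytes as standard UTF-8.
    // c0 80 is not valid standard UTF-8, so the decoder substitutes U+FFFD.
    static String decodeAsStandardUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Matching path: DataInputStream.readUTF reads a 2-byte length prefix
    // followed by modified UTF-8, so c0 80 decodes back to U+0000.
    static String decodeAsModifiedUtf8(byte[] bytes) {
        byte[] framed = new byte[bytes.length + 2];
        framed[0] = (byte) (bytes.length >>> 8);
        framed[1] = (byte) bytes.length;
        System.arraycopy(bytes, 0, framed, 2, bytes.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "\0ABC";
        System.out.println(expected.equals(decodeAsStandardUtf8(STORED))); // false
        System.out.println(expected.equals(decodeAsModifiedUtf8(STORED))); // true
    }
}
```

A value returned from a getter is never re-decoded by the calling compiler, since the JVM loads the string itself at run time, which fits the observation that the getter-based comparison succeeds.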