Unicode literal syntax thwarts common use cases for triple-quotes #4706

Closed
scabug opened this issue Jun 17, 2011 · 9 comments

scabug commented Jun 17, 2011

PROBLEM: The translation of unicode escapes causes triple-quoted strings and comments to behave unexpectedly in common use cases:

  1. Generating LaTeX (compile error)
    println("""\usepackage{geometry}""") // error: error in unicode escape

  2. Windows paths (compile error)
    println("""c:\users""") // error: error in unicode escape

  3. Windows paths ("fails" silently)
    val f = """C:\test\uuuFeedUs""" // f is not what the user expects

  4. Escape codes in comments (cryptic compile error)
    val x = 1 // using \u000a gives error: value inside is not a member of Int

  5. The following is valid (if unintuitive and arguably undesirable) Scala code
    val x: String = \u0022A string.\u0022
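
To make the five cases above concrete, here is a minimal sketch of an early translation pass of the kind described in this ticket. It is an illustration only, not the compiler's actual code: the object name `UnicodePrePass` and the method `translate` are hypothetical, and the rule it assumes (a backslash preceded by an even number of backslashes, then one or more `u`, then four hex digits) is the Java-style one. The backslash is built from its char code so the example does not depend on how the compiler version at hand treats `\u` in its own source.

```scala
object UnicodePrePass {
  // Backslash written as a char code, so this file contains no
  // backslash-u sequence that an older compiler could trip over.
  private val Backslash: Char = 92.toChar

  private def isHex(c: Char): Boolean = Character.digit(c, 16) >= 0

  /** Translate escapes over raw source text, with no knowledge of
    * strings or comments (hypothetical helper, for illustration). */
  def translate(src: String): String = {
    val out = new StringBuilder
    var i = 0
    var slashRun = 0 // consecutive backslashes just before position i
    while (i < src.length) {
      val c = src.charAt(i)
      val startsEscape =
        c == Backslash && slashRun % 2 == 0 &&
          i + 1 < src.length && src.charAt(i + 1) == 'u'
      if (startsEscape) {
        var j = i + 1
        while (j < src.length && src.charAt(j) == 'u') j += 1 // any number of u's
        val digits = src.slice(j, j + 4)
        if (digits.length < 4 || !digits.forall(isHex))
          throw new IllegalArgumentException(s"error in unicode escape at offset $i")
        out.append(Integer.parseInt(digits, 16).toChar)
        i = j + 4
        slashRun = 0
      } else {
        slashRun = if (c == Backslash) slashRun + 1 else 0
        out.append(c)
        i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    val bs = Backslash.toString
    // Case 5: the escapes become the string delimiters themselves.
    println(translate("val x: String = " + bs + "u0022A string." + bs + "u0022"))
    // Case 3: three u's followed by "Feed" collapse into the single char U+FEED.
    println(translate("C:" + bs + "test" + bs + "uuuFeedUs").length)
    // Cases 1 and 2: "sepa" and "sers" are not hex digits, so translation fails.
    println(scala.util.Try(translate(bs + "usepackage{geometry}")).isFailure)
    println(scala.util.Try(translate("c:" + bs + "users")).isFailure)
  }
}
```

Because the pass runs over raw text, it cannot tell whether it is inside a triple-quoted string, a comment, or plain code, which is why all five cases behave the same way.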

PROPOSED SOLUTION:

Three backwards-compatible compiler flags, similar to -Xno-uescape. Because they are optional, existing code stays source-compatible.

  1. -Xuescape-squotes-only, which restricts unicode escapes to single-quoted strings only. This gets rid of all 5 undesired outcomes above. However, it may interfere with certain libraries that rely on unicode escaping (e.g. scalaz)

  2. -X-no-uescape-tquotes that doesn't perform unicode unescaping in triple-quoted strings. This gets rid of the first 3 undesired outcomes.

  3. -X-no-uescape-comments that doesn't perform unicode unescaping in comments. This gets rid of the 4th undesired outcome.

Note: I would be happy to implement this solution, provided the change will be accepted into the compiler.

CURRENT WORKAROUNDS:

  1. Decompose the string:
    println("""\""" + """usepackage{geometry}""")
  2. Use -Xno-uescape. However, this disables unicode escapes everywhere, even inside normal single-quoted strings.
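
A slightly more reusable form of the first workaround, as a sketch: keep the lone backslash in its own value so that no backslash is ever directly followed by `u` in the source. The names `RawBackslashWorkaround`, `bs`, `latex` and `winPath` are made up for this example.

```scala
object RawBackslashWorkaround {
  // A one-character string holding a backslash. In a triple-quoted
  // string the backslash is not an escape character, so this is safe.
  val bs: String = """\"""

  // The backslash and the following u now live in separate literals.
  val latex: String = bs + """usepackage{geometry}"""
  val winPath: String = "c:" + bs + "users"

  def main(args: Array[String]): Unit = {
    println(latex)   // the LaTeX command, with its leading backslash
    println(winPath) // the Windows path, with its separator intact
  }
}
```

This keeps the rest of each string literal readable, at the cost of one concatenation wherever a backslash would otherwise be followed by `u`.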

(This issue is a summary of the discussion from the scala-user topic: https://groups.google.com/d/topic/scala-user/UoJ0sUn3yFU/discussion)

scabug commented Jun 17, 2011

Imported From: https://issues.scala-lang.org/browse/SI-4706?orig=1
Reporter: Jonathan Clark (jhclark)
Affected Versions: 2.9.1
See #3220

scabug commented Jun 17, 2011

@soc said (edited on Jun 17, 2011 3:06:51 PM UTC):
This is related to my good old friend #3220. (While I think "removing comments" altogether might not be the right approach, something sensible should be done...)

Sadly, I have no hopes anything sensible will be done to fix either of these issues.

scabug commented Jun 17, 2011

Jonathan Clark (jhclark) said:
Hopefully, the much broader scope of this problem will turn some heads (it's not just about char constants). I hope that people don't simply punt with "this is a case where we don't try to improve over Java." 1) Improving over the many issues that Java is Byzantinianly locked into is a big reason why Scala exists IMHO and 2) the proposed fix here is optional and doesn't break backwards compatibility.

scabug commented Jun 17, 2011

@Ichoran said (edited on Jun 17, 2011 3:48:57 PM UTC):
I would support making no-unicode-translation the default inside triple-quoted strings only. Having unicode translated is handy for certain *cough* Scalaz *cough* libraries that use unicode characters. If you want to stick a unicode CR inside your comment, that's up to you. However, triple-quoted strings are supposed to be literals. The \uABCD thing breaks that expectation spectacularly for e.g. Windows filenames, which is exactly the sort of place where you want it to work.

Since Java has no raw (triple-quoted) string literals, there are no Java expectations to meet in this regard.

There could be a compile-time flag to turn on unicode escapes even in string literals if you really want it. Otherwise, this is a gotcha waiting to happen for Windows users.

scabug commented Jun 20, 2011

@LilyLambda said (edited on Jun 20, 2011 12:16:35 AM UTC):
I wonder how any of the proposed solutions in the ticket would work. Don't they all rely on the lexer knowing something about the source text (like what a comment is) before the lexing is even done? There's no way to figure out what a comment, triple-quoted string, or single quoted string is until the unicode translation has already been done.

scabug commented Jun 20, 2011

@dcsobral said:
That isn't really true. Single quotes have escapes of their own -- \n, for one. If you disable unicode escapes you'll notice \u doesn't work anymore anywhere, but \n will still work inside single quotes.

scabug commented Jun 20, 2011

Jonathan Clark (jhclark) said:
Hi Dan, thanks for the continued feedback on this issue. I'm having a bit of trouble with some anaphora resolution in your last post. :-)

"That" isn't really true. That == Yuvi's comment regarding the lexer's myopia?

It might be that such context-dependent escapes (such as \n) are performed at a later parsing stage rather than at the lexing stage like the unicode escapes. However, this also answers Yuvi's question as to how these can be implemented: When unicode escapes are disabled except within certain contexts, this translation can simply be moved to a stage at which the needed context is known.
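
A rough sketch of what "moving the translation to a stage where the context is known" could look like. This is not how scalac is structured; `ContextAwareEscapes` and its methods are hypothetical, and the scanner is deliberately simplified (no block comments, no char literals, no extra quotes before a closing triple-quote, and malformed escapes are copied through instead of reported). It first decides whether it is in a line comment, a triple-quoted string, or a single-quoted string, and only translates escapes in the last of these.

```scala
object ContextAwareEscapes {
  private val Backslash: Char = 92.toChar // char code avoids a backslash-u in this file
  private val TripleQuote: String = "\"\"\""

  private def isHex(c: Char): Boolean = Character.digit(c, 16) >= 0

  /** If a well-formed escape starts at i, return its char and the next index. */
  private def decodeAt(src: String, i: Int): Option[(Char, Int)] =
    if (i + 1 < src.length && src.charAt(i) == Backslash && src.charAt(i + 1) == 'u') {
      var j = i + 1
      while (j < src.length && src.charAt(j) == 'u') j += 1
      val digits = src.slice(j, j + 4)
      if (digits.length == 4 && digits.forall(isHex))
        Some((Integer.parseInt(digits, 16).toChar, j + 4))
      else None
    } else None

  def translate(src: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < src.length) {
      if (src.startsWith("//", i)) {
        // Line comment: copy verbatim up to the end of the line.
        val nl = src.indexOf('\n', i)
        val end = if (nl < 0) src.length else nl
        out.append(src.substring(i, end)); i = end
      } else if (src.startsWith(TripleQuote, i)) {
        // Triple-quoted string: copy verbatim through the closing delimiter.
        val close = src.indexOf(TripleQuote, i + 3)
        val end = if (close < 0) src.length else close + 3
        out.append(src.substring(i, end)); i = end
      } else if (src.charAt(i) == '"') {
        // Single-quoted string: the only context where escapes are translated.
        out.append('"'); i += 1
        while (i < src.length && src.charAt(i) != '"') {
          decodeAt(src, i) match {
            case Some((ch, next)) => out.append(ch); i = next
            case None =>
              if (src.charAt(i) == Backslash && i + 1 < src.length) {
                // Keep other escape pairs (such as an escaped quote) together.
                out.append(src.substring(i, i + 2)); i += 2
              } else { out.append(src.charAt(i)); i += 1 }
          }
        }
        if (i < src.length) { out.append('"'); i += 1 }
      } else { out.append(src.charAt(i)); i += 1 }
    }
    out.toString
  }
}
```

With this ordering, a Windows path in a triple-quoted string and an escape inside a comment are left untouched, while an escape in an ordinary string is still decoded. The cost is that escapes can no longer appear in plain code or identifiers, which is the trade-off discussed for -Xuescape-squotes-only.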

scabug commented Jun 21, 2011

@adriaanm said:
This limitation is absolutely unfortunate in this context, but in the grand scheme of things, the complication a fix would add to the compiler is too costly. Therefore we propose to stick with the workaround.

scabug commented Jun 21, 2011

@VladUreche said:
Please see Adriaan's comment
