lexing of non-printable char literals diverges from spec #8423

Closed
scabug opened this issue Mar 17, 2014 · 4 comments

scabug commented Mar 17, 2014

According to the specification (Lexical Syntax -> Character Literals and String Literals), non-printable characters are not valid literals. However, the lexer accepts the literal '\u000A' as well as a literal newline character between quotes (the spec explicitly mentions '\u000A' as being illegal).

In strings, the lexer also accepts "\u000A", though it doesn't accept a literal newline character.

scabug commented Mar 17, 2014

Imported From: https://issues.scala-lang.org/browse/SI-8423?orig=1
Reporter: Martijn Hoekstra (martijn)

scabug commented Mar 17, 2014

@adriaanm said:
I'm not sure I understand. The relevant sentence in the spec (https://github.com/adriaanm/scala-ref-markdown/blob/master/03-lexical-syntax.md#character-literals) is:

The character is either a printable unicode character or is described by an escape sequence.

This explicitly allows '\u000A' because it uses an escape sequence. Are you saying the Scala program

val c = '
'

is accepted somehow? I can't reproduce that interpretation.

scabug commented Mar 18, 2014

Martijn Hoekstra (martijn) said:
On the first issue:

Yes, you are right in that regard. It's the example in the spec itself that doesn't align with the spec, not the lexer. The example from the spec:

Example: some character literals

'a' '\u0041' '\n' '\t'
Note that '\u000A' is not a valid character literal because Unicode conversion is done before literal parsing and the Unicode character \u000A (line feed) is not a printable character. One can use instead the escape sequence '\n' or the octal escape '\12' (see here).

This is clearly talking about the Unicode escape '\u000A' rather than a literal newline character (otherwise it wouldn't mention Unicode conversion). This part of the spec is taken verbatim from the Java spec. I haven't tried any other non-printing characters.
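The spec note says Unicode conversion happens before literal parsing. A minimal sketch of such a pre-pass helps show why '\u000A' reaches the literal parser as a quote, a raw line feed, and a quote. It assumes Java-style rules (JLS §3.3: one or more `u`s followed by exactly four hex digits, and a backslash is only eligible when preceded by an even number of contiguous backslashes). `UnicodeEscapes` and `translate` are hypothetical names; this is an illustration, not scalac's actual scanner.

```scala
// Hypothetical illustration of a Java-style Unicode-escape pre-pass.
// Not scalac's scanner; malformed escapes are passed through unchanged
// here, whereas a real scanner would report an error.
object UnicodeEscapes {

  // True if ch is an ASCII hex digit (0-9, a-f, A-F).
  private def isHexDigit(ch: Char): Boolean =
    "0123456789abcdefABCDEF".indexOf(ch) >= 0

  // A backslash may start an escape only if it is preceded by an even
  // number of contiguous raw backslashes (Java's eligibility rule).
  private def isEligible(src: String, i: Int): Boolean = {
    var k = i - 1
    var count = 0
    while (k >= 0 && src.charAt(k) == '\\') { count += 1; k -= 1 }
    count % 2 == 0
  }

  // Replaces each eligible backslash-u escape (one or more 'u' characters
  // followed by exactly four hex digits) with the UTF-16 code unit it denotes.
  // Characters produced by an escape are not re-scanned.
  def translate(src: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < src.length) {
      val c = src.charAt(i)
      // Skip past the run of 'u' characters, if c is a backslash.
      var j = i + 1
      while (c == '\\' && j < src.length && src.charAt(j) == 'u') j += 1
      val hex = src.slice(j, j + 4)
      if (c == '\\' && j > i + 1 && isEligible(src, i) &&
          hex.length == 4 && hex.forall(isHexDigit)) {
        out.append(Integer.parseInt(hex, 16).toChar)
        i = j + 4
      } else {
        out.append(c)
        i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // Source text containing a quote, a backslash-u escape for LF, and a quote.
    val raw = "val c = '\\u000A'"
    val translated = translate(raw)
    // After the pre-pass, the quotes enclose a raw line feed.
    println(translated.contains("'\n'"))
  }
}
```

Under this model, '\u000A' and a literal line break between quotes look identical to the literal parser, which is exactly the overlap discussed in this thread.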

With regard to the second bit: yes, I'm saying that the Scala program

val c = '
'

is accepted, at least locally on my machine with Scala 2.10.3 on Windows 7, as well as on Scastie (http://scastie.org/4645). I'm not sure whether this is a bug in the lexer or in the spec.

scabug added this to the Backlog milestone Apr 7, 2017
@som-snytt

Duplicates #6810, which amended the spec text.

I think the correction says that the character can't be a literal newline, but it can be a Unicode-escaped newline or the ordinary escape \n.
