Scala Programming Language
  1. Scala Programming Language
  2. SI-4750

scala.util.matching.regex.replaceAllIn(String, (Match) => String) escapes backslashes

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Scala 2.9.2
    • Component/s: Misc Library
    • Labels:
    • Environment:

      Windows 7, Scala 2.9.0.1 (also affects 2.8.1)

      Description

      scala.util.matching.regex.replaceAllIn(String, (Match) => String) seems to escape backslashes in the string to be operated on. The attached example shows how scala.util.matching.regex.replaceAllIn(String, (Match) => String) has different behavior compared to scala.util.matching.regex.replaceAllIn(String, String). In my environment I get the following output:

      c:\projects\j2s\Java2Scala>scala RegexTest
      rs = 'public foo \"gh\\dooh\"'
      r1 = '\"gh\\dooh\"'
      r2 = '"gh\dooh"'

      But I would have expected r1 and r2 to be identical.

        Activity

        Hide
        Jeff Tsay added a comment -

        Update: I found out that java.util.regex.Matcher.appendReplacement() has the same behavior, and is most likely the source of this problem. appendReplacement() is documented like this:

        ...
        Note that backslashes () and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
        ...

        In the case of regex.replaceAllIn(String, (Match) => String) the API user probably wouldn't be aware of this behavior and it would probably not be the common case to want to have replaceAllIn() to do processing of the returned string (for $1 references etc.). So I think replaceAllIn() should call Matcher.quoteReplacement() on the String result before passing it to Matcher.appendReplacement().

        Alternatively, the behavior can be left the same, and the user required to call Matcher.quoteReplacement() himself, but then the documentation for replaceAllIn(String, (Match) => String) should be augmented to mention the string processing that is done by appendReplacement().

        Show
        Jeff Tsay added a comment - Update: I found out that java.util.regex.Matcher.appendReplacement() has the same behavior, and is most likely the source of this problem. appendReplacement() is documented like this: ... Note that backslashes () and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string. ... In the case of regex.replaceAllIn(String, (Match) => String) the API user probably wouldn't be aware of this behavior and it would probably not be the common case to want to have replaceAllIn() to do processing of the returned string (for $1 references etc.). So I think replaceAllIn() should call Matcher.quoteReplacement() on the String result before passing it to Matcher.appendReplacement(). Alternatively, the behavior can be left the same, and the user required to call Matcher.quoteReplacement() himself, but then the documentation for replaceAllIn(String, (Match) => String) should be augmented to mention the string processing that is done by appendReplacement().
        Hide
        Daniel Sobral added a comment -

        FWIW, I concur. After all, all groups are already available through Match, which is the very reason why the function is Match => String, instead of String => String.

        Show
        Daniel Sobral added a comment - FWIW, I concur. After all, all groups are already available through Match, which is the very reason why the function is Match => String, instead of String => String.
        Hide
        Daniel Sobral added a comment -

        Pull request 139 closes the issue by documenting the behavior and providing a `quoteReplacement` method on Scala's `Regex` object.

        The behavior makes sense for the other replace methods, and I think it would be more surprising to have this one behave differently than having it behave like it does. The workaround is not too inconvenient, and proper documentation goes a long way towards addressing the issue.

        Show
        Daniel Sobral added a comment - Pull request 139 closes the issue by documenting the behavior and providing a `quoteReplacement` method on Scala's `Regex` object. The behavior makes sense for the other replace methods, and I think it would be more surprising to have this one behave differently than having it behave like it does. The workaround is not too inconvenient, and proper documentation goes a long way towards addressing the issue.
        Hide
        Daniel Sobral added a comment -

        I have now seen three questions on Stack Overflow about this problem, all in the past six months. It's past time this was fixed.

        Show
        Daniel Sobral added a comment - I have now seen three questions on Stack Overflow about this problem, all in the past six months. It's past time this was fixed.
        Hide
        Daniel Sobral added a comment -

        Fixed on 2.9.2-RC2 with commit 883cb14b12873062fc3a122a18df5429834ad3a9.

        Fixed on master with commit 479dd13148c380619d3e9156ef1913467decc05c.

        Show
        Daniel Sobral added a comment - Fixed on 2.9.2-RC2 with commit 883cb14b12873062fc3a122a18df5429834ad3a9. Fixed on master with commit 479dd13148c380619d3e9156ef1913467decc05c.

          People

          • Assignee:
            Daniel Sobral
            Reporter:
            Jeff Tsay
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development