You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
scala.util.matching.Regex.replaceAllIn has some problems in handling "$" (the dollar sign) in the input string. In some cases, it treats $3 (for example) in the input as a capture group replacement / back-reference. To do so is incorrect.
I'll walk you through some examples, starting with examples that work. Also, please notice the variation in behavior.
I'll start with a pattern I used in some NLP processing to strip 's from possessive tokens.
val pattern = """([^']+)'s""".r
pattern: scala.util.matching.Regex = ([^']+)'s
scala> pattern.replaceAllIn("Bob's", m => m.group(1) )
res0: String = Bob
scala> pattern.replaceAllIn("Frank", m => m.group(1) )
res1: String = Frank
scala> pattern.replaceAllIn("$300", m => m.group(1) )
res2: String = $300
scala> pattern.replaceAllIn("$300's", m => m.group(1) )
java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:375)
at java.util.regex.Matcher.appendReplacement(Matcher.java:880)
at scala.util.matching.Regex$Replacement.replace(Regex.scala:861)
at scala.util.matching.Regex$Replacement.replace$(Regex.scala:861)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:839)
at scala.util.matching.Regex.$anonfun$replaceAllIn$1(Regex.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:491)
... 29 elided
Above, please note that the first argument to pattern.replaceAllIn is an input string. There should be no handling of capture groups (e.g. $2) in an input string.
scala> pattern.replaceAllIn("$x300's", m => m.group(1) )
java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:857)
at scala.util.matching.Regex$Replacement.replace(Regex.scala:861)
at scala.util.matching.Regex$Replacement.replace$(Regex.scala:861)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:839)
at scala.util.matching.Regex.$anonfun$replaceAllIn$1(Regex.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:491)
... 29 elided
scala> pattern.replaceAllIn("$600's", m => m.group(1) )
java.lang.IndexOutOfBoundsException: No group 6
at java.util.regex.Matcher.start(Matcher.java:375)
at java.util.regex.Matcher.appendReplacement(Matcher.java:880)
at scala.util.matching.Regex$Replacement.replace(Regex.scala:861)
at scala.util.matching.Regex$Replacement.replace$(Regex.scala:861)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:839)
at scala.util.matching.Regex.$anonfun$replaceAllIn$1(Regex.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:491)
... 29 elided
Note the variation of behavior here (no error)
scala> pattern.replaceAllIn("$1200's", m => m.group(1) )
res7: String = $1200200
The text was updated successfully, but these errors were encountered:
David James (dj49) said:
I see now that Regex.quoteReplacement can be used, as follows:
"""([^']+)'s""".r.replaceAllIn(token, m => Regex.quoteReplacement(m.group(1)))
However, this seems like a lot of work to do a common thing (replace based on a capture group).
The other situation, where someone wants to replace with a "$3" and then have that trigger a back-reference substitution, seems much rarer.
I realize this is getting a little ranty. It was painful. Anyhow, I don't want to break backwards compatibility, but this does break the principle of least surprise.
@som-snytt said (edited on Feb 18, 2017 9:37:00 PM UTC):
I feel your pain, but it's clearly documented in the scaladoc and is just the way Java does it.
In my browser, I have to click on the method to show the explanation: $n is a group, so use quoteReplacement. Or in this example, use _ => "$1". The use case for the group method here would be to conditionally construct the replacement string.
This why we no longer construct APIs around strings. I think they're going to spin scala.util out into its own module, so maybe folks will devise a better scala.util.matching, working from accumulated knowledge.
Also, a linter could warn about unescaped $1 in a replacement string.
scala.util.matching.Regex.replaceAllIn
has some problems in handling "$" (the dollar sign) in the input string. In some cases, it treats $3 (for example) in the input as a capture group replacement / back-reference. To do so is incorrect.I'll walk you through some examples, starting with examples that work. Also, please notice the variation in behavior.
I'll start with a pattern I used in some NLP processing to strip 's from possessive tokens.
val pattern = """([^']+)'s""".r
pattern: scala.util.matching.Regex = ([^']+)'s
scala> pattern.replaceAllIn("Bob's", m => m.group(1) )
res0: String = Bob
scala> pattern.replaceAllIn("Frank", m => m.group(1) )
res1: String = Frank
scala> pattern.replaceAllIn("$300", m => m.group(1) )
res2: String = $300
scala> pattern.replaceAllIn("$300's", m => m.group(1) )
java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:375)
at java.util.regex.Matcher.appendReplacement(Matcher.java:880)
at scala.util.matching.Regex$Replacement.replace(Regex.scala:861)
at scala.util.matching.Regex$Replacement.replace$(Regex.scala:861)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:839)
at scala.util.matching.Regex.$anonfun$replaceAllIn$1(Regex.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:491)
... 29 elided
Above, please note that the first argument to
pattern.replaceAllIn
is an input string. There should be no handling of capture groups (e.g. $2) in an input string.scala> pattern.replaceAllIn("$x300's", m => m.group(1) )
java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:857)
at scala.util.matching.Regex$Replacement.replace(Regex.scala:861)
at scala.util.matching.Regex$Replacement.replace$(Regex.scala:861)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:839)
at scala.util.matching.Regex.$anonfun$replaceAllIn$1(Regex.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:491)
... 29 elided
scala> pattern.replaceAllIn("$600's", m => m.group(1) )
java.lang.IndexOutOfBoundsException: No group 6
at java.util.regex.Matcher.start(Matcher.java:375)
at java.util.regex.Matcher.appendReplacement(Matcher.java:880)
at scala.util.matching.Regex$Replacement.replace(Regex.scala:861)
at scala.util.matching.Regex$Replacement.replace$(Regex.scala:861)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:839)
at scala.util.matching.Regex.$anonfun$replaceAllIn$1(Regex.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:491)
... 29 elided
Note the variation of behavior here (no error)
scala> pattern.replaceAllIn("$1200's", m => m.group(1) )
res7: String = $1200200
The text was updated successfully, but these errors were encountered: