New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex.unapplySeq should use m.find(0) instead of m.matches #5045
Comments
Imported From: https://issues.scala-lang.org/browse/SI-5045?orig=1 |
Lanny Ripple (lanny ripple) said:
|
Lanny Ripple (lanny ripple) said:
|
@dcsobral said: The best I can think of is make this behavior depend on a flag, defaulting to present behavior, and with a method returning a sensible behavior-flagged regex. Something like: val Re = """(x)""".r.notAnchored I suggest you implement it and submit through a pull request on github, then post the link to the pull request here. It is by far the way more likely to produce results. |
Lanny Ripple (lanny ripple) said: Also this wouldn't change existing code. A Regex designed to match an entire string because of implicit anchors will match an entire string in the absence of implicit anchors as well. At worst I could see code that was thought to be working but wasn't (testing ftw) start to work. On the other hand this change would bring Scala's matching into line with how other regex engines work (i.e, what you see in the regex is what you get in the match). All that being said you are the guy with all the experience here so I'll code up the change as you suggest and go forward from there. |
Lanny Ripple (lanny ripple) said: Implemented using a subclass that the super is aware of (re: unanchor()). Not terribly happy with that but didn't want to move Regex to an abstract class. Maybe a RegexLike trait? Feedback appreciated. |
@paulp said: |
@dcsobral said: I am saying that "fixing" it could break things. For example: string match {
case p1() => ...
case p2() => ...
} An application might well depend on p1 not doing a partial match, because that's how it has always worked. If a The solution I proposed enables people who want "correct" behavior to be able to get it, while not breaking any existing code. Now, just a bit of review on the pull request, I'm a bit concerned about the use of "lazy val" for pattern. Access to a "val" is much faster than to a "lazy val", and pretty much every method needs a compiled pattern. So, while one might save compiling a pattern twice when creating an "unanchored" version, one looses performance everywhere else. |
Lanny Ripple (lanny ripple) said: Good point on lazy val. One also loses performance on the Regex compile but I missed that lazy would impact all existing code. Admittedly only once and the docs point out that the Regex should be constructed once rather than in a loop but can't count on that being followed. Hmm. |
@dcsobral said (edited on Oct 13, 2011 1:46:40 AM UTC): val emptyLine = """\s*(?:#.*)?""".r
val code = """([^#]*)(?:#.*)?""".r
def stripComments(lines: Seq[String]) = lines flatMap {
case emptyLine => None
case code(x) => Some(x)
} This will stop working if matches suddenly become unanchored. As for lazy val, it doesn't have an impact only once. It has an impact every time pattern gets accessed, which is pretty much by any method called on the regex. The first time it goes through synchronization, but the other times it still has to go through a volatile flag check before retrieving the value. |
Lanny Ripple (lanny ripple) said: I've stated twice now that I'm aware there would be changes in behavior if the implicit anchors were removed. I think we can say we're in violent agreement on that point. The problem I'm bumping into is construction order. Regex initializes
] which is out because Regex as an abstract trait would break
(and who knows what else) or to use a lazy val to initialize after full class construction. My understanding of lazy is based on the single Stack Overflow question. If it's correct then the initialization of a lazy is based on a volatile but subsequent access is a simple if() clause. That seems pretty light-weight to me given Scala and Java's hint to not recompile Patterns in tight loops but I don't have enough experience in this area to argue the point. So what am I left with? I have no idea since the two ways documented for getting around order of initialization problems are off the table. I get the feeling I'm frustrating you. It certainly isn't my intent. I recognize and appreciate that you don't have to be offering your advice. I've done development long enough to know that if someone makes suggestions to you when submitting fly-by RFEs you listen to them. (I have also read your blog entries, SO answers, and other scala comments so have some idea that besides offering advice you also know what you are talking about when you do.) However, ... I'll just leave it at however. I have an internal solution that works. If anyone bumps into this RFE it's simple to subclass Regex with an overridden unapplySeq which does not implicitly anchor. A case class and an implicit def to turn strings into this class using a method (having done perl in a past life I use .qr) is then straightforward. |
@dcsobral said: |
Lanny Ripple (lanny ripple) said: |
@dcsobral said: |
Lanny Ripple (lanny ripple) said: The Regex code seems to have better documentation now so that will have to be enough. |
Lanny Ripple (lanny ripple) said (edited on Jan 25, 2012 4:08:01 PM UTC): |
@dcsobral said: |
Lanny Ripple (lanny ripple) said: |
Jason Wheeler (bigwheels16) said (edited on Mar 15, 2012 6:29:06 PM UTC): {code}"""(\d+)""".r{code} will behave as if it was {code}"""^(\d+)$""".r{code} with the start-of-line and end-of-line anchors automatically added in the current implementation? |
Lanny Ripple (lanny ripple) said: As Regex is currently implemented its unapplySeq uses .match() which attempts to exactly match the regex to the provided string. In effect it adds the start and end anchors to your regex. There are several problems with this. First, you are already supplying a regex and if you wanted anchors presumably you would have supplied them. I have taught six people Scala and every one has tripped over this with, "Why is it not doing what I tell it?", being the most common complaint. Second, the quick-and-dirty way to get around this is to add The current pull request adds a trait which can be used to override unapplySeq to use .find(0) instead of .match(). Additionally it adds a .qr method into StringLike so that a Regex with NoImplicitAnchors can be easily generated just like using .r to get a Regex. Hope that states the problem and fix better. |
@dcsobral said (edited on Mar 15, 2012 7:04:46 PM UTC): |
@paulp said: |
@dcsobral said: |
Lanny Ripple (lanny ripple) said: I can understand that everyone must want to decorate String and didn't hold any real belief that .qr would get in there. Easy enough to pimp in locally if needed. Glad this has been put to bed. |
In Regex the unapplySeq method uses m.matches to determine if the regex should extract matching groups. This should instead use m.find(0).
Using m.matches acts as if the pattern were implicitly anchored. Consider
val Re = """(x)""".r
val Re(x) = "xyz"
val Re(y) = "x"
When using a regular expression a user can state explicitly that they want anchoring and the Regex library should not mandate that decision. As an aside this often leads to people surrounding their patterns with ".PTRN." which can turn a nice fast pattern into one with heavy backtracking. See Mastering Regular Expressions by Friedl for details.
The text was updated successfully, but these errors were encountered: