The regex package found in Java 1.4 is a very nice thing to have available as a standard library. It also seems like a great convenience that there are methods right on java.lang.String for using regexes. The crazy thing is that with String.matches the regex is essentially turned into “^regex$”, meaning that the entire string has to match the pattern you give it.
Why on Earth would they make the only regex searching function built into java.lang.String work like that? Clearly, if the behavior you want is to match the entire string, you can easily type those two characters… And, rather than documenting this right there at String.matches, you have to trudge through the javadoc until you get to the top of the Matcher javadoc to discover this magic behavior.
Unlike some other nameless blog that just rants about stuff without providing any solutions, I’ve got a solution to recommend to Sun: add either the find() or lookingAt() method to java.lang.String. I’m not sure what made them think that entire string matching was the most useful behavior, but at least they can easily do something about this without breaking existing programs. In the meantime, without changing any code, they can at least add a line to the java.lang.String.matches javadoc that says
“Warning: this method requires the entire string to match the regular expression, as if you had typed ^regex$ even though you hadn’t. Using the pattern .*regex.* will probably do what you expect. Or you can increase your chances of RSI and use Pattern.compile(regex).matcher(string).find() to do what you really want.”
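To make the difference concrete, here is a minimal sketch comparing the whole-string behavior of String.matches with the search behavior of Matcher.find(). The class and method names (MatchVsFind, wholeMatch, partialMatch) are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern, Matcher and String.matches are real JDK API.
class MatchVsFind {

    // Same semantics as input.matches(regex): the whole input must match.
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Perl-like search: true if any part of the input matches.
    static boolean partialMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abcdefg";
        System.out.println(wholeMatch(input, "def"));      // false: "def" is not the whole string
        System.out.println(wholeMatch(input, ".*def.*"));  // true: padded pattern covers the whole string
        System.out.println(partialMatch(input, "def"));    // true: found in the middle
    }
}
```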
October 31st, 2003 at 6:32 pm
What a Winer. Perhaps you should write code with the brain in the “on” position.
For those that live on inline hacks you can do
("fooBARfoo".split("B\\wR").length > 1)
Another alternatice is that you could *shudder* write our own freaking utility method. Here, try this one…
public static boolean c(String input, String regex) {
    return java.util.regex.Pattern.compile(regex).matcher(input).find();
}
I even gave it a one character name to help with your RSI fears.
October 31st, 2003 at 7:52 pm
I hope you’re not serious. Why would anyone suggest that all 5.75 bezillion Java developers implement their own distinct versions of the logic you’ve described above, when Sun could fix it exactly *once* so that no one would ever have to bother (as suggested in the original post)?
If you really are serious, then you’re even more of a moron than your atrocious spelling and grammar suggest.
October 31st, 2003 at 10:32 pm
Dude. The current behavior of matches is
exactly what I would expect. How about this?
"abc".matches("abc") ==> true // lame if this didn’t work
"abc".matches("a.c") ==> true
"abc".matches("a") ==> false // lame if this didn’t work
I think you’re looking for:
"abcdefg".matches(".*def.*")
November 1st, 2003 at 9:47 am
Perhaps I’m biased because I’ve been using Perl regular expressions for ages. My expectation was that String.matches would be similar to =~ in Perl.
So, “abc” =~ /a/ is a match in Perl. “abc” =~ /^a$/ is not.
My main point was that Sun could’ve chosen to put find or lookingAt in java.lang.String, which would’ve meant that all someone would need to do is “^regex$” to get the equivalent of the matches behavior.
November 1st, 2003 at 2:11 pm
Since you are getting rid of the “slash quotes” and not using a non-existent operator in Java (=~, which is probably meant to look like an approximately equals with the squiggle on top of the equals), the comparison to Perl quickly erodes. Most developers care about exact matches, and if you want a contains construct then you could just do “.*abc.*” to get the contains match. Besides, the docs clearly state that for performance you would want to use a cached or static Pattern object anyway.
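For reference, a cached Pattern might look something like the sketch below. The class name CachedPattern, the constant DEF and the method containsDef are invented for this example; the only assumption about the JDK is that a compiled Pattern can be reused, which its javadoc says is safe.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing a pattern compiled once and reused.
class CachedPattern {

    // Compiled a single time when the class is loaded, instead of on every call.
    private static final Pattern DEF = Pattern.compile("def");

    // A fresh Matcher is created per call; the Pattern itself is shared.
    static boolean containsDef(String input) {
        return DEF.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDef("abcdefg")); // true
        System.out.println(containsDef("abc"));     // false
    }
}
```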
November 2nd, 2003 at 9:57 am
I’m not sure about “most developers care about exact matches” when it comes to regular expressions. I can see instances where you’d want exact matches and others where you wouldn’t. What about wanting to do a regular expression form of startsWith?
As for performance: if you’re only rarely doing the regex, using the java.lang.String convenience methods is nice. “Premature optimization is the root of all evil”
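A regular expression form of startsWith is roughly what Matcher.lookingAt() gives you: it anchors the match at the start of the input but does not require it to reach the end. The sketch below wraps it in a helper; RegexPrefix and startsWithRegex are names made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper; lookingAt() is the real Matcher method doing the work.
class RegexPrefix {

    // True if the input begins with something matching the regex.
    static boolean startsWithRegex(String input, String regex) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithRegex("abc123", "[a-c]+")); // true: prefix "abc" matches
        System.out.println(startsWithRegex("123abc", "[a-c]+")); // false: match is not at the start
        System.out.println("abc123".matches("[a-c]+"));          // false: trailing "123" is left over
    }
}
```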
November 2nd, 2003 at 9:52 pm
I rarely write Perl, but thinking back, I can see how you might jump to the conclusion that Java’s matches works like Perl’s.
It’s an interesting point though. Since Perl is the de facto string scraper, Java should be following Perl’s lead unless they have a compelling reason not to do so.
Someone spelled this out in longhand at:
http://developer.java.sun.com/developer/bugParade/bugs/4908449.html
It is rather sad that find won’t be added for JDK 1.5. Maybe apache commons lang will add this.
November 18th, 2003 at 10:32 pm
So back to the original complaint. Matches only tells you if the entire string matches the given regular expression. So to find out if ANY PART of the string matches the given regular expression you can do this:
String[] stringArray = yourString.split(regex, -1); // -1 keeps trailing empty strings
if (stringArray.length > 1) {
    /* the array of strings returned by split()
       contains more than one string and therefore
       part of yourString did indeed match the
       given regex */
} else {
    /* no part of yourString matched the regex */
}
I hope this solves the original problem and I believe this is probably why Sun didn’t implement further regex matching functionality as the split method easily does the job with only a few lines of code.
November 19th, 2003 at 5:17 pm
I’m guessing that split() is more expensive than find(), given that it has to read the entire string (find can stop at a match) and build up a new String array in the process.
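Cost aside, a split-based check that relies on the default limit can also give a different answer from find(): trailing empty strings are dropped from the result, so a match at the very end of the input (or one covering the whole input) does not show up in the array length. A small sketch, with made-up names SplitVsFind, foundBySplit and foundByFind:

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the two approaches discussed in the comments.
class SplitVsFind {

    // Split-based check using the default limit (trailing empty strings removed).
    static boolean foundBySplit(String input, String regex) {
        return input.split(regex).length > 1;
    }

    // Direct search with Matcher.find().
    static boolean foundByFind(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(foundBySplit("abcdef", "cd")); // true: ["ab", "ef"]
        System.out.println(foundBySplit("abc", "c"));     // false: trailing "" dropped, leaving ["ab"]
        System.out.println(foundByFind("abc", "c"));      // true
        System.out.println(foundBySplit("abc", "abc"));   // false: every piece is empty and dropped
        System.out.println(foundByFind("abc", "abc"));    // true
    }
}
```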
I also think that foo.find("[bar]*") is a bit more intuitive than the split example. Ease-of-use is an important thing in an API.
January 29th, 2004 at 6:55 pm
Winer? tazzzzz is right on. I just wasted the past hour trying to figure out why my matcher wasn’t working, until I finally came across this document and discovered that it was matching the entire string. If Sun had (a) implemented it “correctly” or at least (b) put in the suggested javadoc comment, I’d have saved myself a wasted chunk of time before dinner.
March 14th, 2005 at 6:27 pm
I think you make an excellent point. Especially since I ran into the same problem.
Yes, Perl background.
I’m more than tired of people making excuses for Java’s usability woes. I’m surprised the person who made the comment about you “wine’ing” (his/her spelling) didn’t tell you to reverse engineer the string class to determine how matches works. Get real. And, the term Java_docs_ !?!? api != doc
February 4th, 2006 at 1:49 am
1.4 String.matches is INDEED not just dumb but bonehead behavior on the part of Sun …not that it’s news
I wasted a couple of hours on this stupid thing before I found this blog.
April 14th, 2006 at 5:46 pm
I was equally frustrated by the default. My default expectation is that .matches would be doing a search within the string. One other method, documented here: http://javaalmanac.com/egs/java.lang/HasSubstr.html
is to use "abcdef".indexOf("de") >= 0. The equals sign is there in case the substring sits at the very start, as in "abcdef".indexOf("ab"), which returns 0.
August 2nd, 2006 at 7:11 am
I second that: matches seems to ignore your string (ie: always false) whenever it contains a newline.
Back to a custom static method.
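One likely cause of the newline problem, assuming the pattern was padded with .* on both sides: by default the dot does not match line terminators, so .*regex.* cannot stretch across a newline to cover the whole string. The sketch below shows that case along with two ways around it, the DOTALL flag and plain find(). The names NewlineMatch, paddedMatch, paddedMatchDotAll and search are invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo of how newlines interact with the ".*regex.*" workaround.
class NewlineMatch {

    // The ".*regex.*" workaround; the non-capturing group keeps alternations intact.
    static boolean paddedMatch(String input, String regex) {
        return input.matches(".*(?:" + regex + ").*");
    }

    // Same workaround, but with DOTALL so '.' also matches line terminators.
    static boolean paddedMatchDotAll(String input, String regex) {
        return Pattern.compile(".*(?:" + regex + ").*", Pattern.DOTALL)
                      .matcher(input)
                      .matches();
    }

    // find() needs no padding, so newlines elsewhere in the input do not matter.
    static boolean search(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abc\ndef";
        System.out.println(paddedMatch(input, "def"));       // false: '.' stops at the newline
        System.out.println(paddedMatchDotAll(input, "def")); // true
        System.out.println(search(input, "def"));            // true
    }
}
```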
October 4th, 2006 at 6:43 pm
I was expecting String.matches() to work like Matcher.find(), provided as a convenience method. But Sun implemented it using the literal meanings of ‘matches’ (match the entire string) and ‘find’ (find the pattern) instead of what developers want! The API should make it clear what the method does.
March 13th, 2007 at 2:40 am
Kevin’s absolutely right on this. Perl hacker or not, there’s no reason to secretly pull the “^$” characters into the match. Regex is very much an explicit form language, if you don’t specify “this is the beginning of the string, and this is the end of the string”, then there is no assumed beginning or end. This essentially makes it a pseudo regex. That said, “.*MYMATCH.*” is a nice and easy (but stupid) workaround.
September 24th, 2007 at 4:52 pm
Ditto on the thoughts echoed above. I just wasted over an hour trying to resolve why a simple regex wouldn’t match a particular string. If you’re going to mess with the regex under the covers, you should be up front about it in the documentation!
March 5th, 2008 at 3:44 am
Thanks to u Guys..I would have wasted so much time in this! String.matches is hopeless…
March 30th, 2008 at 9:04 pm
My Perl background was undoing me as well. An implicit ^ and $ is the kind of thing that should be written in very LARGE letters within the documentation. And it still isn’t in 1.6.
Thanks for the very useful comments.
June 18th, 2008 at 6:56 pm
Look, I have no Perl background at all (rather JavaScript, Smalltalk, and yes, Java…) and this behaviour is dumb. In fact, it’s a violation of the protocol as Loren points out - if it claims to match a regex, and the regex doesn’t specify ^$, then it mustn’t work as if it did, bc it’s just wrong. I’d go as far as deprecating the method (since it’s just not possible to fix it, as there are probably too many people depending on the BUGGY behaviour).
And yes, the Java lot are arrogant about stuff like this. At times there are good explanations for why something works in a particular way, but you won’t be given that explanation bc the people answering won’t know - but they’ll know how to tell you to get lost.
August 8th, 2008 at 3:03 am
5 years on and this issue still raises its ugly head. I just wasted 30 minutes trying to figure out this unexpected behaviour !!!