The regex package found in Java 1.4 is a very nice thing to have available as a standard library. It also seems like a great convenience that there are methods right on java.lang.String for using regexes. The crazy thing is that with String.matches the regex is essentially turned into “^regex$”, meaning that the entire string has to match the pattern you give it.

Why on Earth would they make the only regex searching function built into java.lang.String work like that? Clearly, if the behavior you want is to match the entire string, you can easily type those two characters… And, rather than documenting this right there at String.matches, you have to trudge through the javadoc until you get to the top of the Matcher javadoc to discover this magic behavior.

Unlike some other nameless blog that just rants about stuff without providing any solutions, I’ve got a solution to recommend to Sun: add either the find() or lookingAt() method to java.lang.String. I’m not sure what made them think that entire string matching was the most useful behavior, but at least they can easily do something about this without breaking existing programs. In the meantime, without changing any code, they can at least add a line to the java.lang.String.matches javadoc that says

“Warning: this method requires the entire string to match the regular expression, as if you had typed ^regex$ even though you hadn’t. Using the pattern .*regex.* will probably do what you expect. Or you can increase your chances of RSI and use Pattern.compile(regex).matcher(string).find() to do what you really want.”
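To make the difference concrete, here is a minimal sketch of the three options mentioned above. The class name `MatchModes` and the helper `containsMatch` are made up for illustration; only `String.matches`, `Pattern.compile`, `Pattern.matcher` and `Matcher.find` come from the standard library.

```java
import java.util.regex.Pattern;

class MatchModes {
    // Hypothetical helper: true if the regex matches anywhere in the input,
    // which is roughly what Perl's =~ would do.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "fooBARfoo";
        System.out.println(s.matches("BAR"));         // false: the whole string must match
        System.out.println(s.matches(".*BAR.*"));     // true: the padded pattern covers the whole string
        System.out.println(containsMatch(s, "BAR"));  // true: find() looks for a match anywhere
    }
}
```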

21 Responses to “1.4 String.matches is dumb”
  1. Anonymous says:

    What a Winer. Perhaps you should write code with the brain in the “on” position.

    For those that live on inline hacks you can do

    ("fooBARfoo".split("B\\wR").length > 1)

    Another alternatice is that you could *shudder* write our own freaking utility method. Here, try this one…

    public static boolean c(String input, String regex) {
        return java.util.regex.Pattern.compile(regex).matcher(input).find();
    }

    I even gave it a one character name to help with your RSI fears.

  2. Anonymous says:

    I hope you’re not serious. Why would anyone suggest that all 5.75 bezillion Java developers implement their own distinct versions of the logic you’ve described above, when Sun could fix it exactly *once* so that no one would ever have to bother (as suggested in the original post)?

    If you really are serious, then you’re even more of a moron than your atrocious spelling and grammar suggests.

  3. Anonymous says:

    Dude. The current behavior of matches is
    exactly what I would expect. How about this?
    "abc".matches("abc") ==> true // lame if this didn’t work
    "abc".matches("a.c") ==> true
    "abc".matches("a") ==> false // lame if this didn’t work

    I think you’re looking for:
    "abcdefg".matches(".*def.*")

  4. Kevin says:

    Perhaps I’m biased because I’ve been using Perl regular expressions for ages. My expectation was that String.matches would be similar to =~ in Perl.

    So, "abc" =~ /a/ is a match in Perl. "abc" =~ /^a$/ is not.

    My main point was that Sun could’ve chosen to put find or lookingAt in java.lang.String, which would’ve meant that all someone would need to do is “^regex$” to get the equivalent of the matches behavior.

  5. Anonymous says:

    Since you are getting rid of the “slash quotes” and not using a non-existent operator in Java (=~, which is probably meant to look like an approximately-equals sign with the squiggle on top of the equals), the comparison to Perl quickly erodes. Most developers care about exact matches, and if you want a contains construct then you could just do “.*abc.*” to get the contains match. Besides, the docs clearly state that for performance you would want to use a cached or static Pattern object anyway.
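For reference, the cached-Pattern approach might look something like the sketch below. The class name, the constant `HAS_BAR` and the method `containsBar` are hypothetical, and the regex is borrowed from the first comment.

```java
import java.util.regex.Pattern;

class CachedPatternExample {
    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern HAS_BAR = Pattern.compile("B\\wR");

    // Hypothetical helper: a Matcher is not thread-safe, so a new one is created per call.
    static boolean containsBar(String input) {
        return HAS_BAR.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsBar("fooBARfoo")); // true
        System.out.println(containsBar("foobar"));    // false: the pattern is case-sensitive
    }
}
```

The String convenience methods compile the pattern on each call, so this mostly pays off when the same regex is applied many times.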

  6. Kevin says:

    I’m not sure about “most developers care about exact matches” when it comes to regular expressions. I can see instances where you’d want exact matches and others where you wouldn’t. What about wanting to do a regular expression form of startsWith?

    As for performance: if you’re only rarely doing the regex, using the java.lang.String convenience methods is nice. “Premature optimization is the root of all evil”
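A regular-expression form of startsWith is what Matcher.lookingAt() already does: it anchors the match at the beginning of the input but does not require it to reach the end. A minimal sketch, where the class name and the helper `startsWithRegex` are invented for illustration:

```java
import java.util.regex.Pattern;

class RegexStartsWith {
    // Hypothetical helper: true if some prefix of the input matches the regex.
    static boolean startsWithRegex(String input, String regex) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithRegex("abc123", "[a-z]+")); // true: "abc" matches at the start
        System.out.println(startsWithRegex("123abc", "[a-z]+")); // false: the input starts with digits
        System.out.println("abc123".matches("[a-z]+"));          // false: the trailing digits are not covered
    }
}
```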

  7. Anonymous says:

    I rarely write Perl, but thinking back, I can see how you might jump to the conclusion that Java’s matches works like Perl’s.

    It’s an interesting point though. Since Perl is the de facto string scraper, Java should be following Perl’s lead unless they have a compelling reason not to do so.

    Someone spelled this out in longhand at:
    http://developer.java.sun.com/developer/bugParade/bugs/4908449.html

    It is rather sad that find won’t be added for JDK 1.5. Maybe Apache Commons Lang will add this.

  8. Cary says:

    So back to the original complaint. Matches only tells you if the entire string matches the given regular expression. So to find out if ANY PART of the string matches the given regular expression you can do this:

    // A limit of -1 keeps trailing empty strings, so a match at the very end still counts.
    String[] stringArray = yourString.split(regex, -1);
    if (stringArray.length > 1) {
        /* the array of strings returned by split() contains more than one
           string, and therefore part of yourString did indeed match the
           given regex */
    } else {
        /* no part of yourString matched the regex */
    }

    I hope this solves the original problem and I believe this is probably why Sun didn’t implement further regex matching functionality as the split method easily does the job with only a few lines of code.

  9. Kevin says:

    I’m guessing that split() is more expensive than find(), given that it has to read the entire string (find can stop at a match) and build up a new String array in the process.

    I also think that foo.find("[bar]*") is a bit more intuitive than the split example. Ease-of-use is an important thing in an API.
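Beyond cost, a length check on split() has edge cases worth knowing about: called without a limit argument, split() drops trailing empty strings, so a match at the end of the input (or one covering the whole input) goes unnoticed. The sketch below compares the approaches; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class SplitVersusFind {
    // The split trick with no limit argument: trailing empty strings are discarded.
    static boolean foundBySplit(String input, String regex) {
        return input.split(regex).length > 1;
    }

    // The same trick with a limit of -1, which keeps trailing empty strings.
    static boolean foundBySplitKeepingTrailing(String input, String regex) {
        return input.split(regex, -1).length > 1;
    }

    // The direct approach.
    static boolean foundByFind(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"fooBARfoo", "fooBAR", "BAR", "foo"};
        for (String input : inputs) {
            System.out.println(input
                    + " split=" + foundBySplit(input, "BAR")
                    + " split(-1)=" + foundBySplitKeepingTrailing(input, "BAR")
                    + " find=" + foundByFind(input, "BAR"));
        }
    }
}
```

For "fooBAR" and "BAR" the no-limit split check reports no match even though find() reports one, so the two are not interchangeable.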

  10. Matt Post says:

    Winer? tazzzzz is right on. I just wasted the past hour trying to figure out why my matcher wasn’t working, until I finally came across this document and discovered that it was matching the entire string. If Sun had (a) implemented it “correctly” or at least (b) put in the suggested javadoc comment, I’d have saved myself a wasted chunk of time before dinner.

  11. Anon says:

    I think you make an excellent point. Especially since I ran into the same problem. :) Yes, Perl background.
    I’m more than tired of people making excuses for Java’s usability woes. I’m surprised the person who made the comment about you “wine’ing” (his/her spelling) didn’t tell you to reverse engineer the string class to determine how matches works. Get real. And, the term Java_docs_ !?!? api != doc

  12. Han says:

    1.4 String.matches is INDEED not just dumb but bonehead behavior on the part of Sun …not that it’s news :(

    I wasted a couple of hours on this stupid thing before I found this blog.

  13. George says:

    I was equally frustrated by the default. My default expectation is that .matches would be doing a search within the string. One other method, documented here: http://javaalmanac.com/egs/java.lang/HasSubstr.html

    is to use "abcdef".indexOf("de") >= 0. The comparison has to be >= rather than > in case the match is at the very start of the string, as in "abcdef".indexOf("ab"), which returns 0.
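Note that indexOf searches for a literal substring, not a regular expression, so it only replaces the regex when no metacharacters are needed. A small sketch of the check, with a hypothetical class and helper name:

```java
class LiteralSubstring {
    // Hypothetical helper: true if the literal text occurs anywhere in the input.
    static boolean hasSubstring(String input, String literal) {
        return input.indexOf(literal) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(hasSubstring("abcdef", "de"));  // true: found at index 3
        System.out.println(hasSubstring("abcdef", "ab"));  // true: found at index 0
        System.out.println("abcdef".indexOf("ab") > 0);    // false: using > would miss a match at index 0
        System.out.println(hasSubstring("abcdef", "a.c")); // false: the dot is taken literally
    }
}
```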

  14. Pitr says:

    I second that: matches seems to ignore your string (ie: always false) whenever it contains a newline.

    Back to a custom static method.
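One likely explanation for the newline behavior, assuming the pattern was padded with .* as suggested earlier: by default the dot does not match line terminators, so the padding cannot stretch across a newline and the whole-string match fails. The sketch below shows this along with two standard ways around it, the Pattern.DOTALL flag and its inline form (?s). The class and helper names are invented for illustration.

```java
import java.util.regex.Pattern;

class NewlineMatching {
    // Hypothetical helper: whole-string match where '.' may also match line terminators.
    static boolean matchesAcrossLines(String input, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "first line\nsecond BAR line";
        System.out.println(text.matches(".*BAR.*"));                     // false: '.' stops at the newline
        System.out.println(text.matches("(?s).*BAR.*"));                 // true: (?s) turns on DOTALL inline
        System.out.println(matchesAcrossLines(text, ".*BAR.*"));         // true
        System.out.println(Pattern.compile("BAR").matcher(text).find()); // true: find() needs no padding
    }
}
```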

  15. NGS says:

    I was expecting String.matches() to work like Matcher.find(), provided as a convenience method. But Sun implemented it using the literal meanings of ‘matches’ (match the entire string) and ‘find’ (find the pattern) instead of what developers want! The API should make clear what the method does.

  16. Loren says:

    Kevin’s absolutely right on this. Perl hacker or not, there’s no reason to secretly pull the “^$” characters into the match. Regex is very much an explicit form language, if you don’t specify “this is the beginning of the string, and this is the end of the string”, then there is no assumed beginning or end. This essentially makes it a pseudo regex. That said, “.*MYMATCH.*” is a nice and easy (but stupid) workaround.

  17. Keith says:

    Ditto on the thoughts echoed above. I just wasted over an hour trying to resolve why a simple regex wouldn’t match a particular string. If you’re going to mess with the regex under the covers, you should be up front about it in the documentation!

  18. Rupali says:

    Thanks to you guys... I would have wasted so much time on this! String.matches is hopeless…

  19. Nick says:

    My Perl background was undoing me as well. An implicit ^ and $ is the kind of thing that should be written in very LARGE letters within the documentation. And it still isn’t in 1.6.

    Thanks for the very useful comments.

  20. Antonio says:

    Look, I have no Perl background at all (rather JavaScript, Smalltalk, and yes, Java…) and this behaviour is dumb. In fact, it’s a violation of the protocol as Loren points out - if it claims to match a regex, and the regex doesn’t specify ^$, then it mustn’t work as if it did, because it’s just wrong. I’d go as far as deprecating the method (since it’s just not possible to fix it, as there are probably too many people depending on the BUGGY behaviour).
    And yes, the Java lot are arrogant about stuff like this. At times there are good explanations for why something works in a particular way, but you won’t be given that explanation because the people answering won’t know - but they’ll know how to tell you to get lost.

  21. Jason Sheedy says:

    5 years on and this issue still rears its ugly head. I just wasted 30 minutes trying to figure out this unexpected behaviour !!!
