Many simple search and replace operations can be performed using the
String.replaceAll() method. Sometimes, more
flexibility is required: for example, if not every instance of the expression
needs replacing, or if the replacement string is not fixed. For these cases,
instances of Matcher provide a find() method, which we will look
at here. The idiom introduced here can also be used simply to find and
process instances of a pattern in a string, without necessarily appending
anything to another string.
The Matcher.find() method
Using Matcher.find() is similar in some ways to using Matcher.matches(). We first need to compile a Pattern representing our regular expression, and then from this construct a Matcher around the string that we want to process.
But unlike when we use matches(), our expression is now the pattern that we want to find as a portion of the string, rather than as the whole string.
And since the pattern can occur multiple times in the string being matched, we will sit in a loop calling the find() method. The find() method will return true as long as there's another match.
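To make the difference between the two methods concrete, here is a minimal sketch. The class and method names (FindVersusMatches, isWholeStringBold, boldTexts) are made up for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // Hypothetical helper: true only if the WHOLE input is a single <b>...</b> element.
    static boolean isWholeStringBold(CharSequence input) {
        return Pattern.compile("<b>([^<]*)</b>").matcher(input).matches();
    }

    // Hypothetical helper: collects the text of every <b>...</b> element
    // found anywhere in the input.
    static List<String> boldTexts(CharSequence input) {
        Matcher m = Pattern.compile("<b>([^<]*)</b>").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {           // true for as long as another match exists
            found.add(m.group(1));   // text captured between the tags
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "one <b>two</b> three <b>four</b>";
        System.out.println(isWholeStringBold(html)); // false
        System.out.println(boldTexts(html));         // [two, four]
    }
}
```

The same expression fails with matches() because the string as a whole is not a single bold element, yet find() locates both occurrences inside it.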
To perform the "replacement" as we go along, we actually build up a new StringBuffer that will contain the new version of the string with the replacements made. A couple of methods of the Matcher object will help us with this.
So keeping with our example of removing HTML 'bold' tags, the code now looks like this:
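import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String removeBoldTags(CharSequence htmlString) {
    Pattern patt = Pattern.compile("<b>([^<]*)</b>");
    Matcher m = patt.matcher(htmlString);
    StringBuffer sb = new StringBuffer(htmlString.length());
    while (m.find()) {
        // Group 1 is the text between the opening and closing tags
        String text = m.group(1);
        // ... possibly process 'text' ...
        // Append the text since the previous match, then the literal replacement
        m.appendReplacement(sb, Matcher.quoteReplacement(text));
    }
    // Append whatever follows the last match
    m.appendTail(sb);
    return sb.toString();
}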
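As an illustration of a replacement that is not fixed, the sketch below fills in the "possibly process 'text'" step by upper-casing the text inside each pair of bold tags. The class and method names are invented for this example, and upper-casing is just one arbitrary choice of processing.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoldToUpperCase {
    // Hypothetical variant of removeBoldTags(): strips the tags and
    // upper-cases the text that was inside them.
    static String upperCaseBoldText(CharSequence htmlString) {
        Pattern patt = Pattern.compile("<b>([^<]*)</b>");
        Matcher m = patt.matcher(htmlString);
        StringBuffer sb = new StringBuffer(htmlString.length());
        while (m.find()) {
            // The replacement depends on what was matched
            String text = m.group(1).toUpperCase(Locale.ROOT);
            // quoteReplacement() makes '$' and '\' in the text literal,
            // so they are not treated as group references or escapes
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseBoldText("Costs <b>$5 now</b>, ok"));
    }
}
```

Note the call to Matcher.quoteReplacement(): without it, a dollar sign in the matched text (as in "$5" above) would be read by appendReplacement() as a reference to a capturing group.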
You'll notice that the parameter passed in is not specifically a String but actually just any old CharSequence. The CharSequence interface, introduced in Java 1.4, is implemented by String and by a few other classes (such as StringBuffer and CharBuffer) that can hold a 'sequence of characters'. On the other hand, the appendX() methods used here work only with a StringBuffer. It would have been nice if they'd worked with any old Appendable, but the latter interface did not exist when the regular expressions API was added (in Java 1.4; Appendable was added in Java 5). From Java 9 onwards, overloads of these methods that take a StringBuilder are also available.
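A quick sketch of what accepting a CharSequence buys us: the same method can be handed a String, a StringBuilder or a CharBuffer without converting first. The class name CharSequenceInputs is made up for this example, and the method body simply repeats the removeBoldTags() code from above so that the snippet stands on its own.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceInputs {
    // Same logic as removeBoldTags() above, repeated so this sketch is self-contained
    static String removeBoldTags(CharSequence htmlString) {
        Pattern patt = Pattern.compile("<b>([^<]*)</b>");
        Matcher m = patt.matcher(htmlString);
        StringBuffer sb = new StringBuffer(htmlString.length());
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String asString = "a <b>b</b> c";
        StringBuilder asBuilder = new StringBuilder(asString);
        CharBuffer asCharBuffer = CharBuffer.wrap(asString);

        // All three implement CharSequence, so all three are accepted as-is
        System.out.println(removeBoldTags(asString));     // a b c
        System.out.println(removeBoldTags(asBuilder));    // a b c
        System.out.println(removeBoldTags(asCharBuffer)); // a b c
    }
}
```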
Group 0
You may recall from our discussion of capturing groups that there is always a group 0, which refers to the entire string when using the matches() method. When using the find() method, group 0 refers to the entire portion of the string that matched the expression on the most recent call to find().
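The following sketch shows group 0 next to group 1 for each match found. The class and method names are hypothetical; it assumes the same bold-tag expression used earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Hypothetical helper: for each match, records "whole match -> captured text"
    static List<String> describeMatches(CharSequence input) {
        Matcher m = Pattern.compile("<b>([^<]*)</b>").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // group(0) is the whole portion matched by this call to find(),
            // tags included; group(1) is only the text between the tags
            out.add(m.group(0) + " -> " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeMatches("x <b>y</b> z <b>w</b>")) {
            System.out.println(line);
        }
    }
}
```

Calling m.group() with no argument is equivalent to m.group(0).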
Find without the replace
Of course, you can use Matcher.find() without actually performing a replacement. This is useful, for example, if you just want to count or process instances of a particular pattern within a string.
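For instance, counting matches needs nothing more than the find() loop itself. This is a minimal sketch; the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountMatches {
    // Hypothetical helper: counts the <b>...</b> elements in the input
    static int countBoldElements(CharSequence html) {
        Matcher m = Pattern.compile("<b>([^<]*)</b>").matcher(html);
        int count = 0;
        while (m.find()) {
            count++;   // no StringBuffer or appendX() calls are needed here
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBoldElements("<b>a</b><b>b</b> c")); // 2
    }
}
```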