In Java, regular expressions are used to match patterns in text. They are a powerful tool for manipulating and validating strings, and Java provides the java.util.regex package for working with them.
Here is an example of how to use regular expressions in Java:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexExample {
        public static void main(String[] args) {
            String text = "The quick brown fox jumps over the lazy dog.";
            String patternString = "fox";

            // Compile the regular expression once, then create a matcher for the input text.
            Pattern pattern = Pattern.compile(patternString);
            Matcher matcher = pattern.matcher(text);

            // Each call to find() advances to the next match, if there is one.
            while (matcher.find()) {
                System.out.println("Found match at index " + matcher.start() + " to " + matcher.end());
            }
        }
    }
In this example, the regular expression fox is used to find matches in the text string. The Pattern class compiles the regular expression, and the Matcher class matches it against the text. The find() method of the Matcher class locates each successive match. The start() method returns the index of the first matched character, and the end() method returns the index just past the last matched character.
The output of this program will be:
Found match at index 16 to 19
Here is another example that shows how to use regular expressions to validate email addresses:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class EmailValidator {
        // Compiled once and reused, because Pattern instances are immutable and thread-safe.
        private static final Pattern EMAIL_PATTERN = Pattern.compile(
                "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
                + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

        public static boolean isValidEmailAddress(String email) {
            Matcher matcher = EMAIL_PATTERN.matcher(email);
            // matches() succeeds only if the entire input fits the pattern.
            return matcher.matches();
        }
    }
In this example, the regular expression EMAIL_PATTERN is used to validate email addresses. It matches addresses of the form username@domain.tld, where the top-level domain consists of at least two letters.
The isValidEmailAddress method takes an email address as input and returns true if the entire address matches the regular expression, and false otherwise.
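To see how the validator behaves on concrete input, the following sketch runs the same pattern against a few sample strings. The class name EmailValidatorDemo and the sample addresses are assumptions made for illustration; only the pattern itself comes from the example above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is copied from the EmailValidator example.
class EmailValidatorDemo {
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
            "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    static boolean isValidEmailAddress(String email) {
        // matches() succeeds only if the entire input fits the pattern.
        return EMAIL_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        // Illustrative inputs: two well-formed addresses, one without a
        // top-level domain, and one without an @ sign.
        String[] samples = {
            "user@example.com",
            "first.last@mail.example.org",
            "user@localhost",
            "user.example.com"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValidEmailAddress(sample));
        }
    }
}
```

With this pattern, the first two samples are accepted and the last two are rejected. A pattern like this checks only the general shape of an address; it does not confirm that the address exists.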
Flags
In Java, flags are used to modify the behavior of regular expressions. Flags are optional parameters that can be passed to the Pattern.compile() method to change the way regular expressions are matched against input text.
There are several flags that can be used in Java regular expressions. Here are some of the most commonly used flags:
- CASE_INSENSITIVE: This flag makes the regular expression case-insensitive. By default, regular expressions in Java are case-sensitive. Unless UNICODE_CASE is also set, case-insensitive matching applies only to US-ASCII characters.
- MULTILINE: This flag makes the ^ and $ anchors match the start and end of each line, respectively, in addition to the start and end of the input string.
- DOTALL: This flag makes the . character match any character, including line terminators.
- UNICODE_CASE: When combined with CASE_INSENSITIVE, this flag makes case-insensitive matching follow Unicode rules. By default, case-insensitive matching considers only US-ASCII characters.
- CANON_EQ: This flag enables Unicode canonical equivalence, so two characters match if their full canonical decompositions match.
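The effect of these flags is easiest to see by running the same expression with and without each one. The following is a minimal sketch; the class FlagDemo, its helper methods, and the sample strings are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing how individual flags change matching.
class FlagDemo {
    // Counts how many times the regex is found in the text under the given flags.
    static int countMatches(String regex, int flags, String text) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Reports whether the whole text matches the regex under the given flags.
    static boolean fullMatch(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String lines = "one\ntwo\nthree";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(countMatches("^\\w+", 0, lines));                 // 1
        System.out.println(countMatches("^\\w+", Pattern.MULTILINE, lines)); // 3

        // Without DOTALL, the dot does not match the line terminator.
        System.out.println(fullMatch("a.b", 0, "a\nb"));              // false
        System.out.println(fullMatch("a.b", Pattern.DOTALL, "a\nb")); // true

        // The pattern is a lowercase e with acute accent (U+00E9) and the
        // input is the uppercase form (U+00C9), both outside US-ASCII.
        int ascii = Pattern.CASE_INSENSITIVE;
        int unicode = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(fullMatch("\u00E9", ascii, "\u00C9"));   // false
        System.out.println(fullMatch("\u00E9", unicode, "\u00C9")); // true
    }
}
```

Passing 0 as the flags argument compiles the expression with no flags set, which makes it convenient to compare the default behavior with the flagged one.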
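Flags can be specified with an embedded flag expression within the regular expression itself: (?i) for CASE_INSENSITIVE, (?m) for MULTILINE, (?s) for DOTALL, and (?u) for UNICODE_CASE. CANON_EQ has no embedded form. For example, the CASE_INSENSITIVE flag can be specified using the (?i) syntax: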
Pattern pattern = Pattern.compile("(?i)hello");
In this example, the regular expression hello is compiled with the CASE_INSENSITIVE flag. This means that it matches "hello", "Hello", "HELLO", and any other mix of upper- and lowercase letters.
Flags can also be specified as parameters to the Pattern.compile() method:
Pattern pattern = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);
In this example, the regular expression hello is compiled with the CASE_INSENSITIVE flag passed as the second argument to the Pattern.compile() method instead of being embedded in the expression. The resulting pattern behaves the same way.
In summary, flags are used to modify the behavior of regular expressions in Java. They can be specified within the regular expression itself using an embedded flag expression such as (?i), or as parameters to the Pattern.compile() method. Flags can be used to make regular expressions case-insensitive, multiline, Unicode-aware, and more.
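When more than one flag is needed, the two styles look as follows. This is a small sketch rather than a recommended layout: the class CombinedFlagsDemo, the helper found(), and the sample text are hypothetical. Flags passed to Pattern.compile() are int constants combined with the bitwise OR operator, and embedded flags can be written together, as in (?im).

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two ways of setting several flags.
class CombinedFlagsDemo {
    // Flags passed to compile() are combined with the bitwise OR operator.
    static final Pattern WITH_ARGUMENT =
            Pattern.compile("^hello$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // The same two flags written as an embedded flag expression.
    static final Pattern EMBEDDED = Pattern.compile("(?im)^hello$");

    // The same expression with no flags, for comparison.
    static final Pattern PLAIN = Pattern.compile("^hello$");

    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "first line\nHELLO\nlast line";
        System.out.println(found(WITH_ARGUMENT, text)); // true
        System.out.println(found(EMBEDDED, text));      // true
        System.out.println(found(PLAIN, text));         // false
    }
}
```

The flag argument keeps the expression itself shorter, while the embedded form is useful when the pattern is supplied as a plain string, for example from a configuration file.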
Regular Expression Patterns
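Here are some common regex patterns in Java. Note that String.matches() returns true only when the entire string matches the pattern, so the examples that look for a pattern inside a longer text use Matcher.find() instead. These snippets assume that java.util.regex.Pattern and java.util.regex.Matcher are imported.
- Matching a specific string:
    String pattern = "hello";
    String text = "hello world!";
    boolean isMatch = Pattern.compile(pattern).matcher(text).find();
This will return true because the text contains the pattern “hello”. By contrast, text.matches(pattern) would return false here, because “hello” does not cover the whole text.
- Matching a pattern with a wildcard:
    String pattern = "he..o";
    String text = "hello world!";
    boolean isMatch = Pattern.compile(pattern).matcher(text).find();
This will also return true because the pattern “he..o” matches “hello”; each dot matches exactly one character.
- Matching a pattern with a character class:
    String pattern = "[abc]";
    String text = "aardvark";
    boolean isMatch = Pattern.compile(pattern).matcher(text).find();
This will return true because the text contains at least one of the characters “a”, “b”, or “c”.
- Matching a pattern with a negated character class:
    String pattern = "[^abc]";
    String text = "aardvark";
    boolean isMatch = Pattern.compile(pattern).matcher(text).find();
This will return true because the text contains at least one character other than “a”, “b”, or “c”, such as “r”. It would return false only for a text made up entirely of those three characters.
- Matching a pattern with a quantifier:
    String pattern = "a{3,5}";
    String text = "aaa";
    boolean isMatch = text.matches(pattern);
This will return true because the entire text consists of three “a” characters, which is within the range allowed by “a{3,5}”.
- Matching a pattern with a greedy quantifier:
    String pattern = "a.*b";
    String text = "aabab";
    Matcher matcher = Pattern.compile(pattern).matcher(text);
    boolean isMatch = matcher.find();
This will return true, and matcher.group() returns “aabab”, because the greedy “.*” consumes as many characters as possible while still leaving a final “b” to match.
- Matching a pattern with a non-greedy quantifier:
    String pattern = "a.*?b";
    String text = "aabab";
    Matcher matcher = Pattern.compile(pattern).matcher(text);
    boolean isMatch = matcher.find();
This will return true, and matcher.group() returns “aab” instead of “aabab”, because the non-greedy “.*?” consumes as few characters as possible.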
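The Matcher class offers three ways to apply a pattern, and they differ in how much of the input must match. The sketch below compares them side by side; the class MatchModesDemo and its helper method names are hypothetical, while matches(), lookingAt(), and find() are the standard Matcher methods.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three matching methods of Matcher.
class MatchModesDemo {
    // matches(): the pattern must cover the entire input.
    static boolean wholeInput(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // lookingAt(): the pattern must match at the start of the input,
    // but does not have to reach the end.
    static boolean atStart(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    // find(): the pattern may match anywhere in the input.
    static boolean anywhere(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "hello world!";
        System.out.println(wholeInput("hello", text)); // false
        System.out.println(atStart("hello", text));    // true
        System.out.println(anywhere("hello", text));   // true

        System.out.println(atStart("world", text));    // false
        System.out.println(anywhere("world", text));   // true

        // String.matches() behaves like Matcher.matches().
        System.out.println(text.matches("hello"));     // false
        System.out.println(text.matches("hello.*"));   // true
    }
}
```

Choosing find() for searching and matches() for validation covers most cases; lookingAt() is mainly useful for prefix checks.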
Metacharacters
In Java, metacharacters are used in regular expressions to specify patterns to match against input text. Here are some common metacharacters in Java regular expressions:
- . (dot) – Matches any single character except a line terminator (unless the DOTALL flag is set).
- ^ (caret) – Matches the start of the input, or the start of each line when the MULTILINE flag is set.
- $ (dollar sign) – Matches the end of the input, or the end of each line when the MULTILINE flag is set.
- * (asterisk) – Matches zero or more occurrences of the previous character or group.
- + (plus sign) – Matches one or more occurrences of the previous character or group.
- ? (question mark) – Makes the previous character or group optional (i.e., matches zero or one occurrence).
- [] (square brackets) – Defines a character class, which matches any single character within the brackets.
- () (parentheses) – Defines a capturing group, which captures the matched substring and can be used for grouping, back-referencing, or applying quantifiers.
- | (vertical bar) – Defines an alternation, which matches either the left or right side of the vertical bar.
- \ (backslash) – Escapes metacharacters and special characters, and can also be used to create special sequences.
For example, the regular expression a.*b, when applied to an entire string, matches any string that starts with the character ‘a’, followed by zero or more characters of any type, and ends with the character ‘b’. The regular expression \\d{3}-\\d{2}-\\d{4} (written here as a Java string literal, where each backslash is doubled) matches a social security number in the format of ‘XXX-XX-XXXX’, where ‘X’ is a digit.
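The following sketch puts several of these metacharacters to work. The class MetacharacterDemo, its method names, and the sample strings are assumptions made for illustration; Pattern.quote() is the standard method for matching text literally when it contains metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class exercising a few metacharacters.
class MetacharacterDemo {
    // In a Java string literal, each regex backslash is written twice.
    static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    static boolean isSsnFormat(String input) {
        return SSN.matcher(input).matches();
    }

    // Pattern.quote() builds a regex that matches the given text literally,
    // so characters such as $ and . lose their special meaning.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    // Parentheses group the alternation and capture which branch matched.
    static String firstPet(String text) {
        Matcher matcher = Pattern.compile("(cat|dog)s?").matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(isSsnFormat("123-45-6789")); // true
        System.out.println(isSsnFormat("123-456-789")); // false

        String receipt = "price: $5.00 (approx.)";
        System.out.println(containsLiteral(receipt, "$5.00"));                // true
        // Unquoted, $ is an end anchor, so nothing can follow it.
        System.out.println(Pattern.compile("$5.00").matcher(receipt).find()); // false

        System.out.println(firstPet("two dogs and a cat")); // dog
    }
}
```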
Quantifiers
In regular expressions in Java, quantifiers are symbols that allow you to specify how many times a character, character class, or group of characters can appear in a pattern. The following are the quantifiers available in Java regular expressions:
- The asterisk (*): This quantifier matches zero or more occurrences of the preceding character or group of characters. For example, the regular expression “a*” matches any number of consecutive “a” characters, including zero.
- The plus sign (+): This quantifier matches one or more occurrences of the preceding character or group of characters. For example, the regular expression “a+” matches one or more consecutive “a” characters.
- The question mark (?): This quantifier matches zero or one occurrence of the preceding character or group of characters. For example, the regular expression “colou?r” matches both “color” and “colour”.
- Curly braces ({n}): This quantifier matches exactly n occurrences of the preceding character or group of characters. For example, the regular expression “a{3}” matches exactly three consecutive “a” characters.
- Curly braces ({n,m}): This quantifier matches between n and m occurrences of the preceding character or group of characters. For example, the regular expression “a{2,4}” matches between two and four consecutive “a” characters.
- Curly braces ({n,}): This quantifier matches n or more occurrences of the preceding character or group of characters. For example, the regular expression “a{3,}” matches three or more consecutive “a” characters.
Note that quantifiers can be used with character classes as well. For example, the regular expression “[a-z]{2,4}” matches between two and four consecutive lowercase letters.
It’s important to use quantifiers carefully to avoid unintended matches. For example, the regular expression “.*” matches any string, which may not be what you intended.
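The quantifiers described above can be tried out by collecting every match in a sample string. This is a minimal sketch; the class QuantifierDemo, the helper findAll(), and the sample inputs are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that lists every match of a quantified pattern.
class QuantifierDemo {
    // Returns all non-overlapping matches of the regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // ? makes the "u" optional.
        System.out.println(findAll("colou?r", "color colour colr")); // [color, colour]

        // + requires at least one "a".
        System.out.println(findAll("a+", "baaac"));                  // [aaa]

        // {2,4} skips the single "a" and takes at most four from the run of five.
        System.out.println(findAll("a{2,4}", "a aa aaaaa"));         // [aa, aaaa]

        // Quantifiers also apply to character classes.
        System.out.println(findAll("[a-z]{2,4}", "ab abcdef"));      // [ab, abcd, ef]

        // * allows zero occurrences, so even the empty string matches.
        System.out.println("".matches("a*"));                        // true
    }
}
```

The third call shows why bounds matter: a run of five “a” characters yields a four-character match and leaves one character over, which may or may not be the intended result.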
Regular expressions are a powerful tool for working with text in Java. They can be used to validate input, search for patterns, and manipulate strings.