Background

RegExp Tester

Short for regular expression, a regex is a string of text that defines a pattern used to match, locate, and manage text. Perl is a well-known example of a programming language that uses regular expressions, but it's only one of the many places you can find them. Regular expressions can also be used from the command line and in text editors to find text within a file.

When you first try to understand regular expressions, they can look like a different language. However, mastering them can save you a great deal of time if you work with text or need to parse large amounts of data. Below is an example of a regular expression with each of its components labeled. The same regular expression appears in the Perl programming examples later on this page.

[Figure: a regular expression with each of its components labeled]

What is RegExp?

RegEx stands for 'regular expression' and is a method used by programmers to define search patterns. Regex is useful for extracting information from large blocks of data. Data can take many forms, whether that be plain text, files, or code. A regex search pattern is much more powerful and flexible than simple string searches, such as the search queries typically used with search engines.
For example, software that enforces a password policy can store the policy as a regular expression that requires certain character combinations. For such a password rule, the expression could look as follows:
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$

This rule contains numerous specifications, such as the minimum length of 8 characters and the use of upper and lower case letters. For example, the expression .{8,} means that any character (symbolized by the dot) should occur eight times or more ({8,}).
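
As a minimal sketch, the password rule can be tried out in Python with the standard re module. The function name is_valid_password and the sample passwords are assumptions made for illustration; other regex implementations may treat some parts of the pattern slightly differently.

```python
import re

# Password rule taken from the text above.
PASSWORD_RULE = re.compile(
    r"(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$"
)


def is_valid_password(candidate: str) -> bool:
    """Return True if the candidate satisfies the password rule."""
    return PASSWORD_RULE.match(candidate) is not None


# Hypothetical sample passwords, chosen to exercise each part of the rule.
for candidate in ["Passw0rd", "Password!", "password", "Pass1", "PASSWORD1"]:
    print(candidate, is_valid_password(candidate))
```

Of the samples, only "Passw0rd" and "Password!" pass: the others lack a digit or symbol, are too short, or have no lowercase letter.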

Components of regular expressions

Regular expressions are commonly found in many different programming languages, but their exact implementation can differ. This means that occasionally, some characters may be used in different ways in different implementations. However, sometimes a character has a relatively universal use. Below are some common regular expression components.

Anchors

Anchors are characters that specify where within a string a match may occur. Regex was originally developed for line-based systems, so much of its syntax is built around searching within lines. To match the character "A" at a specific position in a line, use one of the following anchors:
  • ^A - Match at the beginning of a line.
  • A$ - Match at the end of a line.
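
The two anchors can be sketched in Python as follows. The sample text is made up for illustration. Note that in Python's re module, ^ and $ apply to the whole string unless the re.MULTILINE flag is set, in which case they apply to each line.

```python
import re

# Hypothetical three-line sample text.
text = "Apple\nBANANA\nAvocado"

# Without flags, ^ and $ anchor to the start and end of the whole string.
starts_with_a = re.search(r"^A", text) is not None  # the text begins with "A"
ends_with_a = re.search(r"A$", text) is not None    # the text ends with "o"

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
lines_starting_with_a = re.findall(r"^A\w*", text, flags=re.MULTILINE)
lines_ending_with_a = re.findall(r"\w*A$", text, flags=re.MULTILINE)

print(starts_with_a, ends_with_a)
print(lines_starting_with_a, lines_ending_with_a)
```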

Character sets

Character sets let you define explicitly which characters may match at a given position. For example, [0-9] matches any single digit. Sets work with letters as well, which makes them useful for matching letter ranges or alternate spellings. For example, gr[ae]y will match both 'gray' and 'grey'.
  • [0-9] - Match a range of numbers from 0-9.
  • [a-z] - Match lowercase letters from a-z.
  • [A-Z] - Match uppercase letters from A-Z.
  • . - Match any character except line break characters. Inside a character set, as in [.], the dot matches only a literal period.
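
A short Python sketch of these character sets, using a made-up sample sentence. It also shows that, in Python's re module, a dot inside square brackets matches only a literal period, while a bare dot matches any character except a line break.

```python
import re

# Hypothetical sample sentence.
sample = "The grey cat and the gray dog met at 10.30 on day 7."

spellings = re.findall(r"gr[ae]y", sample)  # both spellings of the colour
digits = re.findall(r"[0-9]", sample)       # every single digit
capitals = re.findall(r"[A-Z]", sample)     # every uppercase letter
literal_dots = re.findall(r"[.]", sample)   # literal periods only

# A bare dot matches any character except the line break.
any_chars = re.findall(r".", "a.b\n")

print(spellings, digits, capitals, literal_dots, any_chars)
```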

Modifiers

Modifiers can be used to alter the behavior of regex strings. They are typically wrapped in parentheses and start with a question mark. Many modifiers are implementation-dependent, but below are some common examples.
  • (?i) - Turns off case sensitivity.
  • (?s) - Makes the dot character match line break characters as well.
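
As a hedged illustration, Python's re module spells the case-insensitive modifier (?i) and the dot-matches-line-breaks modifier (?s). Both are placed at the start of the pattern here; the sample text is an assumption.

```python
import re

# Hypothetical two-line sample text.
text = "Hello\nWORLD"

# (?i) makes the match case-insensitive.
case_sensitive = re.findall(r"world", text)
case_insensitive = re.findall(r"(?i)world", text)

# (?s) lets the dot match the line break between the two words.
dot_default = re.search(r"Hello.WORLD", text)
dot_all = re.search(r"(?s)Hello.WORLD", text)

print(case_sensitive, case_insensitive)
print(dot_default is None, dot_all is not None)
```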

Chaining regular expressions

Regular expressions can be chained together using the pipe character (|), which acts as an "or" between alternatives. This allows multiple search options to be acceptable in a single regex string. For example, the regex string '(string1|string2|string3)' will match 'string1', 'string2', or 'string3' within the same query, rather than requiring 3 separate queries. Alternatives can be combined with any other regex syntax, with virtually no limit on how many are chained.
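
The chained pattern from the paragraph above can be sketched in Python like this. The sample sentence and the helper name is_pet are assumptions for illustration; the second pattern shows alternatives combined with anchors and a quantifier.

```python
import re

# Hypothetical sample text containing the three alternatives in mixed order.
log = "string2 came first, then string1, and finally string3; string4 is ignored."

# One query finds all three alternatives, in the order they appear.
matches = re.findall(r"(string1|string2|string3)", log)
print(matches)

# Alternation combined with anchors and an optional trailing "s".
PET = re.compile(r"^(cat|dog)s?$")


def is_pet(word: str) -> bool:
    """Return True for 'cat', 'cats', 'dog', or 'dogs'."""
    return PET.match(word) is not None


print(is_pet("dogs"), is_pet("birds"))
```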

Quantifiers in regular expressions

Quantifiers allow you to specify how many times a particular part of a regex must match. Quantifiers usually come in two variants: lazy and greedy. By default, quantifiers are greedy and will match as much as possible, which is not always the desired behavior. Lazy quantifiers, typically written by adding a question mark after the quantifier, match as little as possible instead. The basic quantifiers are:
  • * - Match 0 or more times.
  • + - Match 1 or more times.
  • {n} - Match exactly n times.
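
The difference between greedy and lazy matching, and the three quantifiers listed above, can be sketched in Python as follows. The sample markup string is made up for illustration.

```python
import re

# Hypothetical sample string with two pairs of tags.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first possible ">", so each tag matches separately.
lazy = re.findall(r"<.+?>", html)

print(greedy)
print(lazy)

# * allows zero occurrences of "b"; + requires at least one.
zero_or_more = re.fullmatch(r"ab*", "a") is not None
one_or_more = re.fullmatch(r"ab+", "a") is not None

# {3} requires exactly three digits.
exactly_three = re.fullmatch(r"[0-9]{3}", "123") is not None
four_digits = re.fullmatch(r"[0-9]{3}", "1234") is not None

print(zero_or_more, one_or_more, exactly_three, four_digits)
```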

Advanced regular expressions

Depending on the implementation, regular expressions support advanced concepts such as recursion, backreferencing, grouping, subroutines, conditionals, and more. These features allow you to find very specific information within large data sets, and you can even create regex strings to find results within results.
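
Two of these features, grouping and backreferencing, can be sketched in Python as shown below. The sample sentences are assumptions for illustration. Python's built-in re module does not support recursion, so that feature is not shown here.

```python
import re

# Grouping plus a backreference: \1 must repeat whatever group 1 captured,
# which finds accidentally doubled words.
repeated_word = re.compile(r"\b(\w+) \1\b")
doubled = repeated_word.findall("This is is a test of the the parser.")
print(doubled)

# Named groups pull individual parts out of a larger match,
# a simple case of finding results within results.
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
found = date_pattern.search("Released on 2024-03-15.")
if found:
    print(found.group("year"), found.group("month"), found.group("day"))
```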
