Path Pattern (Regex)

A regular expression is a sequence of characters that defines a search pattern, which you can use to match or find strings or sets of strings.

Common Characters

The following table describes the most common characters used in regular expressions.

Character

Expression

Description

Example

\

^.*\.tsv

The backslash \ is an escape character.

The backslash specifies that the dot following it is a literal dot, not the special character that matches any single character.

abc

abc

Matches the exact string.

Matches a followed directly by b and then c, that is, the complete string abc.

abc can be replaced by any string that must be matched in full.

|

a|b

Matches either string a or string b.

Matches whichever of the strings separated by | is found, here a or b.

[]

[ab]

Matches the letter a or b.

Matches a single character that is either a or b.

[][]

[ab][xy]

Matches the letter a or b followed by x or y.

The letter a or b must be followed directly by x or y, so it matches ax, ay, bx, or by.

-

[a-d6-9]

Defines a range of letters or digits.

Matches one letter from a to d or one digit from 6 to 9.

The hyphen defines the ranges a-d and 6-9; d6 is not treated as a range.

.

^T..t

The dot matches any single character.

In this example, the two dots mean that the two characters after T can be any characters, followed by t.
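As a rough illustration of the characters above, the following sketch uses Python's re module. That choice is an assumption: the engine that evaluates path patterns may differ in small details, and the sample strings are made up for this example.

```python
import re

# Escaped dot: the name must contain a literal ".tsv".
TSV = re.compile(r"^.*\.tsv")
tsv_results = [bool(TSV.search(name)) for name in ("report.tsv", "report_tsv")]
print(tsv_results)  # [True, False]

# Alternation: either a or b.
either = re.findall(r"a|b", "cab")
print(either)  # ['a', 'b']

# Two character classes in a row: a or b, then x or y.
pairs = re.findall(r"[ab][xy]", "ax by cx")
print(pairs)  # ['ax', 'by']

# Ranges: a-d and 6-9 are two separate ranges inside one class.
in_ranges = re.findall(r"[a-d6-9]", "d6 e5 a9")
print(in_ranges)  # ['d', '6', 'a', '9']

# Dot: any single character (in Python, except a newline by default).
dots = [bool(re.search(r"^T..t", word)) for word in ("Test", "Tt")]
print(dots)  # [True, False]
```

Each result list lines up with the sample strings in the same order, so it is easy to see which inputs a pattern accepts and which it rejects.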

Meta Characters

The following meta characters have a pre-defined meaning and make specific common patterns easier to use.

A meta character is a character that has a special meaning during pattern processing which defines the search criteria and any text manipulations.

Character

Expression

Description

Example

\d

\d

Matches any single digit from 0 to 9.

For example, it matches 2 in "S2".

\D

\D

Matches any single non-digit character.

For example, it matches S in "S2".

\w

^\w

The \w meta character matches a single word character.

A word character is a letter a-z or A-Z, a digit 0-9, or an underscore (_). ^\w matches a word character at the beginning of the string.

\W

(\d){1,2}\W(\d){1,2}

\W matches any single non-word character.

In this expression, \W matches the non-word character that separates the parts of a date, such as / or -.

\S+

\S+

Matches a run of non-whitespace characters.

Matches one or more consecutive non-whitespace characters.

\b

\bLO

Matches at a word boundary, that is, at the beginning or end of a word.

\bLO matches the string LO at the beginning of a word.

LO\b

LO\b matches the string LO at the end of a word.

\B

\BLO

The \B meta character matches at every position where \b does not.

It matches at any position between two word characters and at any position between two non-word characters. \BLO matches the string LO only when it is not at the beginning of a word.

\s

\s

Matches a single whitespace character.

A whitespace character can be:

  • A space character

  • A tab character (\t)

  • A carriage return character (\r)

  • A new line character (\n)

  • A vertical tab character (\v)

  • A form feed character (\f)

\S

\S

Matches a single character other than whitespace.

Matches any character except a whitespace character.
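The meta characters above can be tried out with a short sketch like the one below. It assumes Python's re module, which may not behave identically to the engine that evaluates path patterns; for instance, Python treats \w and \d as Unicode-aware unless re.ASCII is set. The word list is hypothetical.

```python
import re

# \d and \D: digit versus non-digit, using the "S2" example.
digits = re.findall(r"\d", "S2")
non_digits = re.findall(r"\D", "S2")
print(digits, non_digits)  # ['2'] ['S']

# \w: letters, digits, and underscore. re.ASCII limits it to a-z, A-Z, 0-9, _.
word_chars = re.findall(r"\w", "a_1-", flags=re.ASCII)
print(word_chars)  # ['a', '_', '1']

# \W as the separator between two one- or two-digit numbers.
SEPARATED = re.compile(r"(\d){1,2}\W(\d){1,2}")
separated = [bool(SEPARATED.search(s)) for s in ("07/13", "07-13", "0713")]
print(separated)  # [True, True, False]

# \S+ and \s: runs of non-whitespace, and single whitespace characters.
tokens = re.findall(r"\S+", "a  bc\td")
spaces = re.findall(r"\s", "a b\tc\n")
print(tokens)  # ['a', 'bc', 'd']
print(spaces)  # [' ', '\t', '\n']


def matches(pattern, words):
    """Return the words in which the pattern is found."""
    return [w for w in words if re.search(pattern, w)]


# \b and \B around the string LO (hypothetical sample words).
WORDS = ["LOAD", "HELLO", "FLOW"]
print(matches(r"\bLO", WORDS))  # ['LOAD']
print(matches(r"LO\b", WORDS))  # ['HELLO']
print(matches(r"\BLO", WORDS))  # ['HELLO', 'FLOW']
```

The last three lines show that \bLO and \BLO split the words between them: any word containing LO is matched by exactly one of the two, depending on whether LO starts the word.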

Meta Characters - White Space Characters

The following table describes the whitespace characters used in regular expressions.

Character

Description

\t

It matches horizontal tabs (tabulators).

\r

It matches carriage return characters.

\n

It matches newline characters.

\v

It matches vertical tab characters (tabulators).

\f

It matches form feed characters.
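A minimal sketch of these escapes, assuming Python's re module (other engines may support a slightly different set). The tab-separated line is a made-up sample.

```python
import re

# Each escape paired with the actual character it stands for.
ESCAPES = {r"\t": "\t", r"\r": "\r", r"\n": "\n", r"\v": "\v", r"\f": "\f"}

# Every escape matches its own character, and \s matches all of them.
own = all(re.fullmatch(p, c) for p, c in ESCAPES.items())
any_space = all(re.fullmatch(r"\s", c) for c in ESCAPES.values())
print(own, any_space)  # True True

# Splitting on \t only, so that spaces inside a field are preserved.
fields = re.split(r"\t", "id\tfile name\tsize")
print(fields)  # ['id', 'file name', 'size']
```

Using a specific escape such as \t instead of \s is useful when only one kind of whitespace acts as a delimiter.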

Flags

The following table describes the flags used in regular expressions.

Flag

Description

i

The "i" modifier specifies a case-insensitive match. It ignores case while attempting a match in a string.

g

The "g" modifier specifies a global match. It finds all matches rather than stopping after the first match.

m

The "m" modifier specifies a multiline match. It allows ^ and $ to match at the beginning and end of each line, not only at the beginning and end of the whole string.
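The effect of these flags can be sketched in Python, with the caveat that Python spells them differently: re.IGNORECASE and re.MULTILINE stand in for "i" and "m", and there is no "g" flag, so re.findall is used here as the closest equivalent. The sample text is hypothetical.

```python
import re

TEXT = "Task one\ntask two\nTASK three"

# i: ignore case. Without it only the lowercase spelling is found.
case_sensitive = re.findall(r"task", TEXT)
ignore_case = re.findall(r"task", TEXT, flags=re.IGNORECASE)
print(case_sensitive)  # ['task']
print(ignore_case)     # ['Task', 'task', 'TASK']

# g: re.search stops at the first match, while re.findall returns
# every match, which is the behavior a global match asks for.
first_only = re.search(r"task", TEXT, flags=re.IGNORECASE).group()
print(first_only)  # Task

# m: with re.MULTILINE, ^ also matches at the start of each line.
single_line = re.findall(r"^task", TEXT, flags=re.IGNORECASE)
multi_line = re.findall(r"^task", TEXT, flags=re.IGNORECASE | re.MULTILINE)
print(single_line)  # ['Task']
print(multi_line)   # ['Task', 'task', 'TASK']
```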

Anchors

The following table describes the anchors used in regular expressions.

Character

Expression

Description

Example

^

^task

At the beginning of a regular expression, the circumflex signifies the beginning of a line.

The characters that follow the circumflex must be located at the beginning of the string.

The line must begin with the string task to match the expression.

$

.*G$

At the end of a regular expression, the dollar sign signifies the end of a line. The characters that precede the dollar sign must be located at the end of the string.

Matches any line that ends with G.
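To see how anchors restrict where a match may occur, here is a small sketch in Python (an assumed engine; the file names are invented for illustration).

```python
import re

STARTS_WITH_TASK = re.compile(r"^task")
ENDS_WITH_G = re.compile(r"G$")

# Hypothetical names, for illustration only.
NAMES = ["task_list.csv", "my_task.csv", "LOG", "LOGS"]

starts = [n for n in NAMES if STARTS_WITH_TASK.search(n)]
ends = [n for n in NAMES if ENDS_WITH_G.search(n)]
print(starts)  # ['task_list.csv']
print(ends)    # ['LOG']

# Without the anchor, task is found anywhere in the name.
unanchored = [n for n in NAMES if re.search(r"task", n)]
print(unanchored)  # ['task_list.csv', 'my_task.csv']
```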

Quantifiers

The following table describes the quantifiers used in regular expressions.

Character

Expression

Description

Example

+

today\s+test

Matches the character immediately before it one or more times.

The + matches one or more whitespace characters after today, followed by the word test.

*

.*

The asterisk matches the character immediately before it zero or more times, with no upper limit.

Because the dot matches any single character, .* matches any sequence of characters of any length, including an empty one.

?

x?

Matches the character immediately before it zero times or one time.

Matches at that position whether a single x character is present or absent.

*?

x*?

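The *? quantifier is the lazy form of *. It matches as few characters as possible, which prevents overmatching.

x*? matches as few x characters as possible, including none, and extends the match only when the rest of the expression requires it.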

{x}

\d{4}

Defines the exact number of digits to match.

Matches a sequence of exactly 4 digits.

.{12}

Defines the exact number of characters to match.

Matches a sequence of exactly 12 characters.

{x,y}

\d{2,5}

Matches a digit at least x and at most y times.

Matches a sequence of 2 to 5 digits.

.{1,3}

Matches any character at least x and at most y times.

Matches a sequence of 1 to 3 characters.
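The quantifiers above can be compared side by side in a short sketch. It assumes Python's re module; the sample strings, the ax?b pattern, and the angle-bracket text are hypothetical and chosen only to make the differences visible.

```python
import re

# +: one or more whitespace characters between the two words.
plus = [bool(re.search(r"today\s+test", s)) for s in ("today   test", "todaytest")]
print(plus)  # [True, False]

# ?: the x is optional, but at most one is allowed.
optional = [bool(re.fullmatch(r"ax?b", s)) for s in ("ab", "axb", "axxb")]
print(optional)  # [True, True, False]

# * versus *?: greedy takes as much as possible, lazy as little as possible.
greedy = re.search(r"<.*>", "<a><b>").group()
lazy = re.search(r"<.*?>", "<a><b>").group()
print(greedy)  # <a><b>
print(lazy)    # <a>

# {x} and {x,y}: exact and bounded repetition. \b keeps each run whole.
four = re.findall(r"\b\d{4}\b", "2020 07 13")
two_to_five = re.findall(r"\b\d{2,5}\b", "7 13 2020 123456")
print(four)         # ['2020']
print(two_to_five)  # ['13', '2020']
```

The word boundaries in the last two patterns are a design choice for this sketch: without them, \d{2,5} would also match part of a longer run of digits such as 123456.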

Examples

The following table lists several regular expressions, each with a description of the paths or file names it matches.

Expression Format

Expression

Description

<directory-name>/<file-name>

data\/.*\.csv$

Use this expression when the CSV files that need to be parsed are placed in a directory named "data". It matches all paths in the data directory that end with .csv.

^(?:[\w]:|/)(/[a-z_\s0-9.-]+)+\.(txt|gif|pdf|doc|docx|xls|xlsx|js)$

It matches txt, gif, pdf, doc, docx, xls, xlsx, and js files in the given directory.

For example: c:/data/test/old/zio.sample.js

(\d\d\d\d\d\d\d\d\d\d\d\d)\/.*\.csv$

It matches files whose folder name is a 12-digit numeric sequence and whose file name ends with .csv.

<directory-name>

^(?:[\w]:|/)(/[a-z_\s0-9.-]+)

It matches all files that are stored in the given directory.

For example: c:/data45/test/new/

<file-name>

((\d{2})|(\d))\/((\d{2})|(\d))\/((\d{4})|(\d{2}))

It matches file names that contain a date in DD/MM/YYYY format. The day and month can have one or two digits, and the year can have four or two digits.

(\d{2}(\/|-)\d{2}(\/|-)\d{2,4})

It matches file names that contain a date in one of the following formats:

  • MM/DD/YY = 07/13/20

  • MM/DD/YYYY = 07/13/2020

  • MM-DD-YY = 07-13-20

  • MM-DD-YYYY = 07-13-2020

(\d){1,2}\W(\d){1,2}\W(\d){2,4}

It matches file names that contain a date in one of the following formats, where the separator is any non-word character:

  • MM(separator)DD(separator)YY = 07/13/22 or 07-13-22 or 07|13|22 or 07.13.22

  • MM(separator)DD(separator)YYYY = 07/13/2022 or 07-13-2022 or 07|13|2022 or 07.13.2022

^L…*C$

It matches all files that start with an L and end with a C.

Sample\s+test\s+count

It matches file names that contain the word "Sample" followed by one or more whitespace characters, the word "test", one or more whitespace characters, and the word "count".
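As a sketch of how a few of the expressions above behave against a list of paths, the following uses Python's re module, which is an assumption about the engine. The select helper and every path in PATHS are hypothetical, and the 12-digit expression is written with the equivalent {12} quantifier for brevity.

```python
import re

CSV_IN_DATA = re.compile(r"data\/.*\.csv$")
TWELVE_DIGIT_DIR = re.compile(r"(\d{12})\/.*\.csv$")
DATE_IN_NAME = re.compile(r"(\d{2}(\/|-)\d{2}(\/|-)\d{2,4})")


def select(pattern, paths):
    """Return the paths in which the pattern is found."""
    return [p for p in paths if pattern.search(p)]


# Hypothetical paths, for illustration only.
PATHS = [
    "data/2020/sales.csv",
    "data/readme.txt",
    "archive/sales.csv",
    "123456789012/export.csv",
    "reports/summary_07-13-2020.txt",
]

print(select(CSV_IN_DATA, PATHS))       # ['data/2020/sales.csv']
print(select(TWELVE_DIGIT_DIR, PATHS))  # ['123456789012/export.csv']

# The first capturing group holds the date that was found.
found = DATE_IN_NAME.search("reports/summary_07-13-2020.txt")
print(found.group(1))  # 07-13-2020
```

Note that the date expression does not require both separators to be the same, so a mixed form such as 07/13-2020 would also match.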

 

© 2022 CSG International, Inc.