This is a comparison of regular expression engines.
Libraries
| Name | Official website | Programming language | Software license | Used by | 
|---|---|---|---|---|
| Boost.Regex[Note 1] | Boost C++ Libraries | C++ | Boost | Notepad++ >= 6.0.0, EmEditor | 
| Boost.Xpressive | Boost C++ Libraries | C++ | Boost | |
| DEELX | RegExLab | C++ | Proprietary | |
| FREJ[Note 2] | Fuzzy Regular Expressions for Java | Java | LGPL | |
| GLib/GRegex[Note 3] | GLib reference manual | C | LGPL | |
| GNU regex | Gnulib reference manual | C | LGPL | GNU libc, GNU programs | 
| GRETA | Microsoft Research | C++ | Proprietary | |
| Gregex | Grovf Inc. | RTL, HLS | Proprietary | FPGA accelerated >100Gbit/s regex engine for cybersecurity, financial, e-commerce industries. | 
| Hyperscan | Intel | C, x86-specific assembly (SSSE3+[1]) | 3-clause BSD | Rspamd | 
| ICU | International Components for Unicode | C, C++[Note 4] | ICU | Foundation (Apple and Swift open-source versions) | 
| Jakarta Regexp | The Apache Jakarta Project | Java | Apache | |
| java.util.regex | Java's User manual | Java | GNU GPLv2 with Classpath exception | jEdit | 
| JRegex | JRegex | Java | BSD | |
| MATLAB | Regular Expressions | MATLAB Language | Proprietary | |
| Oniguruma | Kosako | C | BSD | Atom, Take Command Console, Tera Term, TextMate, Sublime Text, SubEthaEdit, EmEditor and jq | 
| Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL | |
| PCRE | pcre.org | C, C++[Note 5] | BSD | Apache HTTP Server, Nginx, BBEdit, Edbrowse, Julia, HHVM, Notepad++ < 6.0.0, PHP, Delphi, R, Exim, SWI-Prolog | 
| Qt/QRegExp | Digia | C++ | Qt GNU GPL v. 3.0 | Kate, Kile | 
| regex - Henry Spencer's regular expression libraries | ArgList | C | BSD | |
| RE2 | RE2 | C++ | BSD | Go, Google Sheets, Gmail, G Suite | 
| Henry Spencer's Advanced Regular Expressions | Tcl | C | BSD | |
| RGX | RGX | C++ based component library | P6R | |
| RXP | Titan IC | RTL | Proprietary | Hardware-accelerated RegEx search available for ASIC, FPGA and cloud; aimed at massively parallel content processing at high speeds. | 
| SubReg | Matt Bucknall | C | MIT | |
| TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1 | |
| TRE[Note 2] | Ville Laurikari | C | BSD | musl | 
| TRegExpr | TRegExpr, documentation | Object Pascal | Dual-license: freeware, or LGPL with static linking exception | Total Commander | 
| Wolfram Language (Mathematica) | Wolfram Language Documentation Center | Wolfram Language | Proprietary | Mathematica, the Wolfram Development Platform | 
| XRegExp | XRegExp | JavaScript | MIT | |
Languages
| Language | Official website | Software license | Remarks | 
|---|---|---|---|
| ActionScript 3 | ActionScript Technology Center | Free | |
| APL (APLX, Dyalog, GNU) | APL Wiki | Licensed by the respective implementation | ⎕SS (PCRE), ⎕R/⎕S (PCRE), ⎕SS (PCRE2), respectively | 
| C++11 (C++) | C++ standards website | Licensed by the respective implementation | Since ISO/IEC 14882:2011(E); similar to ECMAScript by default (Grammar Description) | 
| D | D | Boost Software License[Note 1] | |
| Free Pascal (Object Pascal) | freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin and two other regular expression libraries; see wiki.lazarus.freepascal.org/Regexpr. | 
| Go | Golang.org | BSD-style | |
| Haskell | Haskell.org | BSD3 | Omitted in the language report, and in GHC's Hierarchical Libraries | 
| Java | Java | GNU General Public License | REs are written as strings in source code: all backslashes must be doubled, harming readability. | 
| JavaScript (ECMAScript) | ECMA-262 | BSD3 | Limited, but REs are first-class citizens of the language with a specific /.../mod syntax. | 
| Julia | JuliaLang.org | MIT License | REs are part of the language's core library, which uses PCRE; an optional wrapper for ICU (C code) is also available. | 
| Lua | Lua.org | MIT License | Uses simplified, limited dialect; can be bound to more powerful library, like PCRE or an alternative parser like LPeg. | 
| Mathematica | Wolfram | Proprietary | |
| .NET | MSDN | MIT License[Note 2][Note 3] | |
| Nim | nim-lang.org | MIT License | Standard library includes PCRE-based re and nre modules, as well as various alternatives (e.g., strutils, pegs (Parsing Expression Grammar matching), strscans, parseutils). | 
| OCaml | Caml | LGPL | As of 2010, the standard module is generally regarded as deprecated;[2] often recommended libraries are pcre (with full support for PCRE) and re (which is not as complete but claims better performance and provides frontends to popular syntaxes: PCRE, Perl, Posix, Emacs, shell globbing). | 
| Perl | Perl.com | Artistic License, or GNU General Public License | Full, central part of the language | 
| PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient in both speed and functions | 
| POSIX C (C) | POSIX.1 web publication | Licensed by the respective implementation | Supports POSIX BRE and ERE syntax | 
| Python | python.org | Python Software Foundation License | Python has two major implementations: the built-in re module and the regex library. | 
| Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8, Ruby 1.9, and Ruby 2.0 and later versions use different engines; Ruby 1.9 integrates Oniguruma, Ruby 2.0 and later integrate Onigmo, a fork from Oniguruma. | 
| Rust | docs.rs | MIT License | The primary regex crate does not allow look-around expressions. There is an Oniguruma binding called onig that does. | 
| SAP ABAP | SAP.com | Proprietary | |
| Tcl | tcl.tk | Tcl/Tk License (BSD-style) | Tcl library doubles as a regular expression library. | 
| Wolfram Language | Wolfram Research | Proprietary: usable for free on a limited scale on the Wolfram Development platform | |
| XML Schema | W3C | Licensed by the respective implementation | |
| XPath 3/XQuery | W3C | Licensed by the respective implementation | |
- ↑ "STD.regex - D Programming Language - Digital Mars".
- ↑ "Dotnet/Corefx". GitHub. 16 February 2022.
- ↑ "Dotnet/Corefx". GitHub. 16 February 2022.
Language features
NOTE: An application using a library for regular expression support does not necessarily support the full set of features of the library, e.g., GNU grep uses PCRE, but supports no lookahead, though PCRE does.
Part 1
| | "+" quantifier | Negated character classes | Non-greedy quantifiers [Note 1] | Shy groups [Note 2] | Recursion | Look-ahead | Look-behind | Backreferences [Note 3] | >9 indexable captures |
|---|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes[Note 4] | Yes | Yes | Yes | Yes | 
| Boost.Xpressive | Yes | Yes | Yes | Yes | Yes[Note 5] | Yes | Yes | Yes | Yes | 
| CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | 
| FREJ | No[Note 6] | No | Some[Note 6] | Yes | No | No | No | Yes | Yes | 
| GLib/GRegex | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| GNU grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | — | 
| Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| RXP | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes | 
| ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| JavaScript (ECMAScript) | Yes | Yes | Yes | Yes | No | Yes | Yes[Note 7] | Yes | Yes | 
| JGsoft | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| Lua | Yes | Yes | Some[Note 8] | No | No | No | No | Yes | No | 
| .NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| OCaml | Yes | Yes | No | No | No | No | No | Yes | No | 
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| Python | Yes | Yes | Yes | Yes | Yes[Note 9] | Yes | Yes | Yes | Yes | 
| Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes | 
| RE2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes | 
| Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No | 
| Vim | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | 
| RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| Tcl | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 
| TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? | 
| XML Schema | Yes | Yes | No | — | No | No | No | No | — | 
| XPath 3/XQuery | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes | 
| XRegExp | Yes | Yes | Yes | Yes | No | Yes | Yes[Note 7] | Yes | Yes | 
- ↑ Non-greedy quantifiers match as few characters as possible, instead of as many as possible, which is the default. Note that many older, pre-POSIX engines were non-greedy and did not have greedy quantifiers at all.
- ↑ Shy groups, also called non-capturing groups, cannot be referred to with backreferences; non-capturing groups are used to speed up matching where the group's content does not need to be accessed later.
- ↑ Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches the whole of "abab" but not the whole of "abaab".
- ↑ "Perl Regular Expression Syntax - 1.47.0".
- ↑ "User's Guide - 1.47.0".
- ↑ FREJ has no repetition quantifiers, but has an "optional" element that behaves similarly to a simple "?" quantifier.
- ↑ As of ES2018.
- ↑ Lua's only non-greedy quantifier is -, which is a non-greedy version of *. It does not have non-greedy versions of + or ?; in the former case, the non-greedy effect can be achieved by repeating the token followed by -, but in the latter case, there is no equivalent.
- ↑ Supported by the optional regex library only.
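The quantifier, group, and backreference notes above can be checked in any engine that supports these features. The following is a minimal sketch using Python's built-in re module; apart from "abab" and "abaab", the sample strings are illustrative assumptions, and other engines may differ in detail.

```python
import re

# Greedy vs. non-greedy: "+" takes as many characters as possible, "+?" as few.
greedy = re.match(r"a+", "aaa").group()
lazy = re.match(r"a+?", "aaa").group()

# Shy (non-capturing) group: (?:...) groups without creating a capture,
# so only the (c) group shows up in the captured groups.
shy_groups = re.match(r"(?:ab)+(c)", "ababc").groups()

# Backreference: \1 must repeat exactly what group 1 matched.
# fullmatch requires the whole string to fit the pattern.
backref = re.compile(r"([ab]+)\1")
whole_abab = backref.fullmatch("abab") is not None
whole_abaab = backref.fullmatch("abaab") is not None

# With search semantics the engine may still find a matching substring.
substring_in_abaab = backref.search("abaab").group()
```

The whole-string check is used because a substring search would still find a repeated piece inside "abaab".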
Part 2
| | Directives [Note 1] | Conditionals | Atomic groups [Note 2] | Named capture [Note 3] | Comments | Embedded code | Unicode property support [3] | Balancing groups [Note 4] | Variable-length look-behinds [Note 5] |
|---|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | No | 
| Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | No | No | No | 
| CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | Some[Note 6] | No | No | 
| EmEditor | Yes | Yes | ? | ? | Yes | No | ? | No | No | 
| FREJ | No | No | Yes | Yes | Yes | No | ? | No | No | 
| GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | No | 
| GNU grep | Yes | Yes | ? | Yes | Yes | No | No | No | No | 
| Haskell | ? | ? | ? | ? | ? | No | No | No | No | 
| RXP | Yes | Yes | No | Yes | Yes | No | No | No | No | 
| ICU Regex | Yes | No | Yes | Yes[Note 7] | Yes | No | Yes | No | No | 
| Java | Yes | No | Yes | Yes[Note 8] | Yes | No | Some[Note 6] | No | No | 
| JavaScript (ECMAScript) | No | No | No | Yes | No | No | Some[Note 6][Note 9][4] | No | Yes | 
| JGsoft | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | Yes | 
| Lua | No | No | No | No | No | No | No | No | No | 
| .NET | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | Yes | Yes | 
| OCaml | No | No | No | No | No | No | No | No | No | 
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No[Note 10] | 
| PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No | 
| Python | Yes | Yes | Yes[Note 11] | Yes | Yes | No | Yes[Note 12] | No | Yes[Note 13] | 
| Qt/QRegExp | No | No | No | No | No | No | No | No | No | 
| RE2 | Yes | No | ? | Yes | No | No | Some[Note 6] | No | No | 
| Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | No | 
| Tcl | Yes | No | Yes | No | Yes | No | Yes | No | No | 
| TRE | Yes | No | No | No | Yes | No | ? | No | No | 
| Vim | Yes | No | Yes | No | No | No | No | No | Yes | 
| RGX | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | 
| XML Schema | No | No | No | No | No | No | Yes | No | No | 
| XPath 3/XQuery | No | No | No | No | No | No | Yes | No | No | 
| XRegExp | Leading only | No | No | Yes | Yes | No | Yes | No | Yes | 
- ↑ Also known as flag modifiers, mode modifiers or option letters. Example pattern: "(?i:test)".
- ↑ Also called independent sub-expressions.
- ↑ Similar to back references, but with names instead of indices.
- ↑ A special feature that allows matching balanced constructs without recursion.
- ↑ Refers to the possibility of including quantifiers in look-behinds, thus making their length unpredictable.
- ↑ Unicode property support may be incomplete, since products are continuously updated. All implementations will be incomplete when a new Unicode revision is released, until they are updated to comply.
- ↑ Available as of ICU55.
- ↑ Available as of JDK7.
- ↑ The support and range of properties is dependent on implementation.
- ↑ Experimental support added in v5.29.9.
- ↑ Supported by Python v3.11 and later, and the optional regex library only.
- ↑ May only be available in the regex library when used with Python versions after 3.3.
- ↑ Supported by the optional regex library only.
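Several of the Part 2 features can be illustrated in the same way. The sketch below uses Python's built-in re module and assumes Python 3.6 or later for scoped inline flags; the sample strings and the group name word are illustrative assumptions. It also shows the fixed-width restriction that applies in engines without variable-length look-behinds.

```python
import re

# Directive (scoped inline flag): (?i:...) makes only that part case-insensitive.
directive_hit = re.fullmatch(r"(?i:test)", "TeSt") is not None

# Named capture: (?P<name>...) defines a group, (?P=name) refers back to it.
named = re.fullmatch(r"(?P<word>\w+) (?P=word)", "hello hello")
captured_word = named.group("word")

# Comment inside a pattern: (?#...) is ignored by the engine.
comment_hit = re.fullmatch(r"ab(?# this text is ignored)c", "abc") is not None

# Variable-length look-behind: the built-in re module rejects a quantifier
# inside (?<=...) at compile time because the width is not fixed.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
```

Named-group syntax varies between engines, so the (?P<word>...) form here should be read as Python's spelling rather than a universal one.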
API features
| | Native UTF-16 support[Note 1] | Native UTF-8 support[Note 1] | Multi-line matching | Partial match[Note 2] |
|---|---|---|---|---|
| Boost.Regex | No | No | Yes | Yes | 
| GLib/GRegex | Yes | Yes | Yes | Yes | 
| RXP | Yes | Yes | No | Yes | 
| ICU Regex | Yes | No | Yes | ? | 
| Java | Yes[Note 3] | Yes[Note 3] | Yes | Yes | 
| .NET | No[Note 4] | Yes | Yes | ? | 
| PCRE | Yes[Note 5] | Yes | Yes | Yes | 
| Qt/QRegExp | Yes | No | No | ? | 
| Tcl | Yes | Yes[Note 6] | Yes | ? | 
| TRE | Yes | Yes | Yes | ? | 
| RGX | No | No | Yes | ? | 
| wxWidgets::wxRegEx[Note 7] | Yes | Yes | Yes | ? | 
| XRegExp | Yes | Yes | Yes | No | 
- ↑ Means the format can be used internally without explicit conversion.
- ↑ Partial match of the whole regular expression. For example, the pattern ".*END$" will match any string partially, but only strings ending with END fully.
- ↑ Supports the Unicode 15.0 standard from 2023.
- ↑ The implementation uses the original UCS-2 support and features, so it recognizes only 64K characters in total (versus UTF-16's 1,112,064 characters). A Microsoft developer representative answered a bug report on this as "will not fix" in 2010.
- ↑ Since version 8.30.
- ↑ Tcl includes facilities to convert to and from UTF-8.
- ↑ wxRegEx uses any system-supplied POSIX library; if none is available, and in Unicode mode, it uses Henry Spencer's library.
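The 1,112,064 figure in the .NET note follows from the size of the Unicode code space minus the surrogate range, and characters outside the 64K range need two UTF-16 code units. A minimal Python sketch of that arithmetic follows; the helper name utf16_code_units and the sample characters are assumptions made for illustration.

```python
# Unicode code points run from U+0000 to U+10FFFF.
TOTAL_CODE_POINTS = 0x10FFFF + 1

# U+D800..U+DFFF are reserved for surrogates and encode no character.
SURROGATE_CODE_POINTS = 0xDFFF - 0xD800 + 1

# Number of characters that UTF-16 can represent.
ENCODABLE = TOTAL_CODE_POINTS - SURROGATE_CODE_POINTS

# A UCS-2 style engine sees only single 16-bit units (the 64K limit).
UCS2_LIMIT = 0xFFFF + 1


def utf16_code_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-le")) // 2


bmp_units = utf16_code_units("A")             # inside the 64K range
astral_units = utf16_code_units("\U0001F600")  # outside the 64K range
```

An engine limited to single 16-bit units would treat the second character as two separate units instead of one character.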
References
- ↑ "Getting Started – Hyperscan 5.4.0 documentation".
- ↑ "Regex - Regular Expressions in OCaml".
- ↑ "UTS #18: Unicode Regular Expressions".
- ↑ "ECMA-262, 9th edition, June 2018 ECMAScript® 2018 Language Specification". www.ecma-international.org. Retrieved 4 August 2020.
External links
- Regular Expression Flavor Comparison – Detailed comparison of the most popular regular expression flavors
- Regexp Syntax Summary
- Online Regular Expression Testing – with support for Java, JavaScript, .Net, PHP, Python and Ruby
- Implementing Regular Expressions – series of articles by Russ Cox, author of RE2
- Regular Expression Engines