Generate a regular expression!

Give us some examples and we will generate a regular expression.

What's this?

Regular expressions can pull specific pieces of information out of text, but they can be hard to write accurately enough to cover every variation.

Our regular expression generator API makes creating regular expressions easy. Just give it examples of the data you want to pull out, and it will programmatically generate and test a regular expression.
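The generate-and-test idea can be sketched roughly as follows. This is a minimal illustration, not the service's implementation. It assumes the examples are shuffled and split using a ratio like the API's trainingSetSize parameter (default 0.75), and that a candidate must fully match every held-out example. The functions split_examples and failing_examples, and the hand-written candidate pattern, are hypothetical.

```python
import random
import re


def split_examples(examples, training_set_size=0.75, seed=0):
    """Shuffle the examples and split them into (training, test) lists.

    training_set_size plays the role of the API's trainingSetSize ratio;
    shuffling with a fixed seed is an assumption made for repeatability.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * training_set_size)
    return shuffled[:cut], shuffled[cut:]


def failing_examples(pattern, test_set):
    """Return the held-out examples that the candidate regex does not fully match."""
    compiled = re.compile(pattern)
    return [example for example in test_set if compiled.fullmatch(example) is None]


examples = ["SW1A 1AA", "EC1A 1BB", "W1A 0AX", "M1 1AE",
            "B33 8TH", "CR2 6XH", "DN55 1PT", "N1 9GU"]
train, test = split_examples(examples)

# Hypothetical stand-in for a regex generated from `train` (UK postcodes).
candidate = r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"

print(len(train), len(test))               # 6 2
print(failing_examples(candidate, test))  # []
```

An empty list means the candidate also covers the held-out examples. A non-empty list shows which variations it missed.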

The generated pattern may use Unicode property classes such as \p{Alpha}+. To use such a pattern in JavaScript, you'll need XRegExp with its Unicode plugin, unless your engine supports ES2018 Unicode property escapes (the u flag).

How do I use the API?

  • POST newline-separated, UTF-8 plain text to
  • Only plain text is currently supported; HTML is not supported by this API.
  • Maximum 100 lines, 200 chars/line.
  • Query String parameters:
    • _apikey (required) - your API key
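    • generalizeAt - a word is treated as an enum (an alternation of its observed values) only while it has fewer than this many distinct values (default 5)
    • generalizeCharsAt - the same threshold for sections of a word: a word section is treated as an enum only while it has fewer than this many distinct values (default 5)
    • generalizeCharClassAt - the maximum number of options in a character class, e.g. [tgh] (default 5)
    • alignAt - the minimum fraction of the training set, in [0,1], that must match before tokens are aligned (default 0.1)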
    • maxBranchesForCharPattern - the number of paths for a word regex at which we should use a generic regex (default 10)
    • alignOnAlnumLengths - whether the lengths of alphanumeric, numeric and all-caps words are taken into account when aligning tokens
    • compressingSentences - for free text, treat strings of words and punctuation as sentences where possible (default true)
    • trainingSetSize - the ratio in [0,1] of the input to use as the training vs test set (default 0.75)
    • maximumLengthDiffBeforeGeneralize - the maximum difference between min and max in a {min,max} quantifier before max is dropped and the unbounded {min,} is used instead (default 10)
    • classes - the classes to use, from DEFAULT_ENGLISH_CLASSES, DEFAULT_CLASSES, WORD_ENGLISH_CLASSES and WORD_CLASSES; the "word" classes don't decompose alphabetic tokens into character patterns, which works better for free text or names (default DEFAULT_ENGLISH_CLASSES)
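To make the generalization thresholds in the parameter list more concrete, here is a toy sketch of how generalizeAt, generalizeCharClassAt and maximumLengthDiffBeforeGeneralize could shape parts of a pattern. It illustrates the idea under stated assumptions and is not the generator's actual algorithm. The function names, the generic fallbacks ([A-Za-z]+ and \w) and whether each comparison is strict are all guesses.

```python
import re


def word_pattern(values, generalize_at=5):
    """Enumerate a word's observed values while there are fewer than
    generalize_at of them; otherwise fall back to a generic pattern."""
    # Longest first, so a search prefers "Mrs" over its prefix "Mr".
    distinct = sorted(set(values), key=lambda v: (-len(v), v))
    if len(distinct) < generalize_at:
        return "(?:" + "|".join(re.escape(v) for v in distinct) + ")"
    return r"[A-Za-z]+"  # assumed generic fallback (ASCII letters only)


def char_class(chars, generalize_char_class_at=5):
    """Build a class such as [ght] with at most generalize_char_class_at members."""
    distinct = sorted(set(chars))
    if len(distinct) <= generalize_char_class_at:
        return "[" + "".join(re.escape(c) for c in distinct) + "]"
    return r"\w"  # assumed generic fallback


def length_quantifier(lengths, max_length_diff=10):
    """Return {n}, {min,max}, or the unbounded {min,} once max - min
    exceeds max_length_diff (strict comparison is an assumption)."""
    lo, hi = min(lengths), max(lengths)
    if lo == hi:
        return "{%d}" % lo
    if hi - lo > max_length_diff:
        return "{%d,}" % lo
    return "{%d,%d}" % (lo, hi)


print(word_pattern(["Mr", "Mrs", "Ms", "Dr"]))  # (?:Mrs|Dr|Mr|Ms)
print(word_pattern(["a", "b", "c", "d", "e"]))  # [A-Za-z]+
print(char_class("tgh"))                        # [ght]
print(r"\d" + length_quantifier([3, 5, 4]))     # \d{3,5}
print(r"\d" + length_quantifier([2, 20]))       # \d{2,}
```

Raising a threshold keeps more literal alternatives in the output (more precise, less general). Lowering it switches to generic patterns sooner.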
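A client call might look like the sketch below. It checks the documented limits (100 lines, 200 characters per line), joins the examples with newlines, encodes them as UTF-8 and passes the options as query-string parameters. It uses only Python's standard library and builds the request without sending it. Assumptions:

* The endpoint URL is a placeholder, since the real URL isn't given above.
* The per-line limit is counted in characters rather than bytes.
* Booleans are sent as lowercase true/false.

The build_request function itself is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_LINES = 100           # documented limit
MAX_CHARS_PER_LINE = 200  # documented limit (assumed to count characters, not bytes)


def build_request(endpoint, api_key, examples, **params):
    """Build, but do not send, a POST request to the regex generator API.

    endpoint is a placeholder; keyword arguments such as generalizeAt=5 or
    trainingSetSize=0.75 become query-string parameters.
    """
    if len(examples) > MAX_LINES:
        raise ValueError(f"at most {MAX_LINES} lines allowed, got {len(examples)}")
    for line in examples:
        if "\n" in line or "\r" in line:
            raise ValueError("each example must be a single line")
        if len(line) > MAX_CHARS_PER_LINE:
            raise ValueError(f"line exceeds {MAX_CHARS_PER_LINE} characters: {line[:20]!r}...")
    # Assumption: the API expects booleans as lowercase "true"/"false".
    options = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    query = urlencode({"_apikey": api_key, **options})
    body = "\n".join(examples).encode("utf-8")  # newline-separated UTF-8 plain text
    return Request(
        f"{endpoint}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


req = build_request("https://api.example.com/regex", "YOUR_API_KEY",
                    ["$12.50", "$3.99", "$1,024.00"],
                    alignAt=0.1, compressingSentences=False)
print(req.full_url)
# https://api.example.com/regex?_apikey=YOUR_API_KEY&alignAt=0.1&compressingSentences=false
```

Passing the request to urllib.request.urlopen would perform the actual call. The response format isn't described in this section, so parsing it is left out.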