Understanding the what3words RegEx: A human-friendly guide

If you’ve ever needed to verify that a piece of text is in the format of a what3words address (three words separated by dots), you may have encountered the official what3words Regular Expression (RegEx).

RegEx patterns can look intimidating at first glance, so in this guide we’ll break down the what3words RegEx in plain English. We’ll also discuss how to adapt it for scanning free-form text (like chat messages), consider special cases like Vietnamese addresses that include spaces, and list the different punctuation marks that can separate the words.

The what3words RegEx pattern (exact match)

The what3words address format can be validated using a single-purpose RegEx pattern. This pattern is anchored to match the entire string (meaning it assumes the input is just the three word address with nothing extra). Here is the official RegEx for a full exact-match validation of a what3words address (including support for various languages and the optional /// prefix):

^\/{0,}(?:[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+[.｡。･・︒។։။۔።।][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+[.｡。･・︒។։။۔።।][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+|[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+(?:[\u0020\u00A0][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+){1,3}[.｡。･・︒។։။۔።।][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+(?:[\u0020\u00A0][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+){1,3}[.｡。･・︒។։။۔።।][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+(?:[\u0020\u00A0][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+){1,3})$

Let’s break down what this pattern is doing, step by step:

Anchors ( ^ and $ ): the ^ at the start and $ at the end of the RegEx ensure that the pattern matches the entire string from start to finish. In other words, the string should contain only a what3words address and nothing else. This is ideal for validating a standalone address field because it won’t allow any extra characters before or after the three words.
Optional prefix ( ^\/{0,} ): right after the ^ , the pattern allows \/{0,} , which means “zero or more forward slashes”. In practice, what3words addresses are most often presented with three leading slashes (like ///filled.count.soap) as a visual indicator. The RegEx allows an optional /// at the beginning (technically it allows any number of slashes, but the intention is to cover the /// prefix or no slashes at all). This part sits before the non-capturing group (?: ... ) that holds the rest of the pattern, and the slashes are not required, so both "///index.home.raft" and "index.home.raft" satisfy this portion.
Alternation for spacing styles: inside the non-capturing group, the pattern is split into two major alternatives separated by a | . This is to handle two scenarios:

a) No spaces within words: The first alternative matches the typical case where each of the three words contains no internal spaces (e.g. filled.count.soap )

b) Spaces within word groups: the second alternative handles languages like Vietnamese where what3words “words” can consist of multiple words separated by spaces (e.g. món hầm.kem sữa.thơ ca). In this scenario, each of the three components contains between one and three internal spaces. Important: the RegEx is constructed such that either all three components have internal spaces or none do. It won’t allow mixing (you can’t have one part with a space and others without).

Word pattern (letters only): within each alternative, the character class [^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+ appears repeatedly. This cryptic class says “one or more characters that are not any of the following: digits, the listed punctuation marks and symbols, or whitespace”. In simpler terms, it is intended to match a sequence of letters (including letters with diacritic marks) from any language script. Because it is an exclusion list rather than a list of letters, any character that is not explicitly excluded is accepted, which is what lets it cover every script without enumerating them. For example, it would match “filled”, “écoute”, “محفظة”, “東京”, etc., but it would not match “hello!” (because “!” is disallowed) or “word123” (digits are disallowed) as part of a word.
Delimiters between words: the character class [.｡。･・︒។։။۔።।] in the pattern represents the separator that must appear between the three word segments. This is the set of all supported delimiter characters that what3words recognises. It includes the standard Latin script full stop . as well as various Unicode punctuation marks used as equivalents to a dot in other languages (for example, the ideographic full stop 。 used in Japanese, the Arabic full stop ۔ , etc.). Exactly two of these delimiters must appear: one between the first and second word, and one between the second and third word. This guarantees the format is “word1<delim>word2<delim>word3”. We will detail all the supported delimiters in a section below.
Note on trailing punctuation: the RegEx expects the address to end cleanly after the third word; it does not allow an additional delimiter or symbol afterward. This means that filled.count.soap would match, but filled.count.soap. (with a trailing full stop) would not. However, in natural sentences it is perfectly normal to see a what3words address followed by a full stop at the end of a sentence, as in “Your address is filled.count.soap.”, and that punctuation is not part of the address. If you’re scanning free-form text with the adapted (unanchored) pattern, it can still extract the address without including the trailing punctuation (see the section on free-text scenarios below).
Handling Vietnamese (or spaced) word groups: in the second alternative of the RegEx (after the | ), you’ll notice a construction like [^...]+(?:[\u0020\u00A0][^...]+){1,3} for each word group. This looks complex, but it’s basically an extension of the “letters only word” pattern to allow internal spaces:

a) [^...]+ matches a sequence of letters as before (the first part of the word group).

b) (?:[\u0020\u00A0][^...]+){1,3} then allows one to three repetitions of “a space followed by another sequence of letters”. Here \u0020 is the Unicode code point for a normal space " " and \u00A0 is a non-breaking space. This means each word group is made up of two to four syllables separated by single spaces. For example, it could match "món hầm" as one word (letters + space + letters), or “dụng cụ pha chế” as one word.

c) By structuring the RegEx with a separate alternative, it ensures all three components use the same style. If the RegEx is matching the spaced what3words address version, then each of the first, second, and third words must contain at least one space. Conversely, if the “no internal spaces” version is used, none of the three can contain a space.
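The spaced word group described in a) and b) can be tried out in isolation. The following is a minimal Python sketch, not an official what3words implementation; the names WORD, SPACED_WORD and is_spaced_word are hypothetical and chosen only for this illustration.

```python
import re

# One run of "letters": anything that is not a digit, listed symbol or whitespace.
WORD = r"""[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+"""

# A first run of letters, then one to three repetitions of
# "a space (U+0020 or U+00A0) followed by another run of letters".
SPACED_WORD = re.compile(rf"{WORD}(?:[\u0020\u00A0]{WORD}){{1,3}}")


def is_spaced_word(text: str) -> bool:
    """Return True if the whole text is a single spaced word group."""
    return SPACED_WORD.fullmatch(text) is not None


print(is_spaced_word("món hầm"))          # two syllables
print(is_spaced_word("dụng cụ pha chế"))  # four syllables
print(is_spaced_word("món"))              # no internal space
```

Under these assumptions, a group with no internal space or with more than three internal spaces is rejected, which is the bound that {1,3} expresses.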

Case sensitivity: the RegEx shown does not enforce lowercase, even though what3words addresses are typically written in lowercase. For instance, Filled.Count.Soap matches the pattern, because the word class only excludes digits, symbols and whitespace, so uppercase letters are accepted with or without a case-insensitive flag. In practice, what3words treats addresses in a case-insensitive manner, and the words are generally given in lowercase. So, while the pattern focuses on the structure (and will accept uppercase letters as valid characters), any matched address should be lowercased before making an API call or comparison to follow what3words standards.

In summary, this anchored RegEx ensures we have exactly three groups of characters (allowing for accents and letters from any language, and even spaces within those groups for certain languages) separated by two valid delimiters (like dots) and optional leading slashes. If a string matches this RegEx, it looks like a well-formed what3words address (but to know if it’s an actual valid address, you’d still need to check against the what3words API).
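As an illustration of the exact-match check, here is a hedged Python sketch that assembles the pattern from its building blocks instead of pasting it as one long string. It is intended to be equivalent to the pattern above but is not an official implementation: WORD, DELIM, SPACED, ADDRESS and looks_like_3wa are hypothetical names, the delimiters are written as Unicode escapes that are assumed to correspond to the characters in the delimiter class, and re.fullmatch plays the role of the ^ and $ anchors.

```python
import re

# One run of "letters": anything that is not a digit, listed symbol or whitespace.
WORD = r"""[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+"""
# The twelve supported delimiters, written as escapes (assumed code points).
DELIM = "[\u002E\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]"
# A word group with one to three internal spaces (Vietnamese style).
SPACED = rf"{WORD}(?:[\u0020\u00A0]{WORD}){{1,3}}"

# Optional slashes, then either three plain words or three spaced word groups.
ADDRESS = re.compile(
    rf"\/{{0,}}(?:{WORD}{DELIM}{WORD}{DELIM}{WORD}"
    rf"|{SPACED}{DELIM}{SPACED}{DELIM}{SPACED})"
)


def looks_like_3wa(text: str) -> bool:
    """Format check only: says nothing about whether the address exists."""
    # fullmatch requires the whole string to match, like ^...$ in the pattern.
    return ADDRESS.fullmatch(text) is not None


for candidate in ["///filled.count.soap", "filled.count.soap.", "món hầm.kem.thơ ca"]:
    print(candidate, looks_like_3wa(candidate))
```

A string that passes this check is only well-formed; confirming that it is a real address still requires the what3words API.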

Adapting the pattern for free-text scenarios

The RegEx above works great when you are validating a standalone what3words address. But what if you want to find a what3words address buried in a larger string of text? For example, a user might type: “Please deliver to filled.count.soap by tomorrow.”

In such cases (common in chatbots, message parsing, or scanning free-form text), you need to adapt the RegEx pattern so it can match the address within a longer string, rather than the whole string. Here are the key adaptations for free-text use:

Remove the anchors: the ^ and $ anchors make the RegEx match only when the entire string is the address. To locate an address inside a sentence or paragraph, you’ll want to omit these anchors. This way, the RegEx can find a match starting and ending anywhere in the input text. In RegEx terms, you’re allowing the pattern to operate in “find a substring” mode instead of “match the whole string” mode.
Use lookarounds for word boundaries (optional but recommended): even without ^ and $ , the raw pattern can locate a what3words address in text, but it might also match things you don’t intend if they happen to fit the pattern by coincidence. To improve accuracy:

a) You can require whitespace (or the start or end of the string) before and after the address. For instance, you might prepend the pattern with (?<=\s|^) and append (?=\s|$). These are lookbehind and lookahead assertions that ensure that immediately before the match is either the start of the string or a whitespace character, and immediately after the match is either the end of the string or whitespace. This prevents the RegEx from grabbing a sequence of words that is actually part of a larger token such as a URL or email address. Some RegEx engines only accept fixed-width lookbehinds; in those, (?<!\S) expresses the same condition.

b) In simpler terms, using lookarounds makes sure our address is a separate token in the text. For example, without lookarounds the pattern would match "john.smith.home" inside the email address "john.smith.home@example.com". With lookarounds, that candidate is rejected because it is followed by @ rather than whitespace, while "deliver to index.home.raft now" still matches and we capture "index.home.raft". Lookarounds cannot separate an address that is glued to neighbouring letters: "PleaseDeliverToindex.home.raftNow" is itself three runs of letters separated by dots, so it is matched as a whole rather than as index.home.raft.

Be mindful of punctuation around the address: in normal writing, a what3words address might be followed by a full stop or comma (for example, at the end of a sentence: “Your location is filled.count.soap. Please stay there.”). If we remove the $ end anchor, the RegEx will match filled.count.soap from that string and leave out the final full stop: a delimiter is only consumed when another word follows it, so the pattern naturally stops before it. That’s fine: it means the RegEx found the address and not the extra punctuation. Note, however, that a strict (?=\s|$) lookahead rejects this address, because the character after it is a full stop rather than whitespace. If you want to accept addresses followed by punctuation, you can relax the lookahead to (?!\p{L}), meaning “the next character is not a letter”, in engines that support Unicode property escapes; it checks the following character without consuming it.
Global search: when searching in free text, make sure to use the RegEx in a global/find-all mode (depending on your programming language or RegEx engine). This will scan the whole input and return any and all matches (there could be more than one what3words address in a given text).
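The adaptations above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not an official what3words pattern. Python's built-in re module rejects the variable-width lookbehind (?<=\s|^) and has no \p{L}, so the sketch substitutes (?<!\S) for the lookbehind and uses a lookahead that accepts whitespace, the end of the text, or one sentence punctuation mark followed by whitespace or the end. That lookahead, and the names FINDER and find_candidates, are choices made for this example only.

```python
import re

WORD = r"""[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+"""
# Supported delimiters as escapes (assumed code points).
DELIM = "[\u002E\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]"
SPACED = rf"{WORD}(?:[\u0020\u00A0]{WORD}){{1,3}}"
CORE = (
    rf"\/{{0,}}(?:{WORD}{DELIM}{WORD}{DELIM}{WORD}"
    rf"|{SPACED}{DELIM}{SPACED}{DELIM}{SPACED})"
)

# No ^ or $ anchors; lookarounds keep the candidate a separate token.
FINDER = re.compile(
    rf"(?<!\S)"                       # preceded by whitespace or start of text
    rf"{CORE}"
    rf"(?=\s|$|[.,;:!?](?:\s|$))"     # followed by whitespace, end, or "punct + space/end"
)


def find_candidates(text: str) -> list:
    """Return every substring that has the shape of a what3words address."""
    return [match.group(0) for match in FINDER.finditer(text)]


print(find_candidates("Your location is filled.count.soap. Please stay there."))
print(find_candidates("mail john.smith.home@example.com now"))
```

finditer gives the global, find-all behaviour, so a text containing several addresses yields several candidates. With spaced Vietnamese addresses, neighbouring words can be absorbed into the first or third word group, so candidates found this way are best confirmed afterwards.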

By making these adjustments, the RegEx becomes a powerful tool for scanning any piece of text and extracting potential what3words addresses. For instance, with anchors removed (and perhaps using the lookarounds), the pattern would find the address in a sentence like “Meet me at table.lamp.spoon around 5pm”, or even when a Vietnamese address appears mid-sentence, as in “đi tới cửa hàng at ///món hầm.kem sữa.thơ ca ngay bây giờ”. Note: when dealing with languages like Vietnamese in free text, detection can be tricky, because the words around the address can look like part of a spaced word group (see the next section).

If you find the pattern is too permissive or you’re worried about false positives, another strategy for free text is to use the what3words API’s AutoSuggest function after a format match: first use the RegEx to find anything that looks like a what3words address, then call the API to check whether it’s a real one.

If you are in a conversational AI setting and parsing text inside a Large Language Model (LLM) workflow, the model itself can usually decide which tokens form a what3words address and which do not. In other words, you can use the RegEx purely as a cue for the LLM and let the model resolve edge cases (such as trailing punctuation or surrounding Vietnamese words). If you still need definitive confirmation that the string is a valid grid square, you can call the what3words API’s AutoSuggest or convert-to-coordinates endpoints after extraction, but that’s optional once the LLM has isolated the candidate address.

Vietnamese addresses with spaces

Vietnamese presents a unique challenge for what3words because its multi-syllabic words contain spaces. what3words has designed the Vietnamese word list so that it works with the way Vietnamese is normally written and typed:

Compound “words” with spaces: Many Vietnamese words are compounds of two or more syllables, written with spaces between them. For example, the Vietnamese word for “city” is “thành phố” – two syllables, with a space in between, even though it is parsed by Vietnamese speakers (and what3words!) as one word. In a what3words address, “thành phố” might appear as one of the three address words. To a Vietnamese speaker, it looks natural and readable with that space.

Importantly, users have flexibility in how they enter it: to accommodate different typing habits, what3words accepts Vietnamese address words either with spaces (exactly as displayed) or with all those internal spaces removed. So whether you write “thành phố” or “thànhphố”, it’s understood as the same word, as long as the other two words are formatted consistently with it. This ensures that if you are using a keyboard or input method that makes it tricky to add the space, you have an easier way to enter the address you need. (In practice, the addresses are consistently displayed with the proper spaces for clarity, but no special effort is needed on the user’s part to match that format when inputting).

In this case “thành phố” is the technically correct way of writing the word; it is therefore the “primary” word, and this is always what we display on our app and online map. Typing “thànhphố.thànhphố.thànhphố” into our search bar will take you to the location ///thành phố.thành phố.thành phố.

It should be noted that when Vietnamese what3words addresses appear in URLs, they appear without spaces (and with a special “alias” parameter, as many non-Latin script languages do; see more here).

Supported delimiters between word groups

We mentioned that what3words addresses aren’t always separated by the standard Latin script full stop . (U+002E) – when displayed on our online map, Japanese what3words addresses are separated by the ideographic full stop 。 (U+3002). The full-width full stop here is used to prevent any visual confusion of where the word boundaries are. Japanese is the only language that is not displayed using Latin script full stops.

Of course, what3words is available in many different writing scripts, many of which have a totally different set of punctuation to the Latin script. Due to its prevalence in URLs and email addresses, the Latin script full stop is often easily accessible on non-Latin script keyboards – but we want to make things easy for our users, and therefore we have allowed a range of different delimiters to be inputted by the user. These are not displayed within what3words addresses on our app or online map, but increase accessibility for global users. The RegEx character class [.｡。･・︒។։။۔።।] lists all the supported delimiters that can be inputted between the three words. These are essentially various forms of “period” or similar separators in different writing systems.

Here’s a table of all supported delimiter characters, along with their Unicode names and the languages/scripts that commonly use them:

Delimiter	Unicode Name	Used in Language/Script
.	FULL STOP (Period)	Default delimiter (Latin script languages like English, Spanish, etc.)
。	IDEOGRAPHIC FULL STOP	Japanese, Chinese (full-width “period” used in East Asian scripts)
・	KATAKANA MIDDLE DOT	Japanese (written in katakana or generally horizontal Japanese text)
｡	HALFWIDTH IDEOGRAPHIC FULL STOP	Japanese (half-width punctuation, sometimes used in Japanese digital text)
･	HALFWIDTH KATAKANA MIDDLE DOT	Japanese (half-width katakana contexts)
︒	PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP	Chinese/Japanese (vertical text layout)
។	KHMER SIGN KHAN	Khmer (Cambodian)
։	ARMENIAN FULL STOP	Armenian
။	MYANMAR SIGN SECTION	Burmese (Myanmar)
۔	ARABIC FULL STOP	Arabic, Urdu and other Arabic-script languages
።	ETHIOPIC FULL STOP	Amharic (Ethiopic script)
।	DEVANAGARI DANDA	Hindi, Marathi, and other Devanagari-script languages (used as a period)
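For reference, the code points behind these delimiters can be listed with Python's standard unicodedata module. This is a small sketch; the code points below are assumed to correspond to the characters in the table, and DELIMITER_CODE_POINTS and describe are hypothetical names used only here.

```python
import unicodedata

# Code points assumed to correspond to the twelve delimiters in the table above.
DELIMITER_CODE_POINTS = [
    0x002E, 0x3002, 0x30FB, 0xFF61, 0xFF65, 0xFE12,
    0x17D4, 0x0589, 0x104B, 0x06D4, 0x1362, 0x0964,
]


def describe(code_point: int):
    """Return the U+XXXX label and the official Unicode name of a code point."""
    return f"U+{code_point:04X}", unicodedata.name(chr(code_point))


# Print one line per delimiter: character, code point, Unicode name.
for cp in DELIMITER_CODE_POINTS:
    label, name = describe(cp)
    print(f"{chr(cp)}\t{label}\t{name}")
```

Writing the delimiters as escapes in source code, rather than pasting the characters, can make it easier to tell visually similar marks apart.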

Side note on repeated words

Whilst what3words addresses containing repeated words (e.g. ///table.table.chair, ///table.table.table, or ///table.chair.table) would pass the RegEx in all languages, it is worth explicitly clarifying that repeated words are indeed allowed, either twice or three times in a single address, in all languages. Format validators should not reject them.

Final thoughts

The what3words RegEx might seem daunting, but it’s designed to be comprehensive. It accounts for various languages, character sets, and even the unique challenges presented by languages like Vietnamese. For developers, understanding this pattern means you can confidently validate or find what3words addresses in text without immediately calling the API for every check.

For product and marketing teams, the key takeaway is that there’s a clear logic; each component of the RegEx ensures addresses are formatted correctly, which in turn means users get a smooth experience (with immediate feedback if something is typed wrong). By adapting the RegEx for your needs (exact match vs free-text search) and being mindful of internationalisation details (like the different delimiters and spacing rules), you can effectively integrate what3words address handling into your application. Hopefully, this breakdown makes the rule set clearer and takes away the mystery of RegEx.

Language Name	ISO code	what3words API language code	what3words Locale code	Script	Writing Direction	Default word delimiter	Does the language have secondary words	Secondary words notes	Does the language allow internal spaces	Internal Spaces notes	/// marker logical position	/// marker visual edge
Afrikaans				Latin	ltr				FALSE		prefix	left
Amharic				Ethiopic	ltr				FALSE		prefix	left
Arabic					rtl				FALSE		prefix	right
Bahasa Indonesia				Latin	ltr				FALSE		prefix	left
Bahasa Malaysia				Latin	ltr				FALSE		prefix	left
Bengali					ltr		TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Bosnian			oo_cy	Cyrillic	ltr				FALSE		prefix	left
Bosnian			oo_la	Latin	ltr		TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Bulgarian				Cyrillic	ltr				FALSE		prefix	left
Catalan				Latin	ltr				FALSE		prefix	left
Chinese			zh_si	Han (Simplified)	ltr				FALSE		prefix	left
Chinese			zh_tr	Han (Traditional)	ltr				FALSE		prefix	left
Croatian			oo_cy	Cyrillic	ltr				FALSE		prefix	left
Croatian			oo_la	Latin	ltr		TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Czech				Latin	ltr				FALSE		prefix	left
Danish				Latin	ltr		TRUE	Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted ‘oe’; ‘å’ can also be inputted ‘aa’	FALSE		prefix	left
Dutch				Latin	ltr				FALSE		prefix	left
English				Latin	ltr				FALSE		prefix	left
Estonian				Latin	ltr				FALSE		prefix	left
Finnish				Latin	ltr				FALSE		prefix	left
French				Latin	ltr		TRUE	Note: ‘œ’ may be typed as ‘oe’.	FALSE		prefix	left
German				Latin	ltr		TRUE	Note: ‘ä’ can also be inputted as ‘ae’; ‘ö’ can also be inputted as ‘oe’; ‘ü’ can also be inputted as ‘ue’	FALSE		prefix	left
Greek					ltr				FALSE		prefix	left
Gujarati					ltr				FALSE		prefix	left
Hebrew					rtl				FALSE		prefix	right
Hindi				Devanagari	ltr		TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Hungarian				Latin	ltr				FALSE		prefix	left
isiXhosa				Latin	ltr				FALSE		prefix	left
isiZulu				Latin	ltr				FALSE		prefix	left
Italian				Latin	ltr				FALSE		prefix	left
Japanese				Hiragana	ltr	。			FALSE		prefix	left
Kannada					ltr				FALSE		prefix	left
Kazakh			kk_cy	Cyrillic	ltr				FALSE		prefix	left
Kazakh			kk_la	Latin	ltr				FALSE		prefix	left
Khmer					ltr				FALSE		prefix	left
Korean				Hangul	ltr				FALSE		prefix	left
Lao					ltr		TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Malayalam					ltr		TRUE	Words that were changed in spelling reform have previous spellings as secondary words	FALSE		prefix	left
Marathi				Devanagari	ltr		TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Mongolian			mn_cy	Cyrillic	ltr				FALSE		prefix	left
Mongolian			mn_la	Latin	ltr		TRUE	Secondary words are created when a Cyrillic character has more than one Latin script equivalent: ‘х’ can also be inputted as ‘h’ OR ‘kh’; ‘ө’ can also be inputted as ‘o’ OR ‘u’	FALSE		prefix	left
Montenegrin			oo_cy	Cyrillic	ltr				FALSE		prefix	left
Montenegrin			oo_la	Latin	ltr		TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Nepali				Devanagari	ltr		TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Norwegian				Latin	ltr		TRUE	Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted ‘oe’; ‘å’ can also be inputted ‘aa’	FALSE		prefix	left
Odia				Oriya (Odia)	ltr				FALSE		prefix	left
Persian				Arabic	rtl		TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	right
Polish				Latin	ltr				FALSE		prefix	left
Portuguese				Latin	ltr				FALSE		prefix	left
Punjabi				Gurmukhi	ltr		TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Romanian				Latin	ltr				FALSE		prefix	left
Russian				Cyrillic	ltr				FALSE		prefix	left
Serbian			oo_cy	Cyrillic	ltr				FALSE		prefix	left
Serbian			oo_la	Latin	ltr		TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Sinhala					ltr				FALSE		prefix	left
Slovak				Latin	ltr				FALSE		prefix	left
Slovene				Latin	ltr				FALSE		prefix	left
Spanish				Latin	ltr				FALSE		prefix	left
Swahili				Latin	ltr				FALSE		prefix	left
Swedish				Latin	ltr				FALSE		prefix	left
Tamil					ltr				FALSE		prefix	left
Telugu					ltr				FALSE		prefix	left
Thai					ltr				FALSE		prefix	left
Turkish				Latin	ltr				FALSE		prefix	left
Ukrainian				Cyrillic	ltr				FALSE		prefix	left
Urdu				Arabic	rtl		TRUE	Some pairs of characters share the same sound. Secondary words allow for this	FALSE		prefix	right
Vietnamese				Latin	ltr		TRUE	Primary words have spaces; secondary words are written with no spaces	TRUE	Vietnamese orthography allows up to three internal spaces inside a single dictionary word (e.g. ‘thành phố’). In a valid Vietnamese 3-word address this rule is all-or-nothing: if one word contains internal spaces, then all three words do.	prefix	left
Welsh				Latin	ltr				FALSE		prefix	left

Note: For an explanation of secondary words, see this blog post.