04/08/2025

If you’ve ever needed to verify that a piece of text is in the format of a what3words address (three words separated by dots), you may have encountered the official what3words Regular Expression (RegEx).

RegEx patterns can look intimidating at first glance, so in this guide we’ll break down the what3words RegEx in plain English. We’ll also discuss how to adapt it for scanning free-form text (like chat messages), consider special cases like Vietnamese addresses that include spaces, and list the different punctuation marks that can separate the words.

The what3words RegEx pattern (exact match)

The what3words address format can be validated using a single-purpose RegEx pattern. This pattern is anchored to match the entire string (meaning it assumes the input is just the three word address with nothing extra). Here is the official RegEx for a full exact-match validation of a what3words address (including support for various languages and the optional /// prefix):

^\/{0,}(?:[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+[.。。・・︒។։။۔።।][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+[.。。・・︒។։။۔።।][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+|[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+(?:[\u0020\u00A0][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+){1,3}[.。。・・︒។։။۔።][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+(?:[\u0020\u00A0][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+){1,3}[.。。・・︒។։။۔።。][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+(?:[\u0020\u00A0][^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+){1,3})$

Let’s break down what this pattern is doing, step by step:

  • Anchors ( ^ and $ ): the ^ at the start and $ at the end of the RegEx ensure that the pattern matches the entire string from start to finish. In other words, the string should contain only a what3words address and nothing else. This is ideal for validating a standalone address field because it won’t allow any extra characters before or after the three words.
  • Optional prefix ( ^\/{0,} ): right after the ^ , the pattern allows \/{0,} – this means “zero or more forward slashes”. In practice, what3words addresses are most often presented with three leading slashes (like ///filled.count.soap ) as a visual indicator. The RegEx allows an optional /// at the beginning (technically it allows any number of slashes, but the intention is to cover the /// prefix or no slashes at all). The slashes sit in front of the non-capturing group (?: ) that wraps the rest of the pattern, and they are not required – so both "///index.home.raft" and "index.home.raft" would satisfy this portion.
  • Alternation for spacing styles: the pattern is split into two major alternatives separated by a | . This is to handle two scenarios:

a) No spaces within words: the first alternative matches the typical case where each of the three words contains no internal spaces (e.g. filled.count.soap ).

b) Spaces within word groups: the second alternative handles languages like Vietnamese where what3words “words” can consist of multiple words separated by spaces (e.g. món hầm.kem sữa.thơ ca ). In this scenario, each of the three components contains between one and three spaces. Important: the RegEx is constructed such that either all three components have internal spaces or none do – it won’t allow mixing (you can’t have one part with a space and others without).

  • Word pattern (letters only): within each alternative, the pattern [^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+ appears repeatedly. This cryptic character class is essentially saying “one or more characters that are not any of the following: digits, the listed punctuation marks and symbols, or whitespace”. In simpler terms, it matches a sequence of letters (and letters with diacritic marks) from any language script. Because the class works by exclusion, it does not test that a character is a letter, only that it is not on the excluded list. By excluding digits and symbols, it ensures in practice that each “word” in the address is made up of alphabetic characters (letters including non-Latin characters and accents). For example, it would match “filled”, “écoute”, “محفظة”, “東京”, etc., but it would not match “hello!” (because “!” is disallowed) or “word123” (digits disallowed) as a whole word.
  • Delimiters between words: the character class [.。。・・︒។։။۔።।] in the pattern represents the separator that must appear between the three word segments. This is the set of all supported delimiter characters that what3words recognises. It includes the standard Latin script full stop . as well as various Unicode punctuation used as equivalents to a dot in other languages (for example, the ideographic full stop used in Japanese, the Arabic full stop ۔ , etc.). Exactly two of these delimiters must appear – one between the first and second word, and one between the second and third word. This guarantees the format is “word1<delim>word2<delim>word3”. We will detail all the supported delimiters in a section below.
  • Note on trailing punctuation: the RegEx expects the address to end cleanly after the third word; it does not allow an additional delimiter or symbol afterward. This means that filled.count.soap would match, but filled.count.soap. (with a trailing full stop) would not. However, in natural sentences, it’s perfectly normal to see a what3words address followed by a full stop at the end of a sentence, as in “Your address is filled.count.soap.”, and that punctuation is not part of the address. If you’re scanning free-form text with the anchors removed, the RegEx can still extract the address without including the trailing punctuation (see the section on free-text scenarios below).
  • Handling Vietnamese (or spaced) word groups: in the second alternative of the RegEx (after the | ), you’ll notice a construction like [^...]+(?:[\u0020\u00A0][^...]+){1,3} for each word group. This looks complex, but it’s basically an extension of the “letters only word” pattern to allow internal spaces:

a) [^...]+ matches a sequence of letters as before (the first part of the word group).

b) (?:[\u0020\u00A0][^...]+){1,3} then allows one to three repetitions of “a space followed by another sequence of letters”. Here \u0020 is the Unicode code point for a normal space " " and \u00A0 is a non-breaking space. This means each word can be made up of two to four separate syllables separated by spaces. For example, it could match "món hầm" as one word (letters + space + letters), or "dụng cụ pha chế" as one word.

c) By structuring the RegEx with a separate alternative, it ensures all three components use the same style. If the RegEx is matching the spaced what3words address version, then each of the first, second, and third words must contain at least one space. Conversely, if the “no internal spaces” version is used, none of the three can contain a space.

  • Case sensitivity: the RegEx shown does not enforce lowercase, even though what3words addresses are typically written in lowercase. For instance, Filled.Count.Soap matches the pattern, because the character class only excludes digits, symbols and whitespace and places no restriction on letter case (so no case-insensitive flag is needed). In practice, what3words treats addresses in a case-insensitive manner, and the words are generally given in lowercase. So, while the pattern focuses on the structure (and will accept uppercase letters as valid characters), any matched address should be lowercased before making an API call or comparison to follow what3words standards.
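
The word character class and the lowercasing advice above can be tried out in isolation. The following is a minimal Python sketch; the names WORD, looks_like_word and normalise_case are illustrative and not part of any what3words library.

```python
import re

# The negated character class from the official pattern: one or more characters
# that are not digits, the listed punctuation marks/symbols, or whitespace.
WORD = r"""[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+"""
WORD_RE = re.compile(WORD)


def looks_like_word(text: str) -> bool:
    """Return True if the whole string is accepted by the word class."""
    return WORD_RE.fullmatch(text) is not None


def normalise_case(address: str) -> str:
    """Lowercase a matched address before comparing it or sending it on."""
    return address.lower()


# Letters from any script pass, in either case; digits and symbols do not.
for candidate in ["filled", "écoute", "東京", "Filled", "hello!", "word123"]:
    print(candidate, looks_like_word(candidate))

print(normalise_case("Filled.Count.Soap"))
```

Because the class is defined by exclusion, the check says nothing about whether a word exists in a what3words word list; it only confirms the characters are permitted.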

In summary, this anchored RegEx ensures we have exactly three groups of characters (allowing for accents and letters from any language, and even spaces within those groups for certain languages) separated by two valid delimiters (like dots), with optional leading slashes in front. If a string matches this RegEx, it looks like a well-formed what3words address (but to know if it’s an actual valid address, you’d still need to check against the what3words API).
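
To see the pieces working together, here is a hedged Python sketch that rebuilds the pattern from named parts. It is a simplification, not the official pattern verbatim: it uses a single delimiter class throughout (the official pattern's delimiter classes differ slightly between alternatives), it writes the delimiters as Unicode escapes, and it relies on re.fullmatch instead of the ^ and $ anchors. The names WORD, DELIM, SPACED and is_possible_3wa are illustrative.

```python
import re

# Word class copied from the official pattern.
WORD = r"""[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+"""
# Supported delimiters, written as escapes so they survive copy and paste.
DELIM = "[\u002E\u3002\uFF61\u30FB\uFF65\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]"
# A word group with one to three internal spaces (normal or non-breaking).
SPACED = WORD + "(?:[\u0020\u00A0]" + WORD + "){1,3}"

# Optional slashes, then either three plain words or three spaced word groups.
PATTERN = re.compile(
    "/*(?:" + DELIM.join([WORD] * 3) + "|" + DELIM.join([SPACED] * 3) + ")"
)


def is_possible_3wa(text: str) -> bool:
    """True if the whole string has the shape of a what3words address."""
    return PATTERN.fullmatch(text) is not None


for sample in [
    "filled.count.soap",
    "///filled.count.soap",
    "filled.count.soap.",
    "món hầm.kem sữa.thơ ca",
    "món hầm.count.soap",
]:
    print(repr(sample), is_possible_3wa(sample))
```

A True result only means the format is plausible; confirming that the address exists still requires the what3words API.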

Adapting the pattern for free-text scenarios

The RegEx above works great when you are validating a standalone what3words address. But what if you want to find a what3words address buried in a larger string of text? For example, a user might type: “Please deliver to filled.count.soap by tomorrow.”

In such cases (common in chatbots, message parsing, or scanning free-form text), you need to adapt the RegEx pattern so it can match the address within a longer string, rather than the whole string. Here are the key adaptations for free-text use:

  • Remove the anchors: the ^ and $ anchors make the RegEx match only when the entire string is the address. To locate an address inside a sentence or paragraph, you’ll want to omit these anchors. This way, the RegEx can find a match starting and ending anywhere in the input text. In RegEx terms, you’re allowing the pattern to operate in “find a substring” mode instead of “match the whole string” mode.
  • Use lookarounds for word boundaries (optional but recommended): even without ^ and $ , the raw pattern can locate a what3words address in text, but it might also match things you don’t intend if they happen to fit the pattern by coincidence. To improve accuracy:

a) You can require a word boundary or whitespace before and after the address. For instance, you might prepend the pattern with (?<=\s|^) and append (?=\s|$) . These are lookbehind and lookahead assertions that ensure that immediately before the match is either the start of the string or a whitespace character, and immediately after the match is either end of string or whitespace. Some RegEx engines only accept fixed-width lookbehinds; the equivalent forms (?<!\S) and (?!\S) express the same conditions. This prevents the RegEx from grabbing a sequence of words that are actually part of a larger token or URL.

b) In simpler terms, using lookarounds makes sure our address is a separate token in the text. For example, without lookarounds the pattern would pull mail.example.com out of "user@mail.example.com" , because the excluded “@” lets a match start right after it. With lookarounds, nothing is matched there because mail is not preceded by whitespace or the start of the string, while "deliver to index.home.raft now" still matches correctly, and we capture "index.home.raft" . Note that lookarounds cannot split run-together text: "PleaseDeliverToindex.home.raftNow" consists only of letters and dots, so it is matched as a single candidate and has to be rejected by a later validity check.

  • Be mindful of punctuation around the address: in normal writing, a what3words address might be followed by a full stop or comma (for example, at the end of a sentence: “Your location is filled.count.soap. Please stay there.”). If we remove the $ end anchor, our RegEx will happily match filled.count.soap from that string, ignoring the final full stop (the pattern requires a word after every delimiter, and no word follows this full stop, so the match naturally stops before it). That’s fine – it means the RegEx found the address and not the extra punctuation. Just be aware that punctuation directly adjacent to the address might not be captured, which is usually what you want. Note that a strict whitespace lookahead such as (?=\s|$) would reject this address outright, because the character after soap is a full stop rather than whitespace. If you use a trailing lookahead, a looser one such as (?!\p{L}) , meaning “next character is not a letter”, lets the match end before punctuation without consuming it (\p{L} requires an engine with Unicode property support).
  • Global search: when searching in free text, make sure to use the RegEx in a global/find-all mode (depending on your programming language or RegEx engine). This will scan the whole input and return any and all matches (there could be more than one what3words address in a given text).
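
The adaptations listed above can be sketched in Python as follows. This is an illustration under stated assumptions: only the no-spaces alternative is included, Python's re module requires fixed-width lookbehinds so (?<=\s|^) is written in the equivalent form (?<!\S), and no trailing lookahead is used so that an address followed by a full stop or comma is still found. The names FINDER and find_candidates are made up for this example.

```python
import re
from typing import List

WORD = r"""[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+"""
DELIM = "[\u002E\u3002\uFF61\u30FB\uFF65\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]"

# No ^ or $ anchors. (?<!\S) means "not preceded by a non-whitespace character",
# i.e. the candidate starts at the beginning of the text or after whitespace.
FINDER = re.compile(r"(?<!\S)/*" + DELIM.join([WORD] * 3))


def find_candidates(text: str) -> List[str]:
    """Return every substring that has the shape of a what3words address."""
    return [match.group(0) for match in FINDER.finditer(text)]


print(find_candidates("Your location is filled.count.soap. Please stay there."))
print(find_candidates("Meet me at table.lamp.spoon around 5pm, or at ///index.home.raft"))
print(find_candidates("Write to user@mail.example.com"))
```

finditer plays the role of the global/find-all mode, so a text containing several addresses yields several candidates.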

By making these adjustments, the RegEx becomes a powerful tool for scanning any piece of text and extracting potential what3words addresses. For instance, with anchors removed (and perhaps using the word-boundary lookarounds), the pattern would find the address in a sentence like “Meet me at table.lamp.spoon around 5pm”, or a Vietnamese address embedded in a sentence like “đi tới cửa hàng at ///món hầm.kem sữa.thơ ca ngay bây giờ”. Note: when dealing with languages like Vietnamese in free text, detection can be tricky, because the words around the address are also separated by spaces and can be absorbed into the match.

In some cases, if you find the pattern is too permissive or you’re worried about false positives, another strategy for free-text is to use the what3words API’s AutoSuggest function after a format match. First use RegEx to find anything that looks like a what3words address, then call the API to check if it’s a real one.

If you are in a conversational AI setting and parsing text inside a Large Language Model (LLM) workflow, the model itself can usually decide which tokens form a what3words address and which do not. In other words, you can use the RegEx purely as a cue for the LLM and let the model resolve edge-cases (such as trailing punctuation or surrounding Vietnamese words). If you still need a definitive confirmation that the string is a valid grid square, you can call the what3words API’s AutoSuggest or convert-to-coordinates endpoints after extraction, but that’s optional once the LLM has isolated the candidate address.
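
The “find with RegEx, then confirm” flow described above can be sketched like this. The verify callable is a hypothetical stand-in: in a real integration it would wrap a call to the what3words API, which is not shown here. Stripping the leading slashes before verification is also an assumption made for this example.

```python
import re
from typing import Callable, List

WORD = r"""[^0-9`~!@#$%^&*()+\-_=\[{\]}\\|'<,.>?/";:£§º©®\s]+"""
DELIM = "[\u002E\u3002\uFF61\u30FB\uFF65\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]"
FINDER = re.compile(r"(?<!\S)/*" + DELIM.join([WORD] * 3))


def extract_verified(text: str, verify: Callable[[str], bool]) -> List[str]:
    """Step 1: the RegEx proposes candidates. Step 2: verify() confirms each."""
    confirmed = []
    for match in FINDER.finditer(text):
        # Lowercase before checking; dropping the slashes is an assumption.
        candidate = match.group(0).lstrip("/").lower()
        if verify(candidate):
            confirmed.append(candidate)
    return confirmed


# Hypothetical verifier backed by a fixed set instead of a real API call.
KNOWN_ADDRESSES = {"filled.count.soap"}


def fake_verify(candidate: str) -> bool:
    return candidate in KNOWN_ADDRESSES


print(extract_verified("Send it to ///Filled.Count.Soap, not to www.example.com", fake_verify))
```

Here www.example.com passes the format check but is dropped by the verifier, which is the kind of false positive the second step is meant to catch.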

Vietnamese addresses with spaces

Vietnamese presents a unique challenge for what3words because multi-syllabic words contain spaces. what3words has designed the Vietnamese word list so that it works with the way Vietnamese is normally written and typed:

Compound “words” with spaces: Many Vietnamese words are compounds of two or more syllables, written with spaces between them. For example, the Vietnamese word for “city” is “thành phố” – two syllables, with a space in between, even though it is parsed by Vietnamese speakers (and what3words!) as one word. In a what3words address, “thành phố” might appear as one of the three address words. To a Vietnamese speaker, it looks natural and readable with that space.

Importantly, users have flexibility in how they enter it: to accommodate different typing habits, what3words accepts Vietnamese address words either with spaces (exactly as displayed) or with all those internal spaces removed. So whether you write “thành phố” or “thànhphố”, it’s understood as the same word, as long as the other two words are formatted consistently with it. This ensures that if you are using a keyboard or input method that makes it tricky to add the space, you have an easier way to enter the address you need. (In practice, the addresses are consistently displayed with the proper spaces for clarity, but no special effort is needed on the user’s part to match that format when inputting).

In this case “thành phố” is the technically correct way of writing the word; it is therefore the “primary” word, and this is always what we display on our app and online map. Typing “thànhphố.thànhphố.thànhphố” into our search bar will take you to the location ///thành phố.thành phố.thành phố .
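
If you need to compare two Vietnamese inputs locally (for example to spot that a spaced and an unspaced entry refer to the same address), a small helper can remove the internal spaces and check the all-or-nothing rule. This is a sketch under that assumed use case; what3words itself accepts both forms, and the function names below are illustrative.

```python
import re

DELIM = "[\u002E\u3002\uFF61\u30FB\uFF65\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]"
SPACES = "[\u0020\u00A0]"  # normal space and non-breaking space


def remove_internal_spaces(address: str) -> str:
    """Turn the displayed (spaced) form into the unspaced input form."""
    return re.sub(SPACES, "", address)


def spacing_is_consistent(address: str) -> bool:
    """True if all three components contain a space, or none of them do."""
    parts = re.split(DELIM, address.lstrip("/"))
    has_space = [re.search(SPACES, part) is not None for part in parts]
    return len(parts) == 3 and (all(has_space) or not any(has_space))


print(remove_internal_spaces("thành phố.thành phố.thành phố"))
print(spacing_is_consistent("món hầm.kem sữa.thơ ca"))
print(spacing_is_consistent("món hầm.count.soap"))
```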

It should be noted that when Vietnamese what3words addresses appear in URLs, they appear without spaces (and with a special “alias” parameter, as many non-Latin script languages do).

Supported delimiters between word groups

We mentioned that what3words addresses aren’t always separated by the standard Latin script full stop . (U+002E) – when displayed on our online map, Japanese what3words addresses are separated by the ideographic full stop (U+3002). The full-width full stop here is used to prevent any visual confusion of where the word boundaries are. Japanese is the only language that is not displayed using Latin script full stops.

Of course, what3words is available in many different writing scripts, many of which have a totally different set of punctuation to the Latin script. Due to its prevalence in URLs and email addresses, the Latin script full stop is often easily accessible on non-Latin script keyboards – but we want to make things easy for our users, and therefore we have allowed a range of different delimiters to be inputted by the user. These are not displayed within what3words addresses on our app or online map, but increase accessibility for global users. The RegEx character class [.。。・・︒។։။۔።।] lists all the supported delimiters that can be inputted between the three words. These are essentially various forms of “period” or similar separators in different writing systems.

Here’s a table of all supported delimiter characters, along with their Unicode names and the languages/scripts that commonly use them:

Delimiter | Code point | Unicode Name | Used in Language/Script
. | U+002E | FULL STOP (Period) | Default delimiter (Latin script languages like English, Spanish, etc.)
。 | U+3002 | IDEOGRAPHIC FULL STOP | Japanese, Chinese (full-width “period” used in East Asian scripts)
・ | U+30FB | KATAKANA MIDDLE DOT | Japanese (written in katakana or generally horizontal Japanese text)
｡ | U+FF61 | HALFWIDTH IDEOGRAPHIC FULL STOP | Japanese (half-width punctuation, sometimes used in Japanese digital text)
･ | U+FF65 | HALFWIDTH KATAKANA MIDDLE DOT | Japanese (half-width katakana contexts)
︒ | U+FE12 | PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP | Chinese/Japanese (vertical text layout)
។ | U+17D4 | KHMER SIGN KHAN | Khmer (Cambodian)
։ | U+0589 | ARMENIAN FULL STOP | Armenian
။ | U+104B | MYANMAR SIGN SECTION | Burmese (Myanmar)
۔ | U+06D4 | ARABIC FULL STOP | Arabic, Urdu and other Arabic-script languages
። | U+1362 | ETHIOPIC FULL STOP | Amharic (Ethiopic script)
। | U+0964 | DEVANAGARI DANDA | Hindi, Marathi, and other Devanagari-script languages (used as a period)
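
One practical use of this list is to normalise whatever delimiter a user typed to the Latin full stop before comparing or storing an address. The sketch below assumes that goal (the source does not say whether normalisation is required before calling the API) and uses Python's standard unicodedata module to print each delimiter's official name; the function names are illustrative.

```python
import unicodedata
from typing import List, Tuple

# The twelve supported delimiters, written as escapes to avoid look-alike mix-ups.
DELIMITERS = (
    "\u002E\u3002\u30FB\uFF61\uFF65\uFE12"
    "\u17D4\u0589\u104B\u06D4\u1362\u0964"
)
# Translation table mapping every supported delimiter to "."
TO_FULL_STOP = str.maketrans({ch: "." for ch in DELIMITERS})


def normalise_delimiters(address: str) -> str:
    """Replace any supported delimiter with the Latin full stop."""
    return address.translate(TO_FULL_STOP)


def describe_delimiters() -> List[Tuple[str, str]]:
    """Return (code point, Unicode name) for each supported delimiter."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in DELIMITERS]


for code_point, name in describe_delimiters():
    print(code_point, name)

print(normalise_delimiters("filled\u3002count\u06D4soap"))
```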

Side note on repeated words

Whilst what3words addresses containing repeated words (e.g. ///table.table.chair , ///table.table.table , or ///table.chair.table ) would pass the RegEx in all languages, it is worth explicitly clarifying that repeated words are indeed allowed (either twice or three times in a single address) in all languages. Format validators should not reject them.

Final thoughts

The what3words RegEx might seem daunting, but it’s designed to be comprehensive. It accounts for various languages, character sets, and even the unique challenges presented by languages like Vietnamese. For developers, understanding this pattern means you can confidently validate or find what3words addresses in text without immediately calling the API for every check.

For product and marketing teams, the key takeaway is that there’s a clear logic; each component of the RegEx ensures addresses are formatted correctly, which in turn means users get a smooth experience (with immediate feedback if something is typed wrong). By adapting the RegEx for your needs (exact match vs free-text search) and being mindful of internationalisation details (like the different delimiters and spacing rules), you can effectively integrate what3words address handling into your application. Hopefully, this breakdown makes the rule set clearer and takes away the mystery of RegEx.

Language Name | ISO code | what3words API language code | what3words Locale code | Script | Writing Direction | Default word delimiter | Does the language have secondary words | Secondary words notes | Does the language allow internal spaces | Internal Spaces notes | /// marker logical position | /// marker visual edge
Afrikaans | af | af | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Amharic | am | am | | Ethiopic | ltr | . | FALSE | | FALSE | | prefix | left
Arabic | ar | ar | | Arabic | rtl | . | FALSE | | FALSE | | prefix | right
Bahasa Indonesia | id | id | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Bahasa Malaysia | ms | ms | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Bengali | bn | bn | | Bengali | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left
Bosnian | bs | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Bosnian | bs | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left
Bulgarian | bg | bg | | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Catalan | ca | ca | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Chinese | zh | zh | zh_si | Han (Simplified) | ltr | . | FALSE | | FALSE | | prefix | left
Chinese | zh | zh | zh_tr | Han (Traditional) | ltr | . | FALSE | | FALSE | | prefix | left
Croatian | hr | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Croatian | hr | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left
Czech | cs | cs | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Danish | da | da | | Latin | ltr | . | TRUE | Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted as ‘oe’; ‘å’ can also be inputted as ‘aa’ | FALSE | | prefix | left
Dutch | nl | nl | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
English | en | en | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Estonian | et | et | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Finnish | fi | fi | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
French | fr | fr | | Latin | ltr | . | TRUE | Note: ‘œ’ may be typed as ‘oe’. | FALSE | | prefix | left
German | de | de | | Latin | ltr | . | TRUE | Note: ‘ä’ can also be inputted as ‘ae’; ‘ö’ can also be inputted as ‘oe’; ‘ü’ can also be inputted as ‘ue’ | FALSE | | prefix | left
Greek | el | el | | Greek | ltr | . | FALSE | | FALSE | | prefix | left
Gujarati | gu | gu | | Gujarati | ltr | . | FALSE | | FALSE | | prefix | left
Hebrew | he | he | | Hebrew | rtl | . | FALSE | | FALSE | | prefix | right
Hindi | hi | hi | | Devanagari | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left
Hungarian | hu | hu | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
isiXhosa | xh | xh | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
isiZulu | zu | zu | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Italian | it | it | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Japanese | ja | ja | | Hiragana | ltr | 。 | FALSE | | FALSE | | prefix | left
Kannada | kn | kn | | Kannada | ltr | . | FALSE | | FALSE | | prefix | left
Kazakh | kk | kk | kk_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Kazakh | kk | kk | kk_la | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Khmer | km | km | | Khmer | ltr | . | FALSE | | FALSE | | prefix | left
Korean | ko | ko | | Hangul | ltr | . | FALSE | | FALSE | | prefix | left
Lao | lo | lo | | Lao | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left
Malayalam | ml | ml | | Malayalam | ltr | . | TRUE | Words that were changed in spelling reform have previous spellings as secondary words | FALSE | | prefix | left
Marathi | mr | mr | | Devanagari | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left
Mongolian | mn | mn | mn_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Mongolian | mn | mn | mn_la | Latin | ltr | . | TRUE | Secondary words are created when a Cyrillic character has more than one Latin script equivalent: ‘х’ can also be inputted as ‘h’ OR ‘kh’; ‘ө’ can also be inputted as ‘o’ OR ‘u’ | FALSE | | prefix | left
Montenegrin | me | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Montenegrin | me | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left
Nepali | ne | ne | | Devanagari | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left
Norwegian | no | no | | Latin | ltr | . | TRUE | Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted as ‘oe’; ‘å’ can also be inputted as ‘aa’ | FALSE | | prefix | left
Odia | or | or | | Oriya (Odia) | ltr | . | FALSE | | FALSE | | prefix | left
Persian | fa | fa | | Arabic | rtl | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | right
Polish | pl | pl | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Portuguese | pt | pt | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Punjabi | pa | pa | | Gurmukhi | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left
Romanian | ro | ro | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Russian | ru | ru | | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Serbian | sr | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Serbian | sr | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left
Sinhala | si | si | | Sinhala | ltr | . | FALSE | | FALSE | | prefix | left
Slovak | sk | sk | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Slovene | sl | sl | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Spanish | es | es | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Swahili | sw | sw | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Swedish | sv | sv | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Tamil | ta | ta | | Tamil | ltr | . | FALSE | | FALSE | | prefix | left
Telugu | te | te | | Telugu | ltr | . | FALSE | | FALSE | | prefix | left
Thai | th | th | | Thai | ltr | . | FALSE | | FALSE | | prefix | left
Turkish | tr | tr | | Latin | ltr | . | FALSE | | FALSE | | prefix | left
Ukrainian | uk | uk | | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left
Urdu | ur | ur | | Arabic | rtl | . | TRUE | Some pairs of characters share the same sound. Secondary words allow for this | FALSE | | prefix | right
Vietnamese | vi | vi | | Latin | ltr | . | TRUE | Primary words have spaces; secondary words are written with no spaces | TRUE | Vietnamese orthography allows up to three internal spaces inside a single dictionary word (e.g. ‘thành phố’). In a valid Vietnamese 3-word address this rule is all-or-nothing: if one word contains internal spaces, then all three words do. | prefix | left
Welsh | cy | cy | | Latin | ltr | . | FALSE | | FALSE | | prefix | left

Note: For an explanation of secondary words, see this blog post.
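
As a small illustration of how the table's “Default word delimiter” and “/// marker logical position” columns might be used when building a display string, here is a hedged Python sketch. It assumes you already have the three words and the what3words API language code; the function names are hypothetical. As stated in the delimiter section, Japanese is displayed with the ideographic full stop (U+3002) and every other language with the Latin full stop.

```python
from typing import Sequence

JAPANESE_DELIMITER = "\u3002"  # ideographic full stop


def default_delimiter(language_code: str) -> str:
    """Default display delimiter for a what3words API language code."""
    return JAPANESE_DELIMITER if language_code == "ja" else "."


def format_address(words: Sequence[str], language_code: str) -> str:
    """Join three words with the language's delimiter and prefix the marker."""
    if len(words) != 3:
        raise ValueError("a what3words address has exactly three words")
    # The marker is a logical prefix in every language. For rtl scripts the
    # renderer places it at the right-hand visual edge; the string is unchanged.
    return "///" + default_delimiter(language_code).join(words)


print(format_address(["filled", "count", "soap"], "en"))
```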