Specifying Phone Number Patterns
You can define complex patterns using digits and wildcards to collapse thousands of phone numbers into a manageable number of patterns, reducing the number of normalizer parameter values you need to define.
A pattern comprises a segment list and optional wildcard characters. A segment list is a list of one or more choices, ranges, and phone number values. Valid phone number values for a pattern are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, A, B, C, D, E, #, and *. When you specify a list of digits, every digit must match the corresponding part of the input phone number exactly for a match to occur. Note that, because the * character is also used as a wildcard in pattern strings, a literal * in the string to match must be escaped as \*.
Character | Description |
---|---|
[ ] | Encloses a set of characters to create a digit list, a choice set, or a range. Note that when only one set of digits is given, the enclosed set of literal digits is equal to the same set of literal digits without the brackets. For example, [123] is equal to 123. The brackets in such cases are optional. |
[ | ] | Creates a choice set where the element on either side of the choice separator ( | ) can equal a match. For example, [5|8|9] matches 5, 8, or 9. A choice set can contain one or more literal digit strings and ranges, where a match on any choice is equivalent to a match of the choice set as a whole. Although a choice set is defined as a set rather than a list, meaning it is not expected to contain duplicates, any internal overlap between digits or ranges is ignored. |
[ - ] | Creates a finite set of digit lists defined by an upper and lower bound. The numeric value on the left of the range separator ( - ) is the start of the range and the numeric value on the right is the end of the range. Ranges are inclusive of their bound values, meaning the range [07-12] is equivalent to the set [07|08|09|10|11|12]. The order of bound values is unimportant; for example, [0-2] and [2-0] are both valid and equivalent. All digit lists within a range must be of equal length. For example, to specify a range between 1 and 100, you must use [001-100]. A difference in length between the start value and the end value will cause pricing compilation to fail. |
? | A single-digit wildcard character that matches any single digit in its location. For example, 4? is equivalent to 4[0-9]. Single-digit wildcards cannot be used inside ranges or choice sets, and they do not change the priority of a pattern. The priority is handled by the order of the patterns in the normalizer instance. |
* | A multi-digit wildcard character that matches zero or more digits at the end of a segment or digit list to represent an infinite set, in which all digit permutations of all lengths are represented. You can use an asterisk wildcard only at the end of a pattern, so it defines a prefix match. |
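The choice-set and range rules in the table above can be illustrated with a short Python sketch. It is not the product's implementation: the function name expand_segment is hypothetical, and the sketch assumes ranges contain decimal digits only.

```python
def expand_segment(segment: str) -> list[str]:
    """Expand one segment, such as '[07-12]' or '[5|8|9]', into digit strings.

    Hypothetical helper for illustration; assumes decimal digits in ranges.
    """
    # Brackets are optional around a plain digit list: [123] equals 123.
    if segment.startswith("[") and segment.endswith("]"):
        inner = segment[1:-1]
    else:
        inner = segment

    results = []
    for choice in inner.split("|"):
        if "-" in choice:
            left, right = choice.split("-")
            # Both bounds must have the same number of digits.
            if len(left) != len(right):
                raise ValueError(f"range bounds differ in length: {choice}")
            # Bound order is unimportant: [2-0] equals [0-2].
            low, high = sorted((int(left), int(right)))
            width = len(left)
            results.extend(str(n).zfill(width) for n in range(low, high + 1))
        else:
            results.append(choice)

    # Overlap between choices is ignored: drop duplicates, keep first-seen order.
    return list(dict.fromkeys(results))


print(expand_segment("[07-12]"))  # ['07', '08', '09', '10', '11', '12']
print(expand_segment("[2-0]"))    # ['0', '1', '2']
```

Passing a range with bounds of different lengths, such as [1-100], raises ValueError in this sketch, mirroring the compilation failure described in the table.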
With these characters, you can create a phone number pattern that includes a list of one or more choices, ranges, and actual digits. For a telephone number to be considered a match, all segments in the segment list must match corresponding digits in the input phone number, in the same sequence, and any remaining trailing digits are considered a match or mismatch according to any wildcards specified for the pattern. An empty segment list matches only the empty digit list. If no choices, ranges, or wildcards are specified, every digit in the pattern must match and the input phone number must contain no further digits.
Note that actual digits have a higher precedence than a multi-digit wildcard and all matches start from the beginning of the number. When more than one pattern matches, the longest matching pattern is used. When the matching patterns are of the same length, the pattern listed earliest in the normalizer is used. For example, say you have defined two patterns, 1234* and 12345*. If the input phone number is 123456, pattern 2 is used because it matches more digits. However, say the second pattern does not include the wildcard, making it 12345. With the same input phone number of 123456, pattern 1 is now used because pattern 2 is no longer a match: the pattern algorithm expects precisely 12345 and the input number contains one extra digit.
For example, with the patterns 123* and 1234* defined:
- Input 123 matches 123*
- Input 1234 matches 1234*
- Input 12345 matches 1234*
- Input 12356 matches 123*
The last input string does not match 1234* because its fourth digit, 5, does not equal 4.
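The longest-match behavior in these examples can be sketched as follows. This is a simplified illustration, not the normalizer's actual code: the function names matches and best_match are hypothetical, and the sketch supports only literal digits, the ? wildcard, and a trailing * (no brackets and no escaped \*).

```python
from typing import Optional


def matches(pattern: str, number: str) -> bool:
    """Return True if number matches pattern (literals, '?', trailing '*')."""
    is_prefix = pattern.endswith("*")
    body = pattern[:-1] if is_prefix else pattern
    # The input must cover the whole pattern body.
    if len(number) < len(body):
        return False
    # Without a trailing '*', the input may not contain extra digits.
    if not is_prefix and len(number) != len(body):
        return False
    return all(
        (p == "?" and d.isdigit()) or p == d
        for p, d in zip(body, number)
    )


def best_match(patterns: list[str], number: str) -> Optional[str]:
    """Pick the longest matching pattern; ties go to the earliest in the list."""
    candidates = [p for p in patterns if matches(p, number)]
    if not candidates:
        return None
    # Length excludes the trailing '*'. max() returns the first maximal item,
    # so list order breaks ties.
    return max(candidates, key=lambda p: len(p[:-1] if p.endswith("*") else p))


patterns = ["123*", "1234*"]
for number in ["123", "1234", "12345", "12356"]:
    print(number, "->", best_match(patterns, number))
```

With the two patterns above, the inputs 123 and 12356 resolve to 123*, while 1234 and 12345 resolve to 1234*.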
To determine a match, the normalizer algorithm steps through a tree structure to determine whether an input phone number has a corresponding pattern match. This search starts from the tree's root (the beginning of the phone number) and continues along branches of the tree for each matching digit or single-digit wildcard character until it either reaches the phone number's end or encounters a digit for which it can find no corresponding branch (a mismatched digit). At the phone number's end, the algorithm checks for a matching, completed pattern. For a mismatched digit, the algorithm steps back to the most recent multi-digit wildcard and thus the longest matching prefix pattern. If neither a completed pattern nor a multi-digit wildcard is found, then the normalizer's default result (if any) is used.
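The tree search described above might look like the following sketch. The names Node, insert, and lookup are hypothetical, and several simplifications are assumed: brackets are not supported, a literal branch is tried before a ? branch, and the search does not backtrack between those two branches.

```python
class Node:
    """One tree node; hypothetical structure for illustration only."""

    def __init__(self):
        self.children = {}   # digit or '?' -> Node
        self.exact = None    # result of a pattern that ends exactly here
        self.prefix = None   # result of a pattern that ends here with '*'


def insert(root, pattern, result):
    """Add a pattern (literals, '?', optional trailing '*') to the tree."""
    is_prefix = pattern.endswith("*")
    body = pattern[:-1] if is_prefix else pattern
    node = root
    for ch in body:
        node = node.children.setdefault(ch, Node())
    # Keep the first result registered for a given pattern.
    if is_prefix:
        if node.prefix is None:
            node.prefix = result
    elif node.exact is None:
        node.exact = result


def lookup(root, number, default=None):
    """Walk the tree digit by digit, remembering the latest '*' seen."""
    node = root
    fallback = root.prefix
    for digit in number:
        nxt = node.children.get(digit)
        if nxt is None and digit.isdigit():
            nxt = node.children.get("?")
        if nxt is None:
            # Mismatched digit: step back to the most recent '*' pattern.
            return fallback if fallback is not None else default
        node = nxt
        if node.prefix is not None:
            fallback = node.prefix
    # End of number: a completed pattern wins over a '*' prefix.
    if node.exact is not None:
        return node.exact
    return fallback if fallback is not None else default


root = Node()
insert(root, "123*", "A")
insert(root, "1234*", "B")
insert(root, "12345", "C")
print(lookup(root, "12345"))           # C
print(lookup(root, "123456"))          # B
print(lookup(root, "99", "default"))   # default
```

Recording the most recent * result during the walk avoids a second pass: when a digit has no branch, the longest matching prefix pattern is already known.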
Ambiguity Resolution
Phone number patterns in the normalizer can be reordered using the drag-and-drop method. Where ambiguity exists between two or more patterns and they are of the same length (excluding any multi-digit wildcards), then the pattern defined highest in the list will be chosen. For example, given patterns:
- 123[4-8]
- 123[3|4|5]
- 1234
All three patterns match an input phone number of 1234. In this case, even though pattern 3 appears to be the most precise, pattern 1 is chosen as the best match because it is first in the list.
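The list-order rule can be demonstrated with a small sketch that expands each pattern into the full set of numbers it represents and returns the first pattern in the list containing the input. The helper names are hypothetical, and the sketch assumes patterns of equal length with decimal digits and no ? or * wildcards.

```python
import itertools
import re
from typing import Optional


def expand_choices(inner: str) -> list[str]:
    """Expand the inside of a bracket, such as '4-8' or '3|4|5'."""
    values = []
    for choice in inner.split("|"):
        if "-" in choice:
            left, right = choice.split("-")
            low, high = sorted((int(left), int(right)))
            values.extend(str(n).zfill(len(left)) for n in range(low, high + 1))
        else:
            values.append(choice)
    return values


def expand_pattern(pattern: str) -> set[str]:
    """Return every phone number a wildcard-free pattern can match."""
    # Split into bracketed segments and runs of literal digits.
    parts = re.findall(r"\[[^\]]*\]|[^\[\]]+", pattern)
    options = [
        expand_choices(part[1:-1]) if part.startswith("[") else [part]
        for part in parts
    ]
    return {"".join(combo) for combo in itertools.product(*options)}


def first_match(patterns: list[str], number: str) -> Optional[str]:
    """Among same-length patterns, the one highest in the list wins."""
    for pattern in patterns:
        if number in expand_pattern(pattern):
            return pattern
    return None


print(first_match(["123[4-8]", "123[3|4|5]", "1234"], "1234"))  # 123[4-8]
print(first_match(["1234", "123[4-8]", "123[3|4|5]"], "1234"))  # 1234
```

Reordering the list changes the result, which is why the position of a pattern in the normalizer matters when patterns overlap.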
Pattern Type Examples
Pattern | Matches |
---|---|
[5|7]11[0|9] | 5110, 5119, 7110, 7119 |
80[0|1|2|3] | 800, 801, 802, 803 |
Pattern | Matches |
---|---|
[1-4]5678 | 15678, 25678, 35678, 45678 |
35[66-70]8 | 35668, 35678, 35688, 35698, 35708 |
Pattern | Matches |
---|---|
2?0 | 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 |
44* | 44, 440, 4401, 44021, 4411, 4412112345566... The wildcard represents all digit permutations of 0-9, of any length, including zero-length. Use it only at the end of a digit segment to define a phone number prefix. |
Pattern | Matches |
---|---|
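13[4|6][7-9|0] | 1347, 1348, 1349, 1340, 1367, 1368, 1369, 1360 |
4?[2|5] | 402, 405, 412, 415, 422, 425, and so on through 492, 495: any digit in the second position, followed by 2 or 5. |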
Country Code Prefix
- United Kingdom — 44*
- France — 33*
- Switzerland — 41*
Corporate Number Blocks
- Subsidiary 1 — 408522[1000-1499]
- Subsidiary 2 — 408522[1500-1999]
- Subsidiary 3 — 408522[2000-2500]