List of strategies
See the overview for an explanation of what strategies are and definitions for some of the terms used below.
Strategy 1
Strategy 2
Strategy 2 removes the exact searches from Strategy 1 and uses a wide variety of fuzzy and non-fuzzy searches. The order of the queries is not optimized.
- Non-Fuzzy GFD
- Non-Fuzzy GFD Range
- Non-Fuzzy All (Postcode Wildcard)
- Non-Fuzzy All
- Fuzzy GFD
- Fuzzy GFD Range (Postcode Wildcard)
- Fuzzy GFD Range (Postcode)
- Fuzzy All
- Fuzzy Alt DOB
Strategy 3
Strategy 3 went through many iterations during testing, and the three most recent versions are documented below. For more details, please see the full version history here.
Version 14
- Non-Fuzzy GFD
- Fuzzy GFD
- Fuzzy All
- Non-Fuzzy GFD Range
- Non-Fuzzy GFD Range (Postcode)
- Fuzzy GFD Range
- Fuzzy GFD Range (Postcode)
Version 15
Adds postcode wildcard rules to the end of the search.
- Non-Fuzzy GFD
- Fuzzy GFD
- Fuzzy All
- Non-Fuzzy GFD Range
- Non-Fuzzy GFD Range (Postcode)
- Fuzzy GFD Range
- Fuzzy GFD Range (Postcode)
- Non-Fuzzy GFD Range (Postcode Wildcard)
- Fuzzy GFD Range (Postcode Wildcard)
Version 16
Adds a postcode field to the first rule of Version 15.
- Non-Fuzzy GFD (Postcode)
- Fuzzy GFD
- Fuzzy All
- Non-Fuzzy GFD Range
- Non-Fuzzy GFD Range (Postcode)
- Fuzzy GFD Range
- Fuzzy GFD Range (Postcode)
- Non-Fuzzy GFD Range (Postcode Wildcard)
- Fuzzy GFD Range (Postcode Wildcard)
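The relationship between the three versions can be sketched as list operations. This is only an illustration: the variable names are hypothetical, and the strings are simply the rule labels listed above, not executable queries.

```python
# Rule labels copied from the Version 14 list above.
V14 = [
    "Non-Fuzzy GFD",
    "Fuzzy GFD",
    "Fuzzy All",
    "Non-Fuzzy GFD Range",
    "Non-Fuzzy GFD Range (Postcode)",
    "Fuzzy GFD Range",
    "Fuzzy GFD Range (Postcode)",
]

# Version 15: the same rules, with the postcode wildcard rules appended.
V15 = V14 + [
    "Non-Fuzzy GFD Range (Postcode Wildcard)",
    "Fuzzy GFD Range (Postcode Wildcard)",
]

# Version 16: identical to Version 15 except for the first rule,
# which gains a postcode field.
V16 = ["Non-Fuzzy GFD (Postcode)"] + V15[1:]
```

Writing the versions this way makes it easy to confirm that each one differs from its predecessor only in the places described.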
Strategy 4
Strategy 4 takes Version 14 of Strategy 3, splits the given name into an array of separate names, and passes them to PDS as multiple given names.
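A minimal sketch of the splitting step, assuming the given name is split on whitespace and that each part is sent as a repeated query parameter. The function names and the parameter name `given` are assumptions made for illustration; the actual implementation may split or encode names differently.

```python
from urllib.parse import urlencode


def split_given_name(given_name: str) -> list[str]:
    """Split one given-name string into its whitespace-separated parts."""
    # str.split() with no arguments also collapses repeated spaces.
    return given_name.split()


def given_name_params(given_name: str) -> list[tuple[str, str]]:
    """Build one (key, value) pair per name part.

    The key "given" is a hypothetical parameter name.
    """
    return [("given", part) for part in split_given_name(given_name)]


# Each part becomes its own parameter in the encoded query string.
query_string = urlencode(given_name_params("Mary  Jane"))
```

Here `query_string` contains two separate `given` entries rather than a single value with an embedded space.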
Version 1
- Non-Fuzzy GFD
- Fuzzy GFD
- Fuzzy All
- Non-Fuzzy GFD Range
- Non-Fuzzy GFD Range (Postcode)
- Fuzzy GFD Range
- Fuzzy GFD Range (Postcode)
Version 2
Strategy 5
Strategy 5 builds on Version 2 of Strategy 4 by adding two new queries that omit the GivenName field.
Why this version exists
Initial analysis suggested that the GivenName field sometimes contains "noisy" data, such as:
- Multiple names or middle names merged into one field.
- Special characters or formatting errors.
- Temporary placeholders (e.g., "baby" or "infant") used before a formal name is registered.
The theory behind this strategy is that by omitting the GivenName in specific scenarios, the Personal Demographics Service (PDS) can better identify an individual using its internal name history and other demographic anchors, rather than being blocked by a non-matching first-name string.
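The kinds of noise listed above could be flagged with a simple heuristic such as the one below, for example when analysing which records the new queries were meant to help. This is a hypothetical sketch: the function name, the issue labels, and the rules themselves are assumptions, and the placeholder set contains only the two examples given in the text.

```python
import re

# Placeholder values taken from the examples in the text; a real list
# would likely be longer.
PLACEHOLDERS = {"baby", "infant"}

# Anything that is not a letter, whitespace, apostrophe, or hyphen.
# \w is Unicode-aware, so accented letters are not flagged.
SPECIAL_CHARS = re.compile(r"[^\w\s'\-]|[\d_]")


def given_name_issues(given_name: str) -> list[str]:
    """Return labels for the kinds of noise found in a given-name string."""
    name = given_name.strip()
    issues = []
    if len(name.split()) > 1:
        issues.append("multiple_names")
    if SPECIAL_CHARS.search(name):
        issues.append("special_characters")
    if name.lower() in PLACEHOLDERS:
        issues.append("placeholder")
    return issues
```

A name with no detected issues returns an empty list, so the result can be used directly as a truthy or falsy quality flag.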
What we did to get here
We started with Strategy 4 Version 2, as it was the most effective version at the time. We then inserted two new queries (steps 3 and 4 below) after our most effective initial rules. This ensures that we first attempt to match using the most accurate data (including first name) to minimize risk, before falling back to the GivenName-omitted queries for the remaining unmatched records. Note that the queries in this strategy were run on all records and were not limited to only those that had quality issues.
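The fallback ordering described here can be sketched as an ordered list of queries that are tried in turn until one returns a match. Everything in this example is a toy stand-in: the `REGISTRY` data, the field names, and both query functions are hypothetical and do not reflect how PDS matches records.

```python
from typing import Callable, Optional

Record = dict[str, str]
Query = Callable[[Record], Optional[str]]  # returns a matched ID, or None

# Toy lookup table standing in for the real service (hypothetical data).
REGISTRY = {("Sam", "Jones", "2020-01-01"): "ID-1"}


def with_given_name(record: Record) -> Optional[str]:
    """Match on given name, family name, and date of birth."""
    key = (record["given"], record["family"], record["dob"])
    return REGISTRY.get(key)


def without_given_name(record: Record) -> Optional[str]:
    """Ignore the given name; accept only a unique family + DOB match."""
    hits = [
        matched_id
        for (_, family, dob), matched_id in REGISTRY.items()
        if family == record["family"] and dob == record["dob"]
    ]
    return hits[0] if len(hits) == 1 else None


def run_strategy(record: Record, queries: list[Query]):
    """Try each query in order; return (match, step) for the first hit."""
    for step, query in enumerate(queries, start=1):
        match = query(record)
        if match is not None:
            return match, step
    return None, None


# The query that uses the given name runs first; the query that omits
# it only sees records the earlier query failed to match.
queries = [with_given_name, without_given_name]
clean = {"given": "Sam", "family": "Jones", "dob": "2020-01-01"}
noisy = {"given": "baby", "family": "Jones", "dob": "2020-01-01"}
```

With this ordering, `clean` is matched at step 1 and never reaches the fallback, while `noisy` is matched only at step 2.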
What we found as a result of the changes
When we ran this strategy against our dataset, the new queries had only a small effect on the specific records they were designed to target. However, they had an unexpectedly large effect on overall results, improving the match rate by several percentage points and elevating some previously non-confident matches to confident ones.