Last night I attended the #MidsTest test meetup in Coventry where Dr. Violet Kovacheva gave a talk on Address Matching.
Address matching sounds like something that should be simple. All we need is a house number and postcode that match some simple validation rules, and we have our unique address.
This is not the case. Not all addresses match a standard set of rules. Not all addresses are unique, not even within the city they are in.
Even addresses that have the standard number and street name format aren’t guaranteed to adhere to the rules. Turns out there is a house in the UK with the number -1.
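To show how quickly "simple validation rules" fall over, here is a minimal sketch of the kind of check I had in mind. The two patterns are my own simplified assumptions (they are not an official postcode or house number specification), and the postcode is made up for illustration.

```python
import re

# Assumed, deliberately naive rules: a positive house number with an optional
# letter suffix (e.g. 12 or 12A), and a simplified UK-style postcode shape.
HOUSE_NUMBER = re.compile(r"[1-9][0-9]*[A-Z]?")
POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")


def looks_valid(house_number: str, postcode: str) -> bool:
    """Return True if both values match the naive patterns above."""
    number = house_number.strip().upper()
    code = postcode.strip().upper()
    return bool(HOUSE_NUMBER.fullmatch(number)) and bool(POSTCODE.fullmatch(code))


print(looks_valid("12", "CV1 1AA"))            # True
print(looks_valid("-1", "CV1 1AA"))            # False: the real house numbered -1 is rejected
print(looks_valid("Rose Cottage", "CV1 1AA"))  # False: named houses are rejected too
```

Every rule we add to catch one of these cases tends to let something else slip through, which is why validation alone doesn't give us a unique address.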
One possible solution is to use a UPRN (Unique Property Reference Number). This provides a unique identifier for the property.
There are a few issues with this. For one, the numbers are allocated by local government.
There are several different databases that contain the country's addresses. Unfortunately, none of these follow the same standards. Often the columns within the databases do not align. The town in one database could appear in the same column as the city in another database.
Even if we decide to use every database, a complicated SQL query with lots of ‘OR’ conditions would be required to correctly merge the different databases together.
We could have the address appear in a single string so that all address sections appear together in order. However, we have no way of knowing which part of the string contains which section of the address.
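As a rough sketch of the single-string idea, the snippet below flattens two records with different column layouts into one comparable string. The column names, the records and the `to_single_string` helper are all hypothetical, made up for this example rather than taken from any real address database.

```python
def to_single_string(record: dict, field_order: list) -> str:
    """Join the chosen fields in order, then upper-case and collapse whitespace."""
    parts = []
    for field in field_order:
        value = record.get(field)
        # Skip missing or blank fields so they don't leave gaps in the string.
        if value is not None and str(value).strip():
            parts.append(str(value).strip())
    return " ".join(" ".join(parts).upper().split())


# Two hypothetical databases holding the same (made-up) address in different columns.
record_a = {"number": "12", "street": "High Street", "town": "Coventry", "postcode": "CV1 1AA"}
record_b = {"house_no": 12, "road": "High  Street", "city": "Coventry", "post_code": "cv1 1aa"}

string_a = to_single_string(record_a, ["number", "street", "town", "postcode"])
string_b = to_single_string(record_b, ["house_no", "road", "city", "post_code"])

print(string_a)              # 12 HIGH STREET COVENTRY CV1 1AA
print(string_a == string_b)  # True
```

The two strings now match, but notice what we lost: nothing in the string tells us that COVENTRY was a town in one database and a city in the other.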
Several solutions have been proposed.
One is using LibPostal, which includes a library of street addresses. However, these are largely based on the US address structure, so it may not be suitable for addresses outside the US.
Parserator is a model that can be trained. If you already have large amounts of data which can be used as training data, then this is a possible solution.
Partial matching is also possible using the Levenshtein distance. For this solution, two strings are compared and a similarity ratio is calculated from the Levenshtein distance between them.
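Here is a minimal sketch of that partial matching idea in plain Python. The Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into the other. The ratio formula (one minus the distance divided by the longer length) and the `normalise` step are my own assumptions for illustration; libraries that offer a Levenshtein ratio may define it differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Assumed ratio: 1.0 means identical, 0.0 means nothing in common."""
    a, b = normalise(a), normalise(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(round(similarity("12 High Street, Coventry", "12 High St, Coventry"), 2))  # 0.83
```

You would then pick a threshold above which two addresses are treated as the same property. Where that threshold should sit depends on your data, so it is something to test rather than guess.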
For my 99 second talk, I talked about the benefits of sketch-notes and invited the audience to check out my sketch-notes from conferences I’d attended recently.
I find sketch-noting a great way to review my notes, pick out the key points and present them on a single page of A4 paper. It makes it so much easier to share what I learnt with those who were unable to attend the talk.
As well as reviewing my notes, I like to add some color to the sketch-notes (and sometimes pictures). This encourages me to be creative with my sketch-notes. With so much of our jobs focused on functional technology, I find that people rarely take the time to be creative.
Here is the complete sketch-note for the #MidsTest talk: