Consider the following questions:
How well is our name matching working?
Should I upgrade?
When I upgrade, how does it compare to my current version?
These questions and many others can only be answered by performing a proper name evaluation. An evaluation arms you with the quantitative data you need to inform key stakeholders about the accuracy and performance of your name matching tool. It also measures how your name matching behaves as new data sources enter your system and provides guidance on adapting the tool to change.
The unique problems with name matching
Matching names is a challenging problem because names can differ in so many ways: simple misspellings, nicknames, truncations, variable spacing (Mary Ellen, Maryellen), spelling variations, and names written in different languages. Nicknames, in particular, have a strong cultural component. The terminology itself is problematic: "matching" implies two things that are the same or equal, but name matching is really about how similar two entities are. Once you have a measure of similarity, you may still need additional rules or human analysis to decide whether a pair is a match. It is important to understand these challenges and to address them in your evaluation.
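The distinction between a similarity score and a match decision can be sketched in a few lines of Python. Here the similarity function is a crude stand-in (Python's standard-library SequenceMatcher, not a real name matcher), and the normalization and 0.85 threshold are illustrative assumptions, not recommended values:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two names.

    Names are lowercased and stripped of spaces so that variable
    spacing ("Mary Ellen" vs. "Maryellen") does not dominate the score.
    """
    norm_a = a.lower().replace(" ", "")
    norm_b = b.lower().replace(" ", "")
    return SequenceMatcher(None, norm_a, norm_b).ratio()

def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """A simple rule layered on top of the raw similarity score.

    Real systems may add further rules (e.g., nickname tables,
    transliteration) or route borderline scores to human review.
    """
    return name_similarity(a, b) >= threshold

print(name_similarity("Mary Ellen", "Maryellen"))  # 1.0 after normalization
print(is_match("Jon Smith", "John Smith"))
```

The point is the separation of concerns: the scorer measures similarity, and a separate decision layer turns that score into a match, a non-match, or a case for analyst review.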
The importance of quantifiable, repeatable evaluation in your environment, on your data
Implementing a robust, formal methodology is the best way to reliably reach accuracy and performance goals that satisfy stakeholders. It is important that the evaluation is performed on your data and in your environment: generic test data may not have the same name variations and language distributions as your data, and the only way to reliably test and improve performance is to use test data that is the same as, or similar enough to, your production data. A formal approach also provides consistency across evaluations, lets small iterative exercises be measured, and captures information that helps improve the methodology itself.
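The core of a quantifiable, repeatable evaluation is comparing your tool's predicted matches against a hand-labeled gold set drawn from your own data. A minimal sketch (the pair data here is toy illustration, not real evaluation data):

```python
def evaluate(predictions: set, gold: set) -> tuple:
    """Compute precision and recall for a set of predicted matches.

    predictions: (name_a, name_b) pairs the matcher returned as matches.
    gold: (name_a, name_b) pairs a human labeled as true matches.
    """
    true_positives = len(predictions & gold)
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Toy example: one correct match, one false positive, one miss.
gold = {("Jon Smith", "John Smith"), ("Bill Jones", "William Jones")}
predicted = {("Jon Smith", "John Smith"), ("Jon Smith", "Jane Doe")}

p, r = evaluate(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.50
```

Because the gold set and the scoring script are fixed artifacts, rerunning this after every configuration change or version upgrade yields directly comparable numbers.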
Relationship between configuration & evaluation
Out of the box, RNI comes with its algorithms optimized and its parameters set based on generalized name testing. While this is a great starting point, each customer's data and needs are different: one customer might value recall, another precision. As a result, the default RNI settings might not be ideal for your environment. A primary goal of an effective evaluation is to identify the optimal configuration for your data and business requirements.
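The precision/recall trade-off behind configuration tuning can be demonstrated by sweeping a match threshold over labeled pairs. This sketch uses a generic similarity function and invented toy data rather than RNI's actual parameters; the shape of the result is what matters, not the numbers:

```python
from difflib import SequenceMatcher

def score(a: str, b: str) -> float:
    """Crude similarity score; a stand-in for the matcher under evaluation."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Toy labeled pairs: (name_a, name_b, is_true_match).
labeled = [
    ("Jon Smith", "John Smith", True),
    ("Bill Jones", "William Jones", True),
    ("Ann Lee", "Anne Leigh", True),
    ("Jon Smith", "Jane Doe", False),
    ("Mark Chan", "Marie Chang", False),
]

def precision_recall(threshold: float) -> tuple:
    """Predict a match at or above the threshold and compare to labels."""
    tp = fp = fn = 0
    for a, b, truth in labeled:
        predicted = score(a, b) >= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A low threshold favors recall; a high threshold favors precision.
for t in (0.6, 0.75, 0.9):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")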