As the rate of data creation grows each year, so does the importance of effective search tools. Names remain one of the key identifiers for individuals and organizations. When record identifiers such as national ID numbers or customer IDs are not available, names are often the next best data signal to match on. They are vital data points in financial compliance, anti-fraud, government intelligence, law enforcement, and identity verification. Yet matching names is challenging when your data includes anomalies such as misspellings, aliases or nicknames, initials, and non-Latin scripts, which it invariably will.
Rosette addresses these issues with a linguistic, A.I.-based system that compares and matches the names of people, places, and organizations, as well as addresses, dates, and other identifiers, despite the many variations each of these is likely to have. Built by linguistics experts, our match technology is unrivaled in its ability to connect entities with high adaptability, precision, and scalability. With fluency across 18 languages and a deep understanding of the linguistic complexities of names, Rosette is the first choice for record matching when names are a key signal.
Name matching is a fairly simple concept to understand, but it can be challenging to evaluate and optimize for specific needs. As leaders in name matching, with experience working side by side with customers, we have created this guide to help Rosette adopters maximize its potential and quickly deliver value. The purpose of this document is to provide a deeper, practical understanding of Rosette Name Indexer (RNI), present best practices for conducting name evaluations and testing, and show users how to tune RNI to improve accuracy and performance.
This guide assumes you are familiar with the basics of RNI, as it focuses mostly on using the RNI Elasticsearch plugin for evaluation and configuration. You should be comfortable indexing and querying records with Elasticsearch or making calls to the Rosette name similarity API. One goal of this document is to provide a blueprint for conducting effective name evaluations, so that you can implement the configuration that best supports your business requirements. Taken as a whole, the guide acts as a playbook for this task while also explaining the inner workings of RNI in more depth. By the end, you should be able to carry out the tasks and methodologies discussed here with your own team of engineers and data scientists.