Version 1.3.0
Welcome to Rosette Match Studio (RMS), an interactive tool for evaluating and configuring Rosette Name Indexer (RNI) for name matching. Rosette Match Studio uses RNI for fuzzy name retrieval and name matching, while storing the names and search keys in the Elasticsearch full-text search engine.
Rosette Match Studio includes the following options:
Upload Recordset: Create the search index, a list of names of people, organizations, and/or locations.
Search: Perform a single search, returning matches from the index.
Batch Search: Perform multiple searches, using each record in the file as the search term against the index.
Advanced: View and modify search settings and field data types.
Compare: Display the details of a pairwise match (a match between two names), including the algorithms used to calculate the match scores. Modify the values of match parameters and see the impact on the match score. Use these values to optimize the RNI parameters for your data and use case.
Help: Displays this help file.
Your business determines your specific use case and priorities. Search can be optimized for your use case by managing the trade-offs between accuracy and speed, as well as precision (percentage of returned results that are relevant) and recall (percentage of relevant results returned). Optimizing for recall can increase false positives; optimizing for precision can increase false negatives (missed matches).
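The precision and recall trade-off can be illustrated with a small calculation. The following is a minimal sketch, not part of Rosette Match Studio: the function name `precision_recall`, the sample record IDs, and the scores are all hypothetical, and the score threshold stands in for whatever cutoff you apply to returned matches.

```python
def precision_recall(scored_results, relevant_ids, threshold):
    """Return (precision, recall) for results scoring at or above threshold.

    scored_results: iterable of (record_id, score_percent) pairs (hypothetical data).
    relevant_ids:   set of record IDs that are true matches for the query.
    """
    returned = {rid for rid, score in scored_results if score >= threshold}
    true_positives = len(returned & relevant_ids)
    # Convention assumed here: with nothing returned, precision is reported as 1.0.
    precision = true_positives / len(returned) if returned else 1.0
    recall = true_positives / len(relevant_ids) if relevant_ids else 1.0
    return precision, recall


# Hypothetical search results: (record id, match score in percent).
results = [("r1", 95), ("r2", 88), ("r3", 81), ("r4", 74), ("r5", 62)]
# Hypothetical ground truth: the records that really match the query.
relevant = {"r1", "r2", "r4"}

for threshold in (90, 80, 70, 60):
    p, r = precision_recall(results, relevant, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

In this sample data, the strictest threshold returns only relevant records but misses two of the three true matches, while the loosest threshold finds every true match at the cost of two false positives.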
Required resources will depend on the size of your recordset, the required throughput, and your target accuracy levels.
Matching is the process of determining whether identifying information about an individual, such as their name, company, address, and/or age, matches a record in the index. With Rosette Match Studio, you can enter one or more pieces of identifying information, and the tool returns a list of potential matches from your loaded index. Each match has a score between 0% and 100%, indicating the match strength.
Name matching is the core of multi-field entity matching. Names are complex to match because of the large number of variations that occur within a language and across languages. These include, but are not limited to, typographical errors, phonetic spelling variations, transliteration differences, initials, and nicknames.
Rosette Match Studio also matches other Field Types, such as organization name, location name, and date.
You can investigate why particular fields matched and how scores were calculated using the compare functionality. You can change how the scores are calculated by modifying the match parameters in real time, to better understand the process and tune it for your specific application.
Rosette Match Studio can identify name similarities between English and all supported languages. Cross-language matches are supported between Chinese, Japanese, and Korean. Matching names in the same language is supported for all languages.
Table 1. Supported Languages
Arabic (ara) | Greek (ell) | Portuguese (por)
Chinese, Simplified (zho) | Hungarian (hun) | Pashto (pus)
Chinese, Traditional (zho) | Italian (ita) | Russian (rus)
English (eng) | Japanese (jpn) | Spanish (spa)
French (fra) | Korean (kor) | Thai (tha)
German (deu) | Persian (fas): Dari (prs) and Western Farsi (pes) | Urdu (urd)
Examples of Supported Matches
English - English (same language)
Chinese - Korean (cross-language match among Chinese, Japanese, and Korean)
Hungarian - English (match between English and another supported language)
Hungarian - Hungarian (same language)
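The language-pair rules described above can be summarized as a short check. This is a minimal sketch under stated assumptions, not an RNI or Rosette Match Studio API: the names `SUPPORTED`, `CJK`, and `is_supported_pair` are hypothetical, and the codes are taken from Table 1. The text does not say how the Persian variant codes relate to one another, so the sketch compares codes literally.

```python
# Language codes listed in Table 1 (hypothetical constant name).
SUPPORTED = {
    "ara", "ell", "por", "zho", "hun", "pus", "ita", "rus", "eng",
    "jpn", "spa", "fra", "kor", "tha", "deu", "fas", "prs", "pes", "urd",
}
# Languages with cross-language matching among themselves.
CJK = {"zho", "jpn", "kor"}


def is_supported_pair(lang_a, lang_b):
    """Return True if the two language codes form a supported match pair."""
    if lang_a not in SUPPORTED or lang_b not in SUPPORTED:
        return False
    if lang_a == lang_b:
        return True  # same-language matching is supported for all languages
    if "eng" in (lang_a, lang_b):
        return True  # English matches against every supported language
    # Otherwise only Chinese, Japanese, and Korean match across languages.
    return lang_a in CJK and lang_b in CJK


# The four example pairs from the list above.
for pair in [("eng", "eng"), ("zho", "kor"), ("hun", "eng"), ("hun", "hun")]:
    print(pair, is_supported_pair(*pair))
```

Under these rules, a pair such as Hungarian and Russian is not supported, because neither language is English and the two are not both among Chinese, Japanese, and Korean.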
Types of Token and Name Matches