This section briefly describes common problems our customers encounter and offers a quick explanation and guidance on how to address each one.
Question: Why is a name I expect to be in my hit set missing?
Explanation: A recall issue, where an expected name is missing from the results, occurs when the name is not found during the first pass of the query. This usually means the windowSize is too small or the window filled up, which can happen for very popular names like “Smith”. Try the following steps:
Confirm the first pass query finds your name/record by executing just the first pass.
Expand the windowSize until you see your name/record.
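The two steps above can be sketched as request bodies for Elasticsearch's standard rescore API, where the window is spelled `window_size`. This is a minimal sketch, not RNI's documented syntax: the field name `primary_name`, the helper names, and the `rescore_query` argument (a stand-in for whatever RNI rescoring clause your installation uses) are assumptions for illustration.

```python
from typing import Any, Dict


def first_pass_only(field: str, name: str, size: int = 100) -> Dict[str, Any]:
    """Body that runs only the first-pass query, with no rescoring."""
    return {"size": size, "query": {"match": {field: name}}}


def with_rescore(field: str, name: str,
                 rescore_query: Dict[str, Any],
                 window_size: int) -> Dict[str, Any]:
    """Same first-pass query plus a rescore clause over the top window_size hits."""
    body = first_pass_only(field, name)
    body["rescore"] = {
        "window_size": window_size,
        # rescore_query is supplied by the caller; its contents are an assumption here.
        "query": {"rescore_query": rescore_query},
    }
    return body


# Step 1: check recall with the first pass alone.
probe = first_pass_only("primary_name", "John Smith")

# Step 2: widen the window until the expected record appears.
attempts = [with_rescore("primary_name", "John Smith", {"match_all": {}}, w)
            for w in (200, 400, 800)]
```

If the record shows up in `probe` but not in the rescored results, the window is the likely culprit, and larger windows trade query time for recall.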
Question: Do I need to index all my data? How do I keep my index up to date?
Explanation: No, you only need to index data that you will be using to match records. In many applications/systems you have a primary data store that stores all information on a given record. You do not need to duplicate that data in Elastic. Focus on identifying key fields that help determine record matching. As your primary data store is updated, you can update those same records in Elastic. You do not need to do a full data update after each change in your data.
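As a rough illustration of indexing only the matching fields and updating only the records that changed, the sketch below projects a primary-store record down to a few key fields and builds one update action per changed record. The field names, the `record_id` key, and the helper names are hypothetical; the action dictionaries follow the shape used by the elasticsearch-py bulk helpers.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical field names: replace with the fields your matching relies on.
MATCH_FIELDS = ("full_name", "date_of_birth")


def to_index_doc(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the non-empty matching fields; the rest stays in the primary store."""
    return {k: record[k] for k in MATCH_FIELDS if record.get(k) is not None}


def update_actions(changed: Iterable[Dict[str, Any]],
                   index: str) -> Iterator[Dict[str, Any]]:
    """Yield one index action per changed record, keyed by the primary store's ID."""
    for record in changed:
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": record["record_id"],
            "_source": to_index_doc(record),
        }
```

Passing only the changed records to `update_actions` keeps the index in step with the primary store without a full reload.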
Question: I have first name, last name, and middle name in separate fields. Do I need to use the full name? Why?
Explanation: RNI matching works best on full names. Using partial names will likely increase the number of false positives and lower your accuracy. You can modify the token weights to better represent the impact of each field to get the desired results.
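If your names are stored in separate fields, one simple way to produce a full name for indexing and querying is to join the non-empty parts. This is a minimal sketch: the function name is hypothetical, and the given-middle-family order is an assumption that will not suit every naming convention.

```python
from typing import Optional


def full_name(first: Optional[str], middle: Optional[str], last: Optional[str]) -> str:
    """Join the non-empty name parts with single spaces, in given-middle-family order."""
    parts = [p.strip() for p in (first, middle, last) if p and p.strip()]
    return " ".join(parts)
```

Missing or blank parts are skipped, so a record without a middle name still yields a clean two-token name.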
Question: What should I do if I don’t know the entity type?
Explanation: We highly recommend setting entity type and language when indexing and querying RNI names. In some cases the entity type might not be known. You can leave entity type blank or use the NONE type. This makes matching more generic and will affect your results. For example, gender mismatch no longer applies because the entity is not a PERSON. Scores may therefore be lower overall, so you might need to lower your match threshold to avoid increasing false negatives.
Question: Why am I getting an unsupported language error when I submit a query?
Explanation: RNI supports 18 languages. When you index or query a name in a language that RNI does not support, an error message is returned and the record is not indexed or the query returns no hits. To handle these cases, consider using Rosette’s language identifier to determine the language of the name first. If RNI does not support the language, you can send the index or query request to a separate queue to be handled separately.
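The routing idea can be sketched as follows. Everything here is illustrative: `detect_language` stands in for a call to a language identifier, the queue names are hypothetical, and the `SUPPORTED` set is a placeholder rather than RNI's actual list of supported languages.

```python
from typing import Callable, Set

# Placeholder codes only; fill in the languages your RNI version supports.
SUPPORTED: Set[str] = {"eng", "spa"}


def route(name: str,
          detect_language: Callable[[str], str],
          supported: Set[str] = SUPPORTED) -> str:
    """Return the queue a request should go to, based on the detected language."""
    language = detect_language(name)
    return "rni" if language in supported else "unsupported_language"
```

Checking the language before calling RNI lets unsupported names be handled deliberately instead of surfacing as errors at index or query time.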
Question: I have two similar names like “Jay” and “Jude”, but RNI doesn't match them. Why is that?
Explanation: RNI matches phonetics and generally doesn't calculate edit distance. “Jay” and “Jude” generate different metaphones, so they won't match in the first pass. In addition, the HMM is unlikely to score them highly because its training pairs won't have included similar matches. Two names looking somewhat similar doesn't mean they'll score highly in RNI.
Question: I have my own entity scoring algorithm. Can I use RNI to generate all the name variations for me and then feed those names into my system?
Explanation: This is a classic search anti-pattern. You will never be able to generate a list long enough to get proper recall with that approach, and it would come at a significant performance cost. You can accomplish the same outcome by leveraging RNI’s high-recall keys and a rescoring algorithm that can be built to include additional entity fields when calculating a similarity score.
Question: I have two names that I think are a pretty strong match, why is the RNI score zero?
Explanation: This happens when the name is found by the initial query (first pass) but the windowSize is set too low. Names that fall outside the windowSize never reach the rescore query (second pass) and get a score of 0.0.
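A toy model of the two passes shows where the 0.0 comes from. This is not RNI's implementation: the function name and both scoring callables are hypothetical, and the only behavior modeled is that candidates ranked below the window are never rescored.

```python
from typing import Callable, List, Tuple


def two_pass(candidates: List[str],
             first_pass_score: Callable[[str], float],
             rescore: Callable[[str], float],
             window_size: int) -> List[Tuple[str, float]]:
    """Rank by first-pass score, rescore only the top window_size, give the rest 0.0."""
    ranked = sorted(candidates, key=first_pass_score, reverse=True)
    return [(name, rescore(name) if position < window_size else 0.0)
            for position, name in enumerate(ranked)]


first_pass = {"Smith A": 3.0, "Smith B": 2.0, "Smith C": 1.0}
results = two_pass(list(first_pass), first_pass.get, lambda name: 0.9, window_size=2)
# "Smith C" ranks third in the first pass, so it is never rescored and keeps 0.0.
```

Raising `window_size` to 3 in this example would let the third candidate reach the rescorer and receive a real score.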
Question: Can I configure RNI to successfully match non-obvious organization nicknames like “Six Flags” and “6 Flags” or “Forever 21” and “Forever Twenty One”?
Explanation: Yes, you can create organization token overrides or equivalence classes to handle these types of cases.
Question: I have a large organization with multiple, different uses of RNI. Do I need a separate installation to handle each?
Explanation: RNI supports individual named parameter universes for each scenario. This allows customers to configure RNI specifically for their needs. At runtime, you can specify which parameter universe RNI should use.