A key feature of RNI is that it returns a normalized score between 0 and 1 indicating how similar two names are. This makes integrating RNI into existing workflows straightforward. However, it raises an inevitable question: “What threshold should I use?” Without quantitative error analysis, this question is difficult to answer. Now that you have learned how to measure the accuracy of your evaluations, you can use additional automation to calculate an optimal threshold. By adding a range of threshold values to your automated evaluation tool, you can process your evaluation data at each threshold and calculate precision, recall, and F1 for each one. The end result of this process is a clear understanding of which threshold is optimal for your data.
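A minimal sketch of such a threshold sweep follows, assuming your evaluation data has already been scored by RNI and labeled, so that each name pair is reduced to a (score, is_match) tuple. It does not call any RNI API; the sample data, the threshold grid, and the helper names (evaluate_at, sweep, best_by_f1) are hypothetical and only illustrate the bookkeeping.

```python
"""Sweep candidate thresholds over labeled, scored evaluation pairs (hypothetical sketch)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ThresholdMetrics:
    threshold: float
    precision: float
    recall: float
    f1: float


def evaluate_at(pairs: list[tuple[float, bool]], threshold: float) -> ThresholdMetrics:
    """Treat score >= threshold as a predicted match and compare against the label."""
    tp = fp = fn = 0
    for score, is_match in pairs:
        predicted = score >= threshold
        if predicted and is_match:
            tp += 1  # true positive
        elif predicted:
            fp += 1  # false positive
        elif is_match:
            fn += 1  # false negative (a missed match)
    # Assumption: report 0.0 when a ratio is undefined (no predictions or no positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return ThresholdMetrics(threshold, precision, recall, f1)


def sweep(pairs: list[tuple[float, bool]], thresholds: list[float]) -> list[ThresholdMetrics]:
    """Compute precision, recall, and F1 at every candidate threshold."""
    return [evaluate_at(pairs, t) for t in thresholds]


def best_by_f1(results: list[ThresholdMetrics]) -> ThresholdMetrics:
    """Return the row with the highest F1 (first one wins on ties)."""
    return max(results, key=lambda m: m.f1)


# Hypothetical evaluation data: (RNI score, is the pair a true match?).
scored = [
    (0.95, True), (0.88, True), (0.81, False), (0.74, True),
    (0.62, False), (0.55, True), (0.40, False), (0.30, False),
]
grid = [0.5, 0.6, 0.7, 0.8, 0.9]

results = sweep(scored, grid)
for m in results:
    print(f"t={m.threshold:.2f}  P={m.precision:.3f}  R={m.recall:.3f}  F1={m.f1:.3f}")
print("Best F1 at threshold", best_by_f1(results).threshold)
```

A finer grid (for example, steps of 0.01) gives a smoother picture, and in practice you would want enough labeled pairs that differences between neighboring thresholds are meaningful rather than noise.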
As the following graph shows, the threshold can be adjusted to favor precision or recall. A high threshold makes false positives less likely, which tends to raise precision. A low threshold makes false negatives less likely, which tends to raise recall. For border-control applications, you may be inclined to favor recall to avoid admitting potential threats into a country. Conversely, for Know Your Customer (KYC) applications, leaning your threshold towards precision may be best to reduce false positives. Ultimately, your choice of threshold should be based on an analysis of accuracy as a function of threshold for your own data, as well as on your business requirements.
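One way to turn a business requirement into a threshold is sketched below, under stated assumptions. For a recall-oriented use such as border control, fix a minimum recall and pick the highest threshold that still meets it; for a precision-oriented use such as KYC, fix a minimum precision and pick the lowest threshold that meets it. Because false positives can only stay the same or drop as the threshold rises, and false negatives can only stay the same or grow, each choice keeps the other error type as low as the target allows. The targets, sample data, and function names are hypothetical, and precision is treated as 0.0 when nothing is predicted a match.

```python
"""Pick a threshold from a precision or recall requirement (hypothetical sketch)."""
from __future__ import annotations


def precision_recall(pairs: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Precision and recall when scores >= threshold are treated as matches."""
    tp = sum(1 for score, is_match in pairs if score >= threshold and is_match)
    fp = sum(1 for score, is_match in pairs if score >= threshold and not is_match)
    fn = sum(1 for score, is_match in pairs if score < threshold and is_match)
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumption: 0.0 when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_min_recall(pairs, thresholds, min_recall: float) -> float | None:
    """Highest threshold whose recall still meets min_recall (fewest false positives)."""
    ok = [t for t in thresholds if precision_recall(pairs, t)[1] >= min_recall]
    return max(ok) if ok else None


def threshold_for_min_precision(pairs, thresholds, min_precision: float) -> float | None:
    """Lowest threshold whose precision meets min_precision (fewest false negatives)."""
    ok = [t for t in thresholds if precision_recall(pairs, t)[0] >= min_precision]
    return min(ok) if ok else None


# Hypothetical evaluation data: (RNI score, is the pair a true match?).
scored = [
    (0.95, True), (0.88, True), (0.81, False), (0.74, True),
    (0.62, False), (0.55, True), (0.40, False), (0.30, False),
]
grid = [0.5, 0.6, 0.7, 0.8, 0.9]

# Hypothetical targets: recall-leaning (e.g. border control) vs precision-leaning (e.g. KYC).
print("recall >= 0.75    ->", threshold_for_min_recall(scored, grid, 0.75))
print("precision >= 0.90 ->", threshold_for_min_precision(scored, grid, 0.90))
```

A None result means no threshold on the grid satisfies the target, which is itself a useful signal that the target may need to be revisited against your data.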