A key feature of RNI is that it returns a normalized score between 0 and 1 indicating how similar two names are. This makes RNI a powerful addition to existing workflows. However, it leads to the inevitable question: “What threshold should I use?” Without quantitative error analysis, this question is difficult to answer. RMS helps answer it by processing your evaluation data against a range of thresholds and calculating precision, recall, and F1 for each data type at each threshold.
The threshold report graphs the precision, recall, and F1 for each threshold value and each data type. As you can see from the graph, the threshold can be adjusted to favor precision or recall. With a high threshold, false positives are less likely, which yields higher precision. With a low threshold, false negatives are less likely, which yields higher recall. For applications involving border checking, you may be inclined to favor recall to avoid allowing potential threats into a country. Conversely, for Know Your Customer (KYC) applications, leaning your threshold towards precision may be best to reduce false positives. Your choice of threshold should ultimately be based on an analysis of accuracy as a function of threshold for your particular data, as well as your business requirements.
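The trade-off described above can be reproduced on a small scale. The following is a minimal sketch, not the RMS implementation: the function name evaluate, the ThresholdMetrics class, and the sample scores are all hypothetical, and it assumes that a pair whose score is greater than or equal to the threshold counts as a predicted match.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMetrics:
    threshold: float
    tp: int
    fp: int
    fn: int
    tn: int
    precision: float
    recall: float
    f1: float


def evaluate(scored_pairs, threshold):
    """Count TP/FP/FN/TN at one threshold and derive precision, recall, F1.

    scored_pairs is a list of (score, is_true_match) tuples, where score is
    a similarity score in [0, 1] and is_true_match is the labeled answer.
    Assumption: score >= threshold is treated as a predicted match.
    """
    tp = fp = fn = tn = 0
    for score, is_true_match in scored_pairs:
        predicted_match = score >= threshold
        if predicted_match and is_true_match:
            tp += 1
        elif predicted_match:
            fp += 1
        elif is_true_match:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when nothing is predicted or labeled.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return ThresholdMetrics(threshold, tp, fp, fn, tn, precision, recall, f1)


# Made-up evaluation data for illustration only: (score, is_true_match).
sample_pairs = [
    (0.95, True),
    (0.85, True),
    (0.80, False),
    (0.60, True),
    (0.40, False),
    (0.20, False),
]

for t in (0.5, 0.7, 0.9):
    m = evaluate(sample_pairs, t)
    print(f"{m.threshold:.2f}  P={m.precision:.2f}  R={m.recall:.2f}  F1={m.f1:.2f}")
```

On this sample, raising the threshold removes the false positive at the cost of missing true matches, which is the same pattern the threshold graph shows for real evaluation data.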
The threshold report page contains a threshold graph and a table listing the precision, recall, and F1 values for each data type, along with the threshold value. The threshold listed is the value that produces the best F1 score.
Figure 1. Threshold Report Graph
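Selecting the threshold with the best F1 score can be sketched as follows. This is an illustration under stated assumptions rather than the logic RMS uses: the function names and the per-threshold numbers are hypothetical, and ties are resolved in favor of the first row encountered, which the report itself may handle differently.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def best_f1_threshold(rows):
    """Return the threshold whose (precision, recall) gives the highest F1.

    rows is a list of (threshold, precision, recall) tuples for one data type.
    Assumption: on a tie, the first row in the list wins.
    """
    best_row = max(rows, key=lambda row: f1_score(row[1], row[2]))
    return best_row[0]


# Made-up per-threshold results for a single data type.
rows = [
    (0.50, 0.75, 1.00),
    (0.70, 0.90, 0.90),
    (0.90, 1.00, 0.33),
]

print(best_f1_threshold(rows))
```

If your business requirements favor precision or recall, as discussed earlier, the same table can be scanned with a different selection rule instead of the maximum F1.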
From this page, you can download one of the following reports:
Threshold summary, listing the TP, TN, FP, FN, precision, recall, and F-scores for each threshold and entity type.
Detailed match results, listing the RNI score for each pair, along with the result (TP, TN, FP, or FN) for each threshold value.
The files are