The Evaluate task allows you to see how Match Studio's results for a given index change with different match configurations and thresholds. It does this by comparing its own search results to user-provided gold data. You can think of gold data as the correct answers to a test that Match Studio is administering to itself. The gold data file consists of a list of name pairs and whether or not they should match. For the purposes of this tutorial we have provided you with a pre-made gold data file for use in your first evaluation.
-
Select the Evaluate tab from the navigation bar.
-
Select or drag Quick_Start_Guide_Gold_Data.csv, included in the file package, into the import field.
Note
This file is located in Rosette_Match_Studio_Quick_Start_Guide_<version>.zip.
If you do not have this file package, you can download it from support.rosette.com.
-
Select New Evaluation in the Options column for the uploaded gold data file.
-
Ensure RMS-<version> Default is selected from the Match Configuration dropdown menu.
Note
This match configuration is built into RMS to make it easy to run an evaluation, but you can create your own configurations too.
-
Select Start Evaluation.
When the evaluation is complete, you will see a table showing the match configuration used to perform the evaluation, the date of the evaluation, and the following additional information:
-
Threshold: The match threshold to which the rest of the values in the table apply. When Best Threshold is selected (next to Display), the threshold at which the match configuration performs best is automatically selected.
-
TPs: True positives. Number of name pairs that match in the gold data and that RMS labeled a match.
-
TNs: True negatives. Number of name pairs that do not match in the gold data and that RMS did not label a match.
-
FPs: False positives. Number of name pairs that do not match in the gold data but that RMS labeled a match.
-
FNs: False negatives. Number of name pairs that match in the gold data but that RMS did not label a match.
-
P: Precision. A number between 0 and 1 that indicates what proportion of the matches labeled by RMS were correct. A precision value of 1 means there were no false positives.
-
R: Recall. A number between 0 and 1 that indicates what proportion of matches in the gold data were identified as matches by RMS. A recall value of 1 means there were no false negatives.
-
F1: The harmonic mean of precision and recall. A higher F1 measure indicates better overall accuracy, taking into account both false positives and false negatives.
From this evaluation we can see that a threshold of 0.7 (70%) on this match configuration gives us very good precision, recall, and F1 measure.
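To make the relationship between the columns concrete, here is a minimal Python sketch of how the counts and scores in the table could be derived from scored name pairs. It is an illustration only, not RMS's implementation: the function names, the sample scores, the assumption that a pair is labeled a match when its score is greater than or equal to the threshold, and the assumption that the "best" threshold is the one with the highest F1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # gold says match, labeled a match
    tn: int  # gold says no match, not labeled a match
    fp: int  # gold says no match, labeled a match
    fn: int  # gold says match, not labeled a match


def confusion_counts(scored_pairs, threshold):
    """Count TPs, TNs, FPs, and FNs at one threshold.

    scored_pairs is an iterable of (score, should_match) tuples, where
    should_match is the gold label. Assumption: a pair is labeled a match
    when score >= threshold.
    """
    tp = tn = fp = fn = 0
    for score, should_match in scored_pairs:
        labeled_match = score >= threshold
        if labeled_match and should_match:
            tp += 1
        elif labeled_match:
            fp += 1
        elif should_match:
            fn += 1
        else:
            tn += 1
    return Counts(tp, tn, fp, fn)


def precision_recall_f1(c):
    """Return (P, R, F1); a score is 0.0 when its denominator is zero."""
    p = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    r = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def best_threshold(scored_pairs, thresholds):
    """Pick the candidate threshold with the highest F1 (an assumption
    about what 'best' means; ties go to the first candidate)."""
    return max(
        thresholds,
        key=lambda t: precision_recall_f1(confusion_counts(scored_pairs, t))[2],
    )


# Hypothetical scores and gold labels, invented purely for illustration.
example_pairs = [
    (0.95, True),
    (0.80, True),
    (0.75, False),
    (0.60, True),
    (0.30, False),
]

counts = confusion_counts(example_pairs, 0.7)
p, r, f1 = precision_recall_f1(counts)
print(counts, p, r, f1)
print(best_threshold(example_pairs, [0.5, 0.7, 0.9]))
```

Raising the threshold generally trades recall for precision, which is why a single combined score such as F1 is useful when comparing thresholds or match configurations.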