RNI is tuned to perform well in a variety of name matching scenarios, but it is not optimized for all data and match requirements. You can adjust a number of parameters to modify the matching algorithm and improve the results for your use case.
Two .yaml files in plugins/rni/bt_root/rlpnc/data/etc guide you in configuring the match parameters: parameter_defs.yaml and parameter_profiles.yaml.
parameter_defs.yaml lists each match parameter along with its default value and a short description. A parameter may also have a minimum and a maximum value; these are system limits, and exceeding them can cause an error. A parameter may also have a recommended minimum (sane_minimum) and a recommended maximum (sane_maximum), which we advise you not to exceed.
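The difference between the hard limits and the recommended limits can be sketched in a few lines of Python. This is only an illustration: the sane_minimum and sane_maximum names come from the text above, while the minimum and maximum key names, the check_value helper, and the sample limits are assumptions made for this sketch and are not taken from parameter_defs.yaml.

```python
from typing import Dict


def check_value(value: float, limits: Dict[str, float]) -> str:
    """Classify a candidate value as 'error', 'warning', or 'ok'.

    `limits` may hold any of the keys minimum, maximum, sane_minimum,
    sane_maximum (key names for the hard limits are assumed here).
    """
    hard_low = limits.get("minimum")
    hard_high = limits.get("maximum")
    # Outside the system limits: the value could cause an error.
    if hard_low is not None and value < hard_low:
        return "error"
    if hard_high is not None and value > hard_high:
        return "error"

    sane_low = limits.get("sane_minimum")
    sane_high = limits.get("sane_maximum")
    # Inside the system limits but outside the recommended range.
    if sane_low is not None and value < sane_low:
        return "warning"
    if sane_high is not None and value > sane_high:
        return "warning"
    return "ok"


# Hypothetical limits, for illustration only.
example_limits = {
    "minimum": 0.0,
    "maximum": 1.0,
    "sane_minimum": 0.2,
    "sane_maximum": 0.8,
}
```

With these made-up limits, a value of 0.5 is accepted, 0.9 is flagged as outside the recommended range, and 1.5 is rejected.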
parameter_profiles.yaml is where you change parameter values based on the language pairs in the match.
The parameters in the parameter_profiles.yaml file are organized by parameter profiles. Each profile contains parameter values for a specific language pair. For example, matching "Susie Johnson" and "Susanne Johnson" uses the eng_eng profile. There is also an any profile, which applies to all language pairs.
Parameter profiles have the following characteristics:
- Parameter profile names are formed from the language pairs they apply to. The three-letter language codes are always written in alphabetical order, except for English (eng), which always comes last. The two languages can be the same. Examples: spa_eng, eng_eng.
- They can include the entity type being matched, such as eng_eng_PERSON. The parameter values in this profile are used only when matching English names with English names where the entity type is specified as PERSON.
- Parameter profiles can inherit mappings from other parameter profiles. The global any profile applies to all languages; all profiles inherit its values.
- The any profile can include an entity type. any_PERSON applies to all PERSON matches regardless of language.
- Specific language profiles inherit values from global profiles. The global profile for matching person names is named any_PERSON. The profile for matching Spanish person names against English person names is named spa_eng_PERSON; it inherits parameter values first from the spa_eng profile and then from the any_PERSON profile.
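The naming and inheritance rules in the list above can be sketched in Python. This is a minimal illustration, not RNI's implementation: the helper names profile_name, lookup_order, and resolve are hypothetical, and the lookup order (most specific profile first, then the language pair, then any_PERSON, then any) is one reading of the inheritance description.

```python
from typing import Dict, List, Optional

Profiles = Dict[str, Dict[str, object]]


def profile_name(lang_a: str, lang_b: str,
                 entity_type: Optional[str] = None) -> str:
    """Build a profile name such as spa_eng or eng_eng_PERSON."""
    # Alphabetical order, except that "eng" always sorts last.
    first, second = sorted([lang_a, lang_b],
                           key=lambda code: (code == "eng", code))
    name = f"{first}_{second}"
    return f"{name}_{entity_type}" if entity_type else name


def lookup_order(lang_a: str, lang_b: str,
                 entity_type: Optional[str] = None) -> List[str]:
    """List profile names from most specific to most general (assumed order)."""
    pair = profile_name(lang_a, lang_b)
    order = []
    if entity_type:
        order.append(f"{pair}_{entity_type}")
    order.append(pair)
    if entity_type:
        order.append(f"any_{entity_type}")
    order.append("any")
    return order


def resolve(parameter: str, profiles: Profiles, lang_a: str, lang_b: str,
            entity_type: Optional[str] = None) -> Optional[object]:
    """Return the first value found along the inheritance chain, else None."""
    for name in lookup_order(lang_a, lang_b, entity_type):
        values = profiles.get(name, {})
        if parameter in values:
            return values[parameter]
    return None
```

For example, lookup_order("eng", "spa", "PERSON") yields spa_eng_PERSON, spa_eng, any_PERSON, any, so a value set only in the any profile is still found for a Spanish-English PERSON match.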
Important
Do not modify the parameter_defs.yaml file. All changes should be made in the parameter_profiles.yaml file. Global changes are made with the any profile.
A parameter universe is a named set of RNI parameter profiles and their values. A universe can contain multiple parameter profiles, including the global any profile. A profile in a parameter universe can also include the entity type being matched, just like a regular parameter profile.
For example, the MyParameterUniverse universe may include the following parameter profiles:
- "name": "MyParameterUniverse/any" applies to all language pairs.
- "name": "MyParameterUniverse/spa_eng" applies to English-Spanish name pairs.
- "name": "MyParameterUniverse/spa_eng_PERSON" applies to all PERSON English-Spanish name pairs.
Each parameter in a profile must match the name of a parameter declared in the parameter_defs.yaml file and must be given a value.
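As a rough illustration of these two rules, the following Python sketch splits a universe-qualified profile name and reports parameter names that are not declared. The helper names are hypothetical, and the DEFINED set stands in for the names declared in parameter_defs.yaml; it lists only the four parameters used in this section's examples.

```python
from typing import Dict, List, Optional, Set, Tuple


def split_profile_name(full_name: str) -> Tuple[Optional[str], str]:
    """Split 'Universe/profile' into (universe, profile).

    A name without a slash is treated as a regular profile with no universe.
    """
    universe, separator, profile = full_name.partition("/")
    if not separator:
        return None, full_name
    return universe, profile


def unknown_parameters(parameters: Dict[str, object],
                       defined: Set[str]) -> List[str]:
    """Return the parameter names that are not in the declared set."""
    return sorted(name for name in parameters if name not in defined)


# Stand-in for the names declared in parameter_defs.yaml (illustrative subset).
DEFINED = {
    "reorderPenalty",
    "HMMUsageThreshold",
    "stringDistanceThreshold",
    "useEditDistanceTokenScorer",
}
```

A check like unknown_parameters({"reorderPenalti": 0.4}, DEFINED) would surface the misspelled name before the profile is used.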
A parameter universe can be defined dynamically or added to the parameter_profiles.yaml file. We recommend that you use dynamic parameter universes for testing and tuning only. For production use, add all parameter universes to the parameter_profiles.yaml file.
Tip
You can define multiple named parameter profiles.
Define the parameter universe in the parameter_profiles.yaml file. Example:
parameterUniverseOne/spa_eng_PERSON:
  reorderPenalty: 0.4
  HMMUsageThreshold: 0.8
  stringDistanceThreshold: 0.1
  useEditDistanceTokenScorer: true
parameterUniverseOne/eng_eng:
  reorderPenalty: 0.6
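If you keep tuned values in a script, a small helper can print them in the layout of the example above. This sketch assumes that each universe/profile key maps to a block of parameters indented by two spaces; render_universe and render_value are hypothetical helpers, not part of RNI.

```python
from typing import Dict


def render_value(value: object) -> str:
    """Render a Python value as a YAML scalar (booleans become true/false)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render_universe(universe: str,
                    profiles: Dict[str, Dict[str, object]]) -> str:
    """Render profiles as 'universe/profile:' keys with indented parameters."""
    lines = []
    for profile, parameters in profiles.items():
        lines.append(f"{universe}/{profile}:")
        for name, value in parameters.items():
            lines.append(f"  {name}: {render_value(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_universe("parameterUniverseOne", {
        "spa_eng_PERSON": {"reorderPenalty": 0.4,
                           "useEditDistanceTokenScorer": True},
        "eng_eng": {"reorderPenalty": 0.6},
    }))
```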
Using a Parameter Universe
To use a parameter universe, add it as part of the name_score function when querying the index. All parameter values defined in the parameter universe will be used, where appropriate.
curl -XPOST "http://localhost:9200/_search" -H 'Content-Type: application/json' -d'{
  "query": {
    "match": {
      "full_name": "A Ely Taylor"
    }
  },
  "rescore": {
    "window_size": 3,
    "rni_query": {
      "rescore_query": {
        "rni_function_score": {
          "name_score": {
            "field": "full_name",
            "query_name": "A Ely Taylor",
            "score_to_rescore_restriction": 1,
            "window_size_allowance": 0.5,
            "universe": "parameterUniverseOne"
          }
        }
      },
      "query_weight": 0,
      "rescore_query_weight": 1
    }
  }
}'
curl -XPOST "http://localhost:9200/_search" -H 'Content-Type: application/json' -d'{
  "query": {
    "match": {
      "full_name": "A Ely Taylor"
    }
  },
  "rescore": {
    "window_size": 3,
    "query": {
      "rescore_query": {
        "function_score": {
          "name_score": {
            "field": "full_name",
            "query_name": "A Ely Taylor",
            "score_to_rescore_restriction": 1,
            "window_size_allowance": 0.5,
            "universe": "parameterUniverseOne"
          }
        }
      },
      "query_weight": 0,
      "rescore_query_weight": 1
    }
  }
}'
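When comparing several universes during tuning, it can help to build the search body in code rather than editing the curl command by hand. The sketch below mirrors the first request above (the one that uses the rni_query and rni_function_score keys) and relies only on the Python standard library; build_name_search and post_search are hypothetical helper names, and the URL is the localhost address used in the examples.

```python
import json
import urllib.request
from typing import Any, Dict


def build_name_search(field: str, query_name: str, universe: str,
                      window_size: int = 3) -> Dict[str, Any]:
    """Build a search body that rescores with name_score and a universe."""
    return {
        "query": {"match": {field: query_name}},
        "rescore": {
            "window_size": window_size,
            "rni_query": {
                "rescore_query": {
                    "rni_function_score": {
                        "name_score": {
                            "field": field,
                            "query_name": query_name,
                            "score_to_rescore_restriction": 1,
                            "window_size_allowance": 0.5,
                            "universe": universe,
                        }
                    }
                },
                "query_weight": 0,
                "rescore_query_weight": 1,
            },
        },
    }


def post_search(body: Dict[str, Any],
                url: str = "http://localhost:9200/_search") -> Dict[str, Any]:
    """POST the body as JSON and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling post_search(build_name_search("full_name", "A Ely Taylor", "parameterUniverseOne")) sends the same request as the first curl example, so swapping the universe name is a one-argument change.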
Dynamic Parameter Universes
When tuning RNI, you can use the Parameters REST API endpoint to dynamically create or update a parameter universe, overriding the existing parameter values without having to restart Elasticsearch. Once the optimum values are determined for each parameter, add the parameter universe to the parameter_profiles.yaml file for production use.
Tip
Dynamic parameter universes are best suited for testing and tuning the RNI match parameters. Once you determine the best set of parameters, add the parameter universe to the parameter_profiles.yaml file for production use. Using dynamic parameter universes can slow your system down considerably.
Use the Parameters endpoint to create a parameter universe with its parameters and values.
curl -XPOST "http://localhost:9200/rni_plugin/_parameter_universe" -H 'Content-Type: application/json' -d'{
  "profiles": [
    {
      "name": "parameterUniverseOne/spa_eng_PERSON",
      "parameters": {
        "reorderPenalty": 0.4,
        "HMMUsageThreshold": 0.8,
        "stringDistanceThreshold": 0.1,
        "useEditDistanceTokenScorer": true
      }
    }
  ]
}'
The parameter universe is named parameterUniverseOne, and its single profile applies to matching person names between Spanish and English.
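The request body above has a regular shape, so it can be generated from a plain mapping of profile names to parameter values. The following Python sketch shows one way to do that; build_universe_payload is a hypothetical helper, and only the payload layout (a profiles list of name and parameters entries) is taken from the example.

```python
import json
from typing import Any, Dict


def build_universe_payload(universe: str,
                           profiles: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build the body for the _parameter_universe endpoint shown above."""
    return {
        "profiles": [
            {"name": f"{universe}/{profile}", "parameters": dict(parameters)}
            for profile, parameters in profiles.items()
        ]
    }


if __name__ == "__main__":
    payload = build_universe_payload("parameterUniverseOne", {
        "spa_eng_PERSON": {
            "reorderPenalty": 0.4,
            "HMMUsageThreshold": 0.8,
            "stringDistanceThreshold": 0.1,
            "useEditDistanceTokenScorer": True,
        },
    })
    # Print the JSON that would be sent with -d in the curl command.
    print(json.dumps(payload, indent=2))
```

Keeping the values in one mapping makes it easier to copy the final, tuned set into parameter_profiles.yaml once testing is complete.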