The plugin includes Elasticsearch REST APIs to customize and tune matching through stop words, token overrides, and parameter universes. These endpoints let you add and modify these configuration values without restarting Elasticsearch.
Tip
To use any of the configuration REST APIs, the parameter enableDynamicConfigurationEndpoints must be set to true in the any: profile of the parameter_profiles.yaml file. By default, this parameter is set to false. These endpoints should be used for testing and tuning only. When the dynamic configuration endpoints are enabled, they can slow the system down considerably.
The RNI-ES plugin relies on having an active primary or replica shard for each dynamic index available on every node. As a result, it is up to users to ensure each node has enough disk space to not exceed Elasticsearch watermark thresholds. Outside of this, users should never interact directly with the underlying dynamic configuration indices; all requests should go through the appropriate /rni_plugin endpoints.
Request timeouts
Whenever the RNI-ES plugin detects that one of its underlying dynamic indices has changed, it must fetch the entire index contents before the next name matching or indexing request. The timeout threshold for this fetch request is managed individually for each class of endpoints and defaults to 60,000 ms. If this value proves insufficient, users can configure it at plugin startup with the bt.{override,stopword,parameter}.timeout Java property.
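As a rough sketch, the Python helper below expands the property pattern into one JVM flag per endpoint class. It assumes the properties are passed as -D system properties (for example through ES_JAVA_OPTS, as with the bt.ssl.certificate property in this section) and that values are given in milliseconds like the default. The helper name is hypothetical and not part of the plugin.

```python
# Hypothetical helper, not part of the RNI-ES plugin.
# Assumption: timeouts are plain -D system properties expressed in milliseconds.
ENDPOINT_CLASSES = ("override", "stopword", "parameter")


def timeout_java_opts(timeouts_ms):
    """Build -D flags from a {endpoint_class: milliseconds} mapping."""
    flags = []
    for name, millis in timeouts_ms.items():
        if name not in ENDPOINT_CLASSES:
            raise ValueError(f"unknown endpoint class: {name!r}")
        if type(millis) is not int or millis <= 0:
            raise ValueError(f"timeout must be a positive integer, got {millis!r}")
        flags.append(f"-Dbt.{name}.timeout={millis}")
    return " ".join(flags)


print(timeout_java_opts({"stopword": 120000, "override": 90000}))
# prints: -Dbt.stopword.timeout=120000 -Dbt.override.timeout=90000
```

Endpoint classes that are left out of the mapping keep the 60,000 ms default.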
Tip
To use dynamic configuration endpoints in an Elasticsearch deployment using SSL encryption, the RNI Elasticsearch plugin must be aware of the server's certificate file. To accomplish this, start Elasticsearch with:
ES_JAVA_OPTS="-Dbt.ssl.certificate=<path_to_certificate>"
The _stopwords endpoint allows you to ADD, GET, and DELETE stop words without restarting the Elasticsearch server. See Stop Patterns and Stop Word Prefixes for more detailed information on stop words.
The following properties are used when creating stop words. The entity_type is optional; all other fields are required when adding stop words through the API.
Table 9. Stop Word Properties
| Property | Required | Description |
| --- | --- | --- |
| lang | ✓ | ISO 639-3 code for the language of the stop word(s). |
| stopword_type | ✓ | Type of stop word(s), either regexes or prefixes. |
| entity_type | | Entity type for which to apply the stop word(s); defaults to ALL. |
| stopwords | ✓ | List of stop words to be added. |
Note
Stop words are applied whenever a token is normalized, so they affect the name content that is included in the index. Therefore, changes to dynamic stop words do require data to be reindexed to take effect.
The stop word index must exist before you can start adding stop words. To create the index:
curl -s -XPOST "localhost:9200/rni_plugin/_stopwords/_create"
To refresh the stop word index:
curl -s -XPOST "localhost:9200/rni_plugin/_stopwords/_refresh"
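The same create and refresh calls can be prepared from Python. This is a minimal sketch using only the standard library; the function name is hypothetical, and the host and port are taken from the curl examples. It builds the request object without sending it.

```python
import urllib.request

# Endpoint and action names as they appear in the curl examples of this section.
ENDPOINTS = ("_stopwords", "_overrides", "_parameter_universe")
ACTIONS = ("_create", "_refresh")


def management_request(endpoint, action, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request that creates or refreshes a dynamic index."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    url = f"{base_url}/rni_plugin/{endpoint}/{action}"
    return urllib.request.Request(url, method="POST")


req = management_request("_stopwords", "_create")
print(req.get_method(), req.full_url)
# prints: POST http://localhost:9200/rni_plugin/_stopwords/_create
```

Against a running cluster, the request could then be sent with urllib.request.urlopen(req).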
The POST _stopwords method adds one or more stop words. The entity_type field is optional, but the other fields are all required.
curl -XPOST "http://localhost:9200/rni_plugin/_stopwords" -H 'Content-Type: application/json' -d '{
  "lang": "eng",
  "stopword_type": "prefixes",
  "entity_type": "PERSON",
  "stopwords": [
    "honorable",
    "senior correspondent"
  ]
}'
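When the request body is generated by a script, it can help to check the required fields before sending. The sketch below is one way to do that in Python; the function name and the validation rules (three-letter language code, non-empty list) are assumptions made for illustration, not checks the plugin is known to perform.

```python
import json

STOPWORD_TYPES = ("regexes", "prefixes")


def stopword_request_body(lang, stopword_type, stopwords, entity_type=None):
    """Return the JSON body for POST _stopwords; entity_type is the only optional field."""
    if len(lang) != 3 or not lang.isalpha():
        raise ValueError("lang should be a three-letter ISO 639-3 code")
    if stopword_type not in STOPWORD_TYPES:
        raise ValueError(f"stopword_type must be one of {STOPWORD_TYPES}")
    words = [w for w in stopwords if w.strip()]
    if not words:
        raise ValueError("at least one stop word is required")
    body = {"lang": lang, "stopword_type": stopword_type}
    if entity_type is not None:
        body["entity_type"] = entity_type
    body["stopwords"] = words
    return json.dumps(body, ensure_ascii=False)


print(stopword_request_body("eng", "prefixes",
                            ["honorable", "senior correspondent"], "PERSON"))
```

Leaving out entity_type omits the field from the body, so the documented default of ALL applies.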
The GET _stopwords method returns all stop words for a given language and stop word type. You can search by language alone or by language and entity type.
When no entity type is specified, the stop word is applied to all names in the language, those with and without entity types. Therefore, calls that specify a type such as PERSON or ORGANIZATION will also return all stop words that don't have an entity type specified.
Returns all prefix stop words for PERSON types in English:
curl -XGET "http://localhost:9200/rni_plugin/_stopwords/prefixes_eng_PERSON"
Returns all regex stop words for ORGANIZATION types in Spanish:
curl -XGET "http://localhost:9200/rni_plugin/_stopwords/regexes_spa_ORGANIZATION"
Returns all prefix stop words in English with no type specified. By default, this list is empty; data will only be returned if you've populated the file with values:
curl -XGET "http://localhost:9200/rni_plugin/_stopwords/prefixes_eng"
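The lookup behavior described above, where a call for a specific entity type also returns stop words stored without one, can be pictured with a toy in-memory model. This is only an illustration of the documented semantics with hypothetical names and data; it says nothing about how the plugin stores or retrieves stop words internally.

```python
def visible_stopwords(store, stopword_type, lang, entity_type=None):
    """Toy model: typed lookups also include stop words stored without an entity type.

    store maps (stopword_type, lang, entity_type_or_None) to a set of stop words.
    """
    result = set(store.get((stopword_type, lang, None), set()))
    if entity_type is not None:
        result |= store.get((stopword_type, lang, entity_type), set())
    return sorted(result)


# Hypothetical contents, for illustration only.
store = {
    ("prefixes", "eng", None): {"honorable"},
    ("prefixes", "eng", "PERSON"): {"doctor"},
}

print(visible_stopwords(store, "prefixes", "eng", "PERSON"))  # ['doctor', 'honorable']
print(visible_stopwords(store, "prefixes", "eng"))            # ['honorable']
```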
The DELETE _stopwords method deletes a specified stop word. Deleting a stop word from a specific profile will also delete it from the any profile.
curl -XDELETE "http://localhost:9200/rni_plugin/_stopwords/prefixes_eng_PERSON/doctor"
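The GET and DELETE examples share one path pattern: stop word type, language, and an optional entity type joined with underscores, followed by the stop word itself for DELETE. A small sketch of that pattern follows. The function name is hypothetical, and percent-encoding multi-word stop words is an assumption about what a client would need to do, not documented plugin behavior.

```python
from urllib.parse import quote

BASE = "http://localhost:9200/rni_plugin/_stopwords"


def stopword_url(stopword_type, lang, entity_type=None, stopword=None):
    """Build the URL for GET (no stopword) or DELETE (with stopword) requests."""
    parts = [stopword_type, lang]
    if entity_type:
        parts.append(entity_type)
    url = f"{BASE}/{'_'.join(parts)}"
    if stopword is not None:
        # Assumption: spaces and other reserved characters are percent-encoded.
        url += "/" + quote(stopword, safe="")
    return url


print(stopword_url("prefixes", "eng"))
print(stopword_url("regexes", "spa", "ORGANIZATION"))
print(stopword_url("prefixes", "eng", "PERSON", "doctor"))
```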
The _overrides endpoint allows you to ADD, GET, and DELETE token pair overrides without restarting the Elasticsearch server. See Overriding Token Pair Matches for more detailed information on token pair overrides.
The following properties are used when creating token overrides.
Table 10. Token Overrides Properties
| Property | Required | Description |
| --- | --- | --- |
| lang1 | ✓ | ISO 639-3 code for the language of the first name in the override pair. |
| lang2 | ✓ | ISO 639-3 code for the language of the second name in the override pair. |
| entity_type | | Entity type of the list of token override pairs; defaults to "ALL". |
| selector | | An alphanumeric string that specifies the selector value to apply for these overrides. |
| token_pairs | ✓ | List of token override pairs to be added. |
| token1 | ✓ | Token of the first name in the override pair; it should be in lang1. |
| token2 | ✓ | Token of the second name in the override pair; it should be in lang2. |
| score | | Raw score of the token pair, between 0.0 and 1.0. If omitted, the value from the nicknameOverrideScore parameter is used. |
| force | | Indicates whether to force the score to be exactly the given value for the token pair; defaults to false. |
Note
RNI is designed so that override information is not included with indexed names. Therefore, changes to dynamic overrides do not require data to be reindexed to take effect.
The override index must exist before you can start adding token overrides. To create the index:
curl -s -XPOST "localhost:9200/rni_plugin/_overrides/_create"
To force a refresh of the dynamic override index:
curl -s -XPOST "localhost:9200/rni_plugin/_overrides/_refresh"
The POST _overrides method adds one or more token overrides. As shown in the table above, entity_type, selector, force, and score are optional, but the other fields are required.
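A request body with these fields can be assembled programmatically. The Python sketch below uses hypothetical helper names; it omits optional fields that are not supplied and checks that a supplied score lies in the documented 0.0 to 1.0 range. Whether the plugin itself rejects out-of-range scores is not stated here, so the check is a client-side assumption.

```python
import json


def token_pair(token1, token2, score=None, force=None):
    """Build one entry of token_pairs; score and force are optional."""
    pair = {"token1": token1, "token2": token2}
    if score is not None:
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        pair["score"] = score
    if force is not None:
        pair["force"] = bool(force)
    return pair


def override_request_body(lang1, lang2, pairs, entity_type=None, selector=None):
    """Return the JSON body for POST _overrides."""
    if not pairs:
        raise ValueError("token_pairs must not be empty")
    body = {"lang1": lang1, "lang2": lang2}
    if entity_type is not None:
        body["entity_type"] = entity_type
    if selector is not None:
        body["selector"] = selector
    body["token_pairs"] = list(pairs)
    return json.dumps(body, ensure_ascii=False)


pairs = [
    token_pair("Abigail", "Abbey", score=0.74, force=True),
    token_pair("Aleksander", "Alex", score=0.74),
]
print(override_request_body("eng", "eng", pairs, entity_type="PERSON"))
```

The curl request that follows sends a body of the same shape.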
curl -XPOST "http://localhost:9200/rni_plugin/_overrides" -H 'Content-Type: application/json' -d '{
  "lang1": "eng",
  "lang2": "eng",
  "entity_type": "PERSON",
  "token_pairs": [
    {
      "token1": "Abigail",
      "token2": "Abbey",
      "score": 0.74,
      "force": true
    },
    {
      "token1": "Aleksander",
      "token2": "Alex",
      "score": 0.74
    },
    {
      "token1": "Alfonso",
      "token2": "Alphonse",
      "score": 0.74
    },
    {
      "token1": "Frederica",
      "token2": "Federica",
      "score": 0.74
    }
  ]
}'
The GET _overrides method returns the overrides of a given language profile.
curl -XGET "http://localhost:9200/rni_plugin/_overrides/hun_eng_PERSON"
You can also retrieve the score of a given override pair.
curl -XGET "http://localhost:9200/rni_plugin/_overrides/hun_eng_PERSON?token1=abigel&token2=abigail"
The DELETE _overrides method deletes a given override pair. Deleting an override from a specific profile will also delete it from the any profile.
curl -XDELETE "http://localhost:9200/rni_plugin/_overrides/hun_eng_PERSON/abigel+abigail"
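The three override URLs above follow one pattern: a profile made of both language codes and the entity type, then either a query string (score lookup) or the two tokens joined with a plus sign (delete). A minimal sketch of that pattern, with hypothetical function names and with the entity type treated as required because every example in this section includes it:

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200/rni_plugin/_overrides"


def override_profile(lang1, lang2, entity_type):
    """Profile segment as used in the examples, e.g. hun_eng_PERSON."""
    return f"{lang1}_{lang2}_{entity_type}"


def list_url(lang1, lang2, entity_type):
    return f"{BASE}/{override_profile(lang1, lang2, entity_type)}"


def score_url(lang1, lang2, entity_type, token1, token2):
    query = urlencode({"token1": token1, "token2": token2})
    return f"{list_url(lang1, lang2, entity_type)}?{query}"


def delete_url(lang1, lang2, entity_type, token1, token2):
    # The tokens are joined with "+" exactly as in the DELETE example.
    return f"{list_url(lang1, lang2, entity_type)}/{token1}+{token2}"


print(list_url("hun", "eng", "PERSON"))
print(score_url("hun", "eng", "PERSON", "abigel", "abigail"))
print(delete_url("hun", "eng", "PERSON", "abigel", "abigail"))
```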
The _parameter_universe endpoint allows you to ADD, GET, and DELETE parameters through parameter universes without restarting the Elasticsearch server. See Parameter Universe for more information on tuning parameters with parameter universes.
Note
While some parameters can impact the data that is included in the index, these parameters cannot be dynamically specified. Therefore, changes to dynamic parameters do not require data to be reindexed to take effect.
The parameter index must exist before you can start adding parameters. To create the index:
curl -s -XPOST "localhost:9200/rni_plugin/_parameter_universe/_create"
To force a refresh of the parameter universe index:
curl -s -XPOST "localhost:9200/rni_plugin/_parameter_universe/_refresh"
The POST _parameter_universe method creates a parameter universe and the parameter profiles within the universe. Use this method to add or update a parameter value in a parameter universe. If you add a parameter universe that already exists, the existing universe is overwritten with the new values. Each profile name in the request uses the following syntax:
SomeParameterUniverseName/xxx_yyy
where xxx_yyy is the language profile the parameters belong to, expressed in ISO 639-3 codes. The parameters field expects the set of parameters for the given profile, where the parameter names should match the ones declared in parameter_defs.yaml.
curl -XPOST "http://localhost:9200/rni_plugin/_parameter_universe" -H 'Content-Type: application/json' -d '{
  "profiles": [
    {
      "name": "SomeParameterUniverseName/any",
      "parameters": {
        "translatorResultsToKeep": 4,
        "deletionScore": 0.269,
        "doQueryTokenOverrides": true,
        "fieldDeletionScore": 0.27,
        "yearDistanceWeight": 0.2
      }
    },
    {
      "name": "SomeParameterUniverseName/eng_eng",
      "parameters": {
        "HMMUsageThreshold": 0.8,
        "stringDistanceThreshold": 0.1,
        "useEditDistanceTokenScorer": true,
        "finalBias": 2.4,
        "reorderPenalty": 0.2
      }
    }
  ]
}'
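If parameter values are kept in a script or a spreadsheet export, the profiles list can be generated rather than written by hand. The sketch below, with a hypothetical function name, turns a mapping of language profile to parameters into the body shape used in the request above. It does not check parameter names against parameter_defs.yaml.

```python
import json


def universe_request_body(universe, profiles):
    """Build the POST _parameter_universe body.

    profiles maps a language profile (e.g. "any", "eng_eng") to a dict of
    parameter names and values.
    """
    if "/" in universe:
        raise ValueError("universe name must not contain '/'")
    entries = [
        {"name": f"{universe}/{profile}", "parameters": dict(params)}
        for profile, params in profiles.items()
    ]
    return json.dumps({"profiles": entries}, indent=2)


body = universe_request_body(
    "SomeParameterUniverseName",
    {
        "any": {"deletionScore": 0.269, "doQueryTokenOverrides": True},
        "eng_eng": {"finalBias": 2.4, "reorderPenalty": 0.2},
    },
)
print(body)
```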
A simplified request is also supported, where the parameter universe name is included in the URL path and the parameter and its value are passed as URL query parameters.
curl -s -XPOST "localhost:9200/rni_plugin/_parameter_universe/SomeOtherUniverseName?param=eng_eng.finalBias&value=4.0"
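The query string of the simplified request combines the language profile and the parameter name with a dot. A short sketch of how such a URL could be built in Python follows. The function name is hypothetical, and the request still has to be sent with the POST method, as in the curl example.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200/rni_plugin/_parameter_universe"


def simplified_update_url(universe, profile, parameter, value):
    """Build the URL for the simplified single-parameter POST request."""
    query = urlencode({"param": f"{profile}.{parameter}", "value": value})
    return f"{BASE}/{universe}?{query}"


print(simplified_update_url("SomeOtherUniverseName", "eng_eng", "finalBias", 4.0))
# prints: http://localhost:9200/rni_plugin/_parameter_universe/SomeOtherUniverseName?param=eng_eng.finalBias&value=4.0
```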
The GET _parameter_universe method retrieves parameter universes.
To retrieve all parameter universes in a single call:
curl -XGET "localhost:9200/rni_plugin/_parameter_universe"
To retrieve a given parameter universe, the name of the parameter universe is provided as a path parameter:
curl -XGET "http://localhost:9200/rni_plugin/_parameter_universe/SomeParameterUniverseName"
To retrieve all parameters for a specific profile within a parameter universe, add the profile:
curl -XGET "localhost:9200/rni_plugin/_parameter_universe/SomeParameterUniverseName/eng_eng"
If you include the name of the profile and a parameter, it returns the value of the parameter:
curl -XGET "http://localhost:9200/rni_plugin/_parameter_universe/SomeParameterUniverseName/eng_eng.reorderPenalty"
The DELETE _parameter_universe method deletes parameter universes.
To delete all parameter universes in the index:
curl -XDELETE "localhost:9200/rni_plugin/_parameter_universe"
To delete a specific parameter universe:
curl -XDELETE "http://localhost:9200/rni_plugin/_parameter_universe/SomeParameterUniverseName"
To delete all parameters for a specific profile within a parameter universe:
curl -XDELETE "localhost:9200/rni_plugin/_parameter_universe/SomeParameterUniverseName/eng_eng"
To delete a parameter for a specific profile within a parameter universe:
curl -XDELETE "http://localhost:9200/rni_plugin/_parameter_universe/SomeParameterUniverseName/eng_eng.reorderPenalty"
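GET and DELETE address the same four levels: all universes, one universe, one profile within a universe, and one parameter within a profile. The following sketch builds the path for each level from the patterns in the examples above. The function name is hypothetical.

```python
ROOT = "/rni_plugin/_parameter_universe"


def universe_path(universe=None, profile=None, parameter=None):
    """Return the request path for the requested level of detail."""
    if profile and not universe:
        raise ValueError("a profile requires a universe name")
    if parameter and not profile:
        raise ValueError("a parameter requires a profile")
    path = ROOT
    if universe:
        path += f"/{universe}"
    if profile:
        path += f"/{profile}"
    if parameter:
        # The parameter is appended to the profile with a dot.
        path += f".{parameter}"
    return path


print(universe_path())
print(universe_path("SomeParameterUniverseName"))
print(universe_path("SomeParameterUniverseName", "eng_eng"))
print(universe_path("SomeParameterUniverseName", "eng_eng", "reorderPenalty"))
```

Because the paths are identical for both methods, a call with no arguments used with DELETE targets every parameter universe in the index, so a client may want to guard that case explicitly.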
Note
Deleting a parameter from a specific parameter profile will also delete it from the any profile. The default value of the parameter from the parameter_defs.yaml file will then be used.