A search can include multiple fields and return a single match and match score. The fields can be any combination of the rni_name, rni_date, and rni_address types, or any other Elasticsearch field type.
Each field can be assigned a weight to reflect its importance in the overall matching logic. When searching for a match, some fields are more important in determining a match than others. For example, the name field is likely more important in determining a match than an address field. If no weights are defined, each field is weighted equally.
When matching records, a similarity score is calculated for each field. The final match score is then calculated as a weighted arithmetic mean of the similarity scores. If a field is missing from a document, that field is removed from the score calculation and its weight is evenly distributed across the other fields. You can override this behavior by using the score_if_null option to specify a score to be returned if the field is null in the index document.
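The scoring rule described above can be illustrated with a small Python sketch. This is not RNI code: the function name match_score and the sample similarity scores are hypothetical, and the sketch assumes that dropping a missing field amounts to renormalizing the weighted mean over the remaining fields. The plugin's exact redistribution may differ.

```python
def match_score(field_scores, weights=None, score_if_null=None):
    """Weighted arithmetic mean of per-field similarity scores.

    field_scores maps a field name to its similarity score, or to None
    when the field is missing from the indexed document.
    """
    weights = weights or {}
    score_if_null = score_if_null or {}
    total = 0.0
    weight_sum = 0.0
    for field, score in field_scores.items():
        if score is None:
            if field not in score_if_null:
                continue  # missing field: left out of the calculation
            score = score_if_null[field]  # missing field: use the override
        weight = weights.get(field, 1.0)  # unweighted fields count equally
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical similarity scores; the address is missing from the document.
scores = {"name": 0.8, "dob": 0.6, "address": None}
weights = {"name": 4, "dob": 2, "address": 2}

equal = match_score(scores)                                  # (0.8 + 0.6) / 2
weighted = match_score(scores, weights)                      # (4*0.8 + 2*0.6) / 6
with_null = match_score(scores, weights, {"address": 0.0})   # (4*0.8 + 2*0.6 + 2*0.0) / 8
```

Setting a low score_if_null value penalizes documents that lack the field, while leaving it unset ignores the field for those documents.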
- Create an index with a mapping containing fields with different types
curl -XPUT 'http://localhost:9200/rni-test' -H'Content-Type: application/json' -d '{
"mappings" : {
"properties" : {
"name" : { "type" : "rni_name" },
"dob" : { "type" : "rni_date" },
"address" : { "type" : "rni_address" },
"height" : { "type" : "integer" },
"nationality" : { "type" : "keyword" }
}
}
}'
- Index documents that contain those fields
curl -XPUT 'http://localhost:9200/rni-test/_doc/1' -H'Content-Type: application/json' -d '{
"name" : "Ryan McDonagh",
"dob" : "11/19/1987",
"address" : {
"houseNumber" : "47",
"road" : "Park St",
"city" : "Boston",
"state" : "MA"
},
"nationality" : "USA",
"height" : 65
}'
The query can be a record containing multiple fields. The fields in the query record must be mapped to those of the indexed documents.
Base Query. The base query is a standard Elasticsearch query containing multiple fields that will return candidates for rescoring.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"bool" : {
"should" : [
{ "match" : { "name" : "Brian McDonough" } },
{ "match" : { "dob" : "10/19/87" } },
{ "match" : { "address" : "{ \"houseNumber\" : \"48\", \"road\" : \"Parker St\", \"city\" : \"Boston\", \"state\" : \"MA\" }" } }
]
}
}
}'
RNI Rescore with Records. Use the doc_score function to rescore the indexed documents against a query record.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"bool" : {
"should" : [
{ "match" : { "name" : "Brian McDonough" } },
{ "match" : { "dob" : "10/19/87" } },
{ "match" : { "address" : "{ \"houseNumber\" : \"48\", \"road\" : \"Parker St\", \"city\" : \"Boston\", \"state\" : \"MA\" }" } }
]
}
},
"rescore" : {
"rni_query" : {
"rescore_query" : {
"function_score" : {
"doc_score" : {
"fields" : {
"name" : { "query_value": "Brian McDonough" },
"dob" : { "query_value": "10/19/87" },
"address" : {
"query_value" : {
"houseNumber" : "48",
"road" : "Parker St",
"city" : "Boston",
"state" : "MA"
}
},
"height" : { "query_value": 67 },
"nationality" : { "query_value": "CANADA" }
}
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
As with addresses, the query_value of a name can be an object, which lets you match additional name information. The rescore query above can be modified to also match against a name's entityType field:
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"bool" : {
"should" : [
{ "match" : { "name" : "Brian McDonough" } },
{ "match" : { "dob" : "10/19/87" } },
{ "match" : { "address" : "{ \"houseNumber\" : \"48\", \"road\" : \"Parker St\", \"city\" : \"Boston\", \"state\" : \"MA\" }" } }
]
}
},
"rescore" : {
"rni_query" : {
"rescore_query" : {
"function_score" : {
"doc_score" : {
"fields" : {
"name" : {
"query_value": {
"data": "Brian McDonough",
"entityType": "PERSON"
}
},
"dob" : { "query_value": "10/19/87" },
"address" : {
"query_value" : {
"houseNumber" : "48",
"road" : "Parker St",
"city" : "Boston",
"state" : "MA"
}
},
"height" : { "query_value": 67 },
"nationality" : { "query_value": "CANADA" }
}
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
Note
The quotes in the base query above are escaped because you can't pass an object to the basic Elasticsearch query; it requires a string. The rescore queries can handle objects because they use RNI functions to parse the values.
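If the request body is built in code rather than typed by hand, the escaping can be left to a JSON library. The following Python sketch is illustrative only and uses just the standard json module; the variable names are hypothetical. It serializes the address object to a string for the base match query and passes the same object unchanged to the rescore query.

```python
import json

address = {"houseNumber": "48", "road": "Parker St", "city": "Boston", "state": "MA"}

# Base query: the standard match query needs a string, so serialize the object.
base_clause = {"match": {"address": json.dumps(address)}}

# Rescore query: the value is parsed by RNI, so the object is passed as-is.
rescore_field = {"address": {"query_value": address}}

# Serializing the whole request escapes the inner quotes automatically.
body = json.dumps({"query": {"bool": {"should": [base_clause]}}})
```

The resulting body contains the address as an escaped string, equivalent to the hand-escaped form in the curl examples.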
Weighted Multi-Field Query
Each field can be given a weight to reflect its importance in the overall matching logic.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"bool" : {
"should" : [
{ "match" : { "name" : "Brian McDonough" } },
{ "match" : { "dob" : "10/19/87" } },
{ "match" : { "address" : "{ \"houseNumber\" : \"48\", \"road\" : \"Parker St\", \"city\" : \"Boston\", \"state\" : \"MA\" }" } }
]
}
},
"rescore" : {
"query" : {
"rescore_query" : {
"function_score" : {
"doc_score": {
"fields": {
"name": { "query_value": "Brian McDonough", "weight": 4 },
"dob": { "query_value": "10/19/87", "weight": 2 },
"address" : {
"query_value" : {
"houseNumber" : "48",
"road" : "Parker St",
"city" : "Boston",
"state" : "MA"
},
"weight" : 2
},
"height" : { "query_value": 67, "weight": 0.5},
"nationality" : { "query_value": "CANADA", "weight": 1 }
}
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
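When the query record and the weights come from application code, the rescore section can be generated instead of written by hand. The Python sketch below is one possible way to do that; the helper name doc_score_rescore is hypothetical, and the structure it produces simply mirrors the weighted example above.

```python
import json


def doc_score_rescore(record, weights=None, query_weight=0.0, rescore_query_weight=1.0):
    """Build the rescore section of a doc_score request from a query record."""
    weights = weights or {}
    fields = {}
    for name, value in record.items():
        entry = {"query_value": value}
        if name in weights:
            entry["weight"] = weights[name]  # fields without a weight use the default
        fields[name] = entry
    return {
        "rescore": {
            "query": {
                "rescore_query": {"function_score": {"doc_score": {"fields": fields}}},
                "query_weight": query_weight,
                "rescore_query_weight": rescore_query_weight,
            }
        }
    }


record = {
    "name": "Brian McDonough",
    "dob": "10/19/87",
    "address": {"houseNumber": "48", "road": "Parker St", "city": "Boston", "state": "MA"},
    "height": 67,
    "nationality": "CANADA",
}
weights = {"name": 4, "dob": 2, "address": 2, "height": 0.5, "nationality": 1}

rescore = doc_score_rescore(record, weights)
print(json.dumps(rescore, indent=2))
```

The output can be merged with a base query and sent to the _search endpoint in place of the hand-written body.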
By default, if a queried-for field is null in the index, the field is removed from the score calculation, and the weights of the other fields are redistributed. However, you can override this behavior by using the score_if_null option to specify what score should be returned for this field if it is null in the index document.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"bool" : {
"should" : [
{ "match" : { "name" : "Brian McDonough" } },
{ "match" : { "dob" : "10/19/87" } },
{ "match" : { "address" : "{ \"houseNumber\" : \"48\", \"road\" : \"Parker St\", \"city\" : \"Boston\", \"state\" : \"MA\" }" } }
]
}
},
"rescore" : {
"query" : {
"rescore_query" : {
"function_score" : {
"doc_score" : {
"fields" : {
"name" : { "query_value": "Brian McDonough", "weight": 4, "score_if_null" : 0.0 },
"dob": { "query_value": "10/19/87", "weight": 2 },
"address" : {
"query_value" : {
"houseNumber" : "48",
"road" : "Parker St",
"city" : "Boston",
"state" : "MA"
},
"weight" : 2
},
"height" : { "query_value": 67, "weight": 0.5},
"nationality" : { "query_value": "CANADA", "weight": 1 , "score_if_null" : 1.0 }
}
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
Multi-Field Query with Multiple Nested Fields
The doc_score function for rescoring does not currently support search queries containing multiple nested fields. To perform these queries, chain multiple rescorers and adjust the query_weight and rescore_query_weight parameters to control the relative importance of the original query and of the rescore query, respectively. When chaining multiple RNI advanced rescorers, be sure to add "score_mode":"total" to each rni_query object to ensure the final score is properly accumulated.
This example expands on the previous examples, adding alias names (aliases field) and replacing the single date of birth (dob field) with a list of dates of birth, one for each alias (dobs field).
- Create an index with a mapping containing multiple nested fields
curl -XPUT "http://localhost:9200/rni-test" -H 'Content-Type: application/json' -d'{
"mappings": {
"properties": {
"name": {
"type": "rni_name"
},
"aliases": {
"type": "nested",
"properties": {
"alias_name": {
"type": "rni_name"
}
}
},
"dobs": {
"type": "nested",
"properties": {
"dob": {
"type": "rni_date"
}
}
},
"address": {
"type": "rni_address"
},
"height": {
"type": "integer"
},
"nationality": {
"type": "keyword"
}
}
}
}'
- Index documents that contain the fields
curl -XPUT "http://localhost:9200/rni-test/_doc/1" -H 'Content-Type: application/json' -d'{
"name": "Ryan McDonagh",
"aliases": [
{
"alias_name": "Rayan McDonagh"
},
{
"alias_name": "R. McDonagh"
},
{
"alias_name": "Rayan M."
}
],
"dobs": [
{
"dob": "11/19/1987"
},
{
"dob": "11/20/1987"
},
{
"dob": "10/19/1987"
}
],
"address": {
"houseNumber": "47",
"road": "Park St",
"city": "Boston",
"state": "MA"
},
"nationality": "USA",
"height": 65
}'
- Query the index with multiple chained rescorers
curl -XGET "http://localhost:9200/rni-test/_search" -H 'Content-Type: application/json' -d'{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "dobs",
"query": {
"bool": {
"should": {
"match": { "dobs.dob": "10/19/87"}
}
}
}
}
},
{
"nested": {
"path":"aliases",
"query": {
"bool": {
"should": {
"match": {"aliases.alias_name": "Brian McDonough"}
}
}
}
}
},
{
"match":{
"address": "{\"houseNumber\": \"48\", \"road\": \"Parker St\", \"city\": \"Boston\", \"state\": \"MA\" }"
}
}
]
}
},
"rescore": [
{
"rni_query": {
"rescore_query": {
"nested": {
"score_mode": "max",
"path": "aliases",
"query": {
"rni_function_score": {
"name_score": {
"field": "aliases.alias_name",
"query_name": "Brian McDonough",
"window_size_allowance": 1
}
}
}
}
},
"score_mode": "total",
"query_weight": 0.0,
"rescore_query_weight": 1.0,
"filter_out_scores_below": 0.6
}
},
{
"rni_query": {
"rescore_query": {
"nested": {
"score_mode": "max",
"path": "dobs",
"query": {
"rni_function_score": {
"date_score": {
"field": "dobs.dob",
"query_date": "10/19/87"
}
}
}
}
},
"score_mode": "total",
"query_weight": 0.67,
"rescore_query_weight": 0.33
}
},
{
"rni_query": {
"rescore_query": {
"rni_function_score": {
"address_score": {
"field": "address",
"query_address": {
"houseNumber": "48",
"road": "Parker St",
"city": "Boston",
"state": "MA"
}
}
}
},
"score_mode": "total",
"query_weight": 0.75,
"rescore_query_weight": 0.25
}
},
{
"query": {
"rescore_query": {
"match": {
"height": 67
}
},
"query_weight": 0.89,
"rescore_query_weight": 0.11
}
},
{
"query": {
"rescore_query": {
"match": {
"nationality": "CANADA"
}
},
"query_weight": 0.9,
"rescore_query_weight": 0.1
}
}
]
}'
To calculate the rescore_query_weight for each rescorer, work from bottom to top, dividing each field's desired weight by the product of the already-calculated query_weight values. The query_weight is calculated by subtracting the rescore_query_weight from 1.
If there are no previous query_weight values, the rescore_query_weight is simply the desired field weight.
In this example, the desired field weights are 0.4, 0.2, 0.2, 0.1, and 0.1 for the alias, dob, address, height, and nationality fields, respectively. The numbered steps below correspond to the five rescorers in the query, in order.
1. Rescore based on alias
   Name field weight = 0.4
   rescore_query_weight = 0.4 / (0.667 x 0.75 x 0.89 x 0.9) = 1
   query_weight = 1 - 1 = 0
2. Rescore based on date of birth
   DOB field weight = 0.2
   rescore_query_weight = 0.2 / (0.75 x 0.89 x 0.9) = 0.333
   query_weight = 1 - 0.333 = 0.667
3. Rescore based on address
   Address field weight = 0.2
   rescore_query_weight = 0.2 / (0.89 x 0.9) = 0.25
   query_weight = 1 - 0.25 = 0.75
4. Rescore based on height
   Height field weight = 0.1
   rescore_query_weight = 0.1 / 0.9 = 0.11
   query_weight = 1 - 0.11 = 0.89
5. Rescore based on nationality
   Nationality field weight = 0.1
   rescore_query_weight = 0.1
   query_weight = 1 - 0.1 = 0.9
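The bottom-to-top calculation can also be sketched in Python. This is an illustrative sketch, not part of RNI: the function names are hypothetical, and it assumes each rescorer in the chain combines scores as query_weight x previous score + rescore_query_weight x rescore score, which is how the total score mode is used in the example above.

```python
def chained_rescore_weights(field_weights):
    """Return (query_weight, rescore_query_weight) per rescorer.

    field_weights lists the desired field weights in rescorer order
    (first rescorer first) and is expected to sum to 1.
    """
    pairs = [None] * len(field_weights)
    remaining = 1.0  # product of the query_weight values computed so far
    for i in range(len(field_weights) - 1, -1, -1):  # bottom to top
        rescore_w = field_weights[i] / remaining
        query_w = 1.0 - rescore_w
        pairs[i] = (query_w, rescore_w)
        remaining *= query_w
    return pairs


def effective_weights(pairs):
    """Weight each rescorer's score carries once the whole chain has run."""
    result = []
    for i, (_, rescore_w) in enumerate(pairs):
        weight = rescore_w
        for later_query_w, _ in pairs[i + 1:]:
            weight *= later_query_w  # every later rescorer scales earlier scores
        result.append(weight)
    return result


desired = [0.4, 0.2, 0.2, 0.1, 0.1]  # alias, dob, address, height, nationality
pairs = chained_rescore_weights(desired)
for query_w, rescore_w in pairs:
    print(f"query_weight={query_w:.3f}  rescore_query_weight={rescore_w:.3f}")
```

Passing the computed pairs to effective_weights recovers the desired field weights, which is a quick way to check a chain before using it in a query.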
Weighted Multi-Field Query with Custom Similarity Function
While the doc_score function has built-in similarity functions for many core field types, a custom similarity function can be provided at query time. In this contrived example, we'll use a simple script_score function that returns a high score when a query value of CANADA is compared with an indexed value of USA. Refer to the Elasticsearch documentation for more details about Elasticsearch scripting. Any other function can also be used.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"bool" : {
"should" : [
{ "match" : { "name" : "Brian McDonough" } },
{ "match" : { "dob" : "10/19/87" } },
{ "match" : { "address" : "{ \"houseNumber\" : \"48\", \"road\" : \"Parker St\", \"city\" : \"Boston\", \"state\" : \"MA\" }" } }
]
}
},
"rescore" : {
"query" : {
"rescore_query" : {
"function_score" : {
"doc_score": {
"fields": {
"name": { "query_value": "Brian McDonough", "weight": 4 },
"dob": { "query_value": "10/19/87", "weight": 2 },
"address" : {
"query_value" : {
"houseNumber" : "48",
"road" : "Parker St",
"city" : "Boston",
"state" : "MA"
},
"weight" : 2
},
"height": { "query_value": 67, "weight": 0.5 },
"nationality": {
"function": {
"function_score": {
"script_score": {
"script": {
"lang": "painless",
"params": {
"query_value": "CANADA"
},
"source": "if (params.query_value == '\''CANADA'\'' && doc['\''nationality'\''].value == '\''USA'\'') {return 0.8} else {return 0.2}"
}
}
}
},
"weight": 1
}
}
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
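For readers less familiar with Painless, the rule encoded in the script can be mirrored in a few lines of Python. This is only an illustration of the logic; the function name is hypothetical, and nothing here runs inside Elasticsearch.

```python
def nationality_similarity(query_value, indexed_value):
    """Mirror of the script: CANADA queried against an indexed USA scores high."""
    if query_value == "CANADA" and indexed_value == "USA":
        return 0.8
    return 0.2


score_for_example_doc = nationality_similarity("CANADA", "USA")
score_for_other_doc = nationality_similarity("CANADA", "FRANCE")
```

Note that the rule is deliberately one-sided: a query for USA against an indexed value of CANADA falls through to the low score.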