You can perform a pairwise match between two values of type rni_name, rni_date, rni_address, or another supported data type by sending a POST request to the _pair_match endpoint. The results show how the match score was calculated, including tokens and token scores. This endpoint can help you understand the impact a specific match parameter has on the final score, and can aid in testing and debugging RNI.
The type of pairwise match being performed is passed as the type query parameter, and the two values being compared are passed in the dataPair object as data1 and data2. You can also specify one or more parameters and see how they affect the match score.
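The request shape described above can be sketched in Python using only the standard library. This is a minimal sketch, assuming the plugin is reachable at http://localhost:9200 as in the curl examples on this page; the helper names build_pair_match_request and pair_match are hypothetical and are not part of RNI.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL, taken from the curl examples on this page.
BASE_URL = "http://localhost:9200/rni_plugin/_pair_match"


def build_pair_match_request(match_type, data1, data2, parameters=None):
    """Build (url, body) for a _pair_match call (hypothetical helper)."""
    # The match type goes in the query string, e.g. ?type=rni_date.
    url = BASE_URL + "?" + urllib.parse.urlencode({"type": match_type})
    body = {"dataPair": {"data1": data1, "data2": data2}}
    if parameters:
        # Parameters are optional; leave the key out when none are given.
        body["parameters"] = parameters
    return url, body


def pair_match(match_type, data1, data2, parameters=None):
    """POST the request and return the decoded JSON response."""
    url, body = build_pair_match_request(match_type, data1, data2, parameters)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, pair_match("rni_date", "12/25/19", "1/15/20", {"timeDistanceWeight": ".8", "stringDistanceWeight": "0"}) sends the same request as the rni_date curl example, provided a cluster with the plugin is running.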
Request
curl -XPOST "http://localhost:9200/rni_plugin/_pair_match?type=rni_date" -H 'Content-Type: application/json' -d'
{"dataPair": {"data1": "12/25/19","data2": "1/15/20"},
"parameters": {
"timeDistanceWeight": ".8",
"stringDistanceWeight": "0"}}'
Response
{
"score" : 0.730683530798762,
"type" : "ORIGINAL",
"preSwapPenaltyScore" : 0.730683530798762,
"swapPenaltyFactor" : 1.0,
"preFinalBiasScore" : 0.730683530798762,
"finalBias" : 1.0,
"debugInfo" : """[{"name":"TIME","weight":0.8,"score":0.6949591099211685},
{"name":"YEAR","weight":0.2,"score":0.9330329915368074},
{"name":"MONTH","weight":0.2,"score":0.6830201283771977},
{"name":"DAY","weight":0.1,"score":0.7071067811865476},
{"name":"STRING","weight":0.0,"score":0.375}]"""
}
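The component scores in debugInfo are consistent with a weighted average: in this response, the sum of weight × score over the components, divided by the sum of the weights (1.3), reproduces the reported score of 0.730683530798762. This is an observation about this one response, not documented RNI behavior, and the sketch below only checks that arithmetic. The components are copied from debugInfo above; the function name is hypothetical.

```python
# (name, weight, score) triples copied from the debugInfo of the rni_date response.
COMPONENTS = [
    ("TIME", 0.8, 0.6949591099211685),
    ("YEAR", 0.2, 0.9330329915368074),
    ("MONTH", 0.2, 0.6830201283771977),
    ("DAY", 0.1, 0.7071067811865476),
    ("STRING", 0.0, 0.375),
]


def weighted_average(components):
    """Return sum(weight * score) / sum(weight) over the components."""
    total_weight = sum(w for _, w, _ in components)
    if total_weight == 0:
        raise ValueError("all component weights are zero")
    return sum(w * s for _, w, s in components) / total_weight


# Close to the reported score; STRING contributes nothing because its weight is 0.
print(weighted_average(COMPONENTS))
```

Setting stringDistanceWeight to 0 in the request is what removes the STRING component from the average here.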
The following data types are supported by the pairwise match endpoint:
- rni_name
- rni_date
- rni_address
- date
- keyword
- text
- string
- integer
- long
- short
- double
- float
- boolean
- geo_point
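If you call the endpoint from your own code, you might reject an unlisted type before sending the request. A minimal sketch; the constant and function names are hypothetical, and the set is only as current as the list above.

```python
# Data types accepted by _pair_match, as listed on this page.
SUPPORTED_PAIR_MATCH_TYPES = frozenset({
    "rni_name", "rni_date", "rni_address",
    "date", "keyword", "text", "string",
    "integer", "long", "short", "double", "float",
    "boolean", "geo_point",
})


def check_pair_match_type(match_type):
    """Return match_type unchanged, or raise ValueError if it is not listed."""
    if match_type not in SUPPORTED_PAIR_MATCH_TYPES:
        raise ValueError(f"unsupported _pair_match type: {match_type!r}")
    return match_type
```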
Request
curl -XPOST "http://localhost:9200/rni_plugin/_pair_match?type=text" -H 'Content-Type: application/json' -d'
{
"dataPair":
{
"data1": "word1",
"data2": "word2"
}
}'
Response
{
"score" : 0.8333333333333334
}
Request
Parameters are specified directly in the request. The source language of each name (language) is optional, but recommended if known.
curl -XPOST "http://localhost:9200/rni_plugin/_pair_match?type=rni_name" -H 'Content-Type: application/json' -d'
{
"dataPair":
{
"data1":
{
"data": "John Robert Edward Smith",
"language": "eng",
"entityType": "PERSON"
},
"data2":
{
"data": "John Smyth",
"language": "eng",
"entityType": "PERSON"
}
},
"parameters": {
"deletionScore": 0.469
}
}'
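When building rni_name values programmatically, a helper can leave out optional keys rather than sending them as null. A minimal sketch; rni_name_value is a hypothetical helper. This page states only that language is optional, so treating entityType as optional too is an assumption.

```python
import json


def rni_name_value(data, language=None, entity_type=None):
    """Build one side of an rni_name dataPair (hypothetical helper).

    Keys for language and entityType are omitted when not supplied.
    Treating entityType as optional is an assumption.
    """
    value = {"data": data}
    if language is not None:
        value["language"] = language        # e.g. "eng", as in the example
    if entity_type is not None:
        value["entityType"] = entity_type   # e.g. "PERSON"
    return value


# Same body as the rni_name curl request above.
body = {
    "dataPair": {
        "data1": rni_name_value("John Robert Edward Smith", "eng", "PERSON"),
        "data2": rni_name_value("John Smyth", "eng", "PERSON"),
    },
    "parameters": {"deletionScore": 0.469},
}
print(json.dumps(body, indent=2))
```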
Response
The response includes detailed information on how the names were matched.
{
"score": 0.8679031451202058,
"type": "TOKEN_BY_TOKEN",
"avgMatchedTokenLMBinLeft": 1,
"avgMatchedTokenLMBinRight": 1,
"annotations": [
{
"type": "One-sided deletion boost",
"parameter": 0,
"oldScore": 0.7033713195696144,
"newScore": 0.7324475249463439
},
{
"type": "Final bias",
"parameter": 0,
"oldScore": 0.7324475249463439,
"newScore": 0.8679031451202058
}
],
"leftTokens": [
{
"data": "john",
"field": 0,
"originalData": "john",
"spanStart": 0,
"spanEnd": 4,
"type": "UNKNOWN",
"weight": 0.2547722342733189
},
{
"data": "robert",
"field": 0,
"originalData": "robert",
"spanStart": 5,
"spanEnd": 11,
"type": "UNKNOWN",
"weight": 0.24522776572668115
},
{
"data": "edward",
"field": 0,
"originalData": "edward",
"spanStart": 12,
"spanEnd": 18,
"type": "UNKNOWN",
"weight": 0.24522776572668115
},
{
"data": "smith",
"field": 0,
"originalData": "smith",
"spanStart": 19,
"spanEnd": 24,
"type": "UNKNOWN",
"weight": 0.2547722342733189
}
],
"rightTokens": [
{
"data": "john",
"field": 0,
"originalData": "john",
"spanStart": 0,
"spanEnd": 4,
"type": "UNKNOWN",
"weight": 0.5
},
{
"data": "smyth",
"field": 0,
"originalData": "smyth",
"spanStart": 5,
"spanEnd": 10,
"type": "UNKNOWN",
"weight": 0.5
}
],
"spanMatches": [
{
"leftSpan": "Span(0, 4)",
"rightSpan": "Span(0, 4)",
"reason": "MATCH"
},
{
"leftSpan": "Span(19, 24)",
"rightSpan": "Span(5, 10)",
"reason": "HMM_MATCH"
},
{
"leftSpan": "Span(5, 18)",
"rightSpan": null,
"reason": "DELETION"
}
],
"finalTuples": [
{
"originalScore": 1,
"packingScore": 3.019088937093276,
"score": 1,
"reason": "MATCH",
"leftIndex0": 0,
"leftIndex1": 0
},
{
"originalScore": 0.7237045473947534,
"packingScore": 2.730932483146512,
"score": 0.7237045473947534,
"reason": "HMM_MATCH",
"leftIndex0": 3,
"leftIndex1": 3
},
{
"originalScore": 0.469,
"packingScore": 0,
"score": 0.469,
"reason": "DELETION",
"leftIndex0": 1,
"leftIndex1": 2
}
],
"otherTuples": [],
"debugInfo": """
Begin [ John Robert Edward Smith Latn eng:eng NONE (john robert edward smith)] [ John Smyth Latn eng:eng NONE (john smyth)]
-- Token data ------------
john (john) bin=1.0 (w/bias = 1.0000)
robert (robert) bin=1.0 (w/bias = 1.0000)
edward (edward) bin=1.0 (w/bias = 1.0000)
smith (smith) bin=1.0 (w/bias = 1.0000)
john (john) bin=1.0 (w/bias = 1.0000)
smyth (smyth) bin=1.0 (w/bias = 1.0000)
--------------------------
john/25@0:0 john/50@0:0 -> 1.0000 <+S>
john/25@0:0 smyth/50@0:1 -> 0.0000 <null>
robert/25@0:1 john/50@0:0 -> 0.0000 <null>
robert/25@0:1 smyth/50@0:1 -> 0.0000 <null>
edward/25@0:2 john/50@0:0 -> 0.0000 <null>
edward/25@0:2 smyth/50@0:1 -> 0.0000 <null>
smith/25@0:3 john/50@0:0 -> 0.0000 <null>
smith/25@0:3 smyth/50@0:1 -> 0.7237 <pi[4]=0.4782 pj[4]=0.4402 +S>
johnrobertedwardsmith/100@0:0..3 johnsmyth/100@0:0..1 => 0.0000
-- All Tuples ------------
john/25@0:0 == john/50@0:0 t=1.0000 MATCH (s=2 q=3.0191 o=1.0000)
smith/25@0:3 == smyth/50@0:1 t=0.7237 HMM_MATCH (s=2 q=2.7309 o=0.7237)
--------------------------
john/25@0:0 == john/50@0:0 t=1.0000 MATCH (s=2 q=3.0191 o=1.0000)
smith/25@0:3 == smyth/50@0:1 t=0.7237 HMM_MATCH (s=2 q=2.7309 o=0.7237)
robertedward/49@0:1..2 == <DEL> t=0.4690 DELETION (s=0 q=0.0000 o=0.4690)
One-sided deletion boost: 0.7034 -> 0.7324
Final bias: 0.7324 -> 0.8679
Score = 0.8679: [ John Robert Edward Smith Latn eng:eng NONE (john robert edward smith)] [ John Smyth Latn eng:eng NONE (john smyth)]
-- Token data ------------
john (john) bin=1.0 (w/bias = 1.0000)
robert (robert) bin=1.0 (w/bias = 1.0000)
edward (edward) bin=1.0 (w/bias = 1.0000)
smith (smith) bin=1.0 (w/bias = 1.0000)
john (john) bin=1.0 (w/bias = 1.0000)
smyth (smyth) bin=1.0 (w/bias = 1.0000)
--------------------------
Score[alt] = 0.0000: [ John Robert Edward Smith Latn eng:eng NONE (john robert edward smith)] [ John Smyth Latn eng:eng NONE (john smyth)]
End 0.8679
"""
}
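The spanMatches entries refer to character offsets in the input names rather than to the text itself. In this example the offsets agree with the spanStart and spanEnd values of the tokens, with the end offset exclusive, so they can be mapped back to the original strings. The sketch below does that and also lists the score after each annotation. The function names are hypothetical, parsing the "Span(start, end)" string form is an assumption based only on this example, and the response dict is a trimmed copy of the one above.

```python
import re

# Matches span strings such as "Span(19, 24)" in the spanMatches list.
_SPAN_RE = re.compile(r"Span\((\d+),\s*(\d+)\)")


def span_text(span, original):
    """Return the substring of original covered by a "Span(start, end)" string.

    The end offset is treated as exclusive. Returns None for a null span,
    which appears for deletions.
    """
    if span is None:
        return None
    m = _SPAN_RE.fullmatch(span)
    if m is None:
        raise ValueError(f"unrecognized span: {span!r}")
    start, end = int(m.group(1)), int(m.group(2))
    return original[start:end]


def describe_span_matches(response, left_name, right_name):
    """Yield (left text, right text, reason) for each spanMatches entry."""
    for sm in response["spanMatches"]:
        yield (span_text(sm["leftSpan"], left_name),
               span_text(sm["rightSpan"], right_name),
               sm["reason"])


def score_trail(response):
    """Return the starting score followed by the score after each annotation."""
    annotations = response.get("annotations", [])
    if not annotations:
        return [response["score"]]
    return [annotations[0]["oldScore"]] + [a["newScore"] for a in annotations]


# Trimmed copy of the rni_name response above.
response = {
    "score": 0.8679031451202058,
    "annotations": [
        {"type": "One-sided deletion boost", "oldScore": 0.7033713195696144, "newScore": 0.7324475249463439},
        {"type": "Final bias", "oldScore": 0.7324475249463439, "newScore": 0.8679031451202058},
    ],
    "spanMatches": [
        {"leftSpan": "Span(0, 4)", "rightSpan": "Span(0, 4)", "reason": "MATCH"},
        {"leftSpan": "Span(19, 24)", "rightSpan": "Span(5, 10)", "reason": "HMM_MATCH"},
        {"leftSpan": "Span(5, 18)", "rightSpan": None, "reason": "DELETION"},
    ],
}

for row in describe_span_matches(response, "John Robert Edward Smith", "John Smyth"):
    print(row)
# ('John', 'John', 'MATCH')
# ('Smith', 'Smyth', 'HMM_MATCH')
# ('Robert Edward', None, 'DELETION')
```

Read this way, the response says that John matched exactly, Smith matched Smyth through the HMM_MATCH reason, and Robert Edward was deleted at the deletionScore of 0.469 set in the request.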
Request
The pairwise match endpoint supports both fielded and unfielded addresses. Fielded addresses must be specified as objects, while unfielded addresses must be specified as strings.
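In client code, the choice between the two forms can follow the value's type: a dict becomes a JSON object (fielded) and a str becomes a JSON string (unfielded). A minimal sketch; address_value is a hypothetical helper, and the field names are only those used in the example request that follows. This page does not list the full set of accepted address fields.

```python
import json


def address_value(address):
    """Return an rni_address value for a dataPair (hypothetical helper).

    dict -> fielded address (serialized as a JSON object)
    str  -> unfielded address (serialized as a JSON string)
    """
    if isinstance(address, dict):
        return dict(address)
    if isinstance(address, str):
        return address
    raise TypeError("address must be a dict (fielded) or str (unfielded)")


body = {
    "dataPair": {
        # Field names taken from the example request on this page.
        "data1": address_value({
            "houseNumber": "101",
            "road": "Main st",
            "city": "Cambridge",
            "state": "Massachusetts",
            "country": "United States of America",
        }),
        "data2": address_value("101 Main St, Cambridge, MA, USA"),
    }
}
print(json.dumps(body, indent=2))
```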
curl -XPOST "http://localhost:9200/rni_plugin/_pair_match?type=rni_address" -H 'Content-Type: application/json' -d'
{
"dataPair":
{
"data1":
{
"houseNumber": "101",
"road": "Main st",
"city": "Cambridge",
"state": "Massachusetts",
"country": "United States of America"
},
"data2": "101 Main St, Cambridge, MA, USA"
}
}'
Response
The response includes a score marking the similarity of the two addresses, as well as a type field describing the type of match observed. The response also includes detailed information on how each of the fields was matched. In the example below, only part of the detailed response for HOUSE_NUMBER is included.
{
"score":0.9,
"type":"OTHER",
"annotations":[
{
"type":"Final bias",
"parameter":0.0,
"oldScore":0.9,
"newScore":0.9
}
],
"addressFieldPairResults":[
{
"leftField":"HOUSE_NUMBER",
"rightField":"HOUSE_NUMBER",
"score":1.0,
"spanMatches":[
{
"leftSpan":{
"start":0,
"end":3,
"length":3
},
"rightSpan":{
"start":0,
"end":3,
"length":3
},
"reason":"MATCH"
}
],
"leftTokens":[
{
"data":"101",
"field":0,
"originalData":"101",
"spanStart":0,
"spanEnd":3,
"type":"NONE",
"weight":1.0
}
],
"rightTokens":[
{
"data":"101",
"field":0,
"originalData":"101",
"spanStart":0,
"spanEnd":3,
"type":"NONE",
"weight":1.0
}
],
...
"debugInfo":""
},
{
"score": 0.9,
"type": "ORIGINAL"
}
...
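Because the address response nests per-field detail under addressFieldPairResults, a short script can pull out just the field-level scores, for example to find the field that lowered the overall score. A minimal sketch over an already-decoded response; the function names are hypothetical, and the sample dict keeps only the HOUSE_NUMBER entry shown above because the other entries are elided on this page.

```python
def field_scores(response):
    """Return (leftField, rightField, score) for each addressFieldPairResults entry."""
    return [
        (r.get("leftField"), r.get("rightField"), r.get("score"))
        for r in response.get("addressFieldPairResults", [])
    ]


def weakest_field(response):
    """Return the field pair with the lowest score, or None if there are none."""
    scored = [t for t in field_scores(response) if t[2] is not None]
    return min(scored, key=lambda t: t[2]) if scored else None


# Trimmed copy of the rni_address response above.
response = {
    "score": 0.9,
    "type": "OTHER",
    "addressFieldPairResults": [
        {"leftField": "HOUSE_NUMBER", "rightField": "HOUSE_NUMBER", "score": 1.0},
    ],
}

for left, right, score in field_scores(response):
    print(f"{left} vs {right}: {score}")
# HOUSE_NUMBER vs HOUSE_NUMBER: 1.0
```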