RNI can match dates, returning a date match score that reflects how close the two dates are in time. Dates that are closer together are considered a stronger match and return a match score closer to 1.
For example, 11/05/1993 and 11/07/1993 have a high score, as they are very similar and just two days apart. However, 11/05/1993 and 11/05/1995 yield a low score as they differ by two years.
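To make the idea of a score that shrinks as dates move apart concrete, here is a toy sketch. It is not RNI's scoring formula, which this page does not describe; the function name `toy_date_similarity`, the exponential decay, and the `scale_days` value are all assumptions chosen only for illustration.

```python
import math
from datetime import date


def toy_date_similarity(a: date, b: date, scale_days: float = 30.0) -> float:
    """Illustrative only (NOT RNI's algorithm): map the gap in days to (0, 1]."""
    gap = abs((a - b).days)
    # Identical dates give 1.0; the score decays toward 0 as the gap grows.
    return math.exp(-gap / scale_days)


# The two pairs from the text: two days apart versus two years apart.
close = toy_date_similarity(date(1993, 11, 5), date(1993, 11, 7))
far = toy_date_similarity(date(1993, 11, 5), date(1995, 11, 5))
print(f"two days apart: {close:.3f}, two years apart: {far:.3f}")
```

The only property the sketch shares with the description above is the ordering: the nearer pair scores close to 1 and the distant pair scores close to 0.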
The process is similar to name matching:
- Index the dates in connection to the related names.
- Query the date and name, receiving back a match score.
The query will return separate match scores for the name and for the associated date of birth. You may decide that the name is more important than the birth date. Within your system, you can weight and combine the name and date match scores to determine the final match score.
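One simple way to combine the two scores in your own system is a weighted average. The sketch below is only an example of that idea: the function name `combine_scores`, the 0.7/0.3 weights, and the sample scores are hypothetical and are not values defined by RNI.

```python
def combine_scores(name_score: float, date_score: float,
                   name_weight: float = 0.7, date_weight: float = 0.3) -> float:
    """Weighted average of two match scores, each expected to lie in [0, 1]."""
    total = name_weight + date_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total keeps the result in [0, 1] for any positive weights.
    return (name_weight * name_score + date_weight * date_score) / total


# Hypothetical scores for one candidate: a strong name match, a weaker date match.
final_score = combine_scores(name_score=0.92, date_score=0.60)
print(f"combined score: {final_score:.3f}")
```

Raising `name_weight` relative to `date_weight` expresses the decision that the name matters more than the birth date.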
A date contains a year, month, and day, but not all fields are required for matching. All common delimiters for English dates are supported, and dates can be expressed with various orderings. RNI will filter out some non-date related words. Formats that include a time of day are supported only if the mapping specifies an Elasticsearch date format that includes time information; even then, the time component is ignored.
Omit any field whose value you do not have. For example: 1955-12-30, 1955--03, 12/30, -12-, --30, 1955, 1955-12-.
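The hyphenated examples above can be read as three slots, some of them empty. The following sketch shows one way a client might represent such partial dates before sending them. It assumes a year-month-day slot order for these examples and handles only the three-slot hyphenated form; `PartialDate` and `parse_slots` are hypothetical names, and this is not how RNI itself parses dates.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: Optional[int]
    month: Optional[int]
    day: Optional[int]


def parse_slots(text: str) -> PartialDate:
    """Parse 'year-month-day' where any slot may be empty (assumed slot order)."""
    parts = text.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three hyphen-separated slots: {text!r}")
    # An empty slot means the field was omitted.
    year, month, day = (int(part) if part else None for part in parts)
    return PartialDate(year, month, day)


for sample in ["1955-12-30", "1955--03", "-12-", "--30", "1955-12-"]:
    print(sample, "->", parse_slots(sample))
```

Forms such as 12/30 or a bare 1955 carry no empty slots, so they would need extra rules that this sketch leaves out.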
RNI supports a wide variety of date formats.
- Days can be represented by 1 or 2 digits
- Months can be entered as numerics (1 or 2 digits) or English characters (full name or 3-character abbreviation)
- Years can be represented as 1, 2, 3, or 4 digits
- Supported delimiters include the comma (,), period (.), hyphen (-), slash (/), and space
- Partial fields can be entered
Examples: All of the following are acceptable:
- 3/24/1984
- March 1984
- 3-24
- -24-84
- March 24, 1984
- 24-3-84
- March
- 1984
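The format rules above can be illustrated with a small tokenizer that splits on the listed delimiters and recognizes English month names and 3-character abbreviations. This is a sketch for client-side experimentation, not RNI's parser: `tokenize_date` is a hypothetical helper, and it does not decide which number is the day, month, or year.

```python
import re
from typing import List

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Accept both the full month name and its 3-character abbreviation.
MONTH_LOOKUP = {}
for number, name in enumerate(MONTHS, start=1):
    MONTH_LOOKUP[name] = number
    MONTH_LOOKUP[name[:3]] = number


def tokenize_date(text: str) -> List[int]:
    """Split on , . - / and space, then turn every token into an integer."""
    tokens = [t for t in re.split(r"[,.\-/ ]+", text.strip()) if t]
    values = []
    for token in tokens:
        key = token.lower()
        if key in MONTH_LOOKUP:
            values.append(MONTH_LOOKUP[key])
        elif token.isdigit():
            values.append(int(token))
        else:
            raise ValueError(f"unrecognized date token: {token!r}")
    return values


for sample in ["3/24/1984", "March 24, 1984", "24-3-84", "Mar 1984", "-24-84"]:
    print(repr(sample), "->", tokenize_date(sample))
```

Because empty tokens are discarded, an input such as -24-84 loses the information that its first field was omitted; a fuller implementation would need to keep track of field positions.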
- Create an index.
curl -XPUT 'http://localhost:9200/rni-test'
- Define a mapping for fields that will contain dates. The type for a date field when matching is "rni_date".
curl -XPUT 'http://localhost:9200/rni-test/_mapping' -H'Content-Type: application/json' -d '{ "properties" : { "birth_date" : { "type" : "rni_date" }, "primary_name" : { "type" : "rni_name" } } }'
Optionally, in the mapping, you can specify an Elasticsearch date format. All dates must adhere to the specified format. If you specify a format that includes time information, RNI ignores the time component of the date.
Warning
Specifying an Elasticsearch format disables support for unspecified fields. If, for example, you select a format that does not include a day field ("MM-yyyy"), you will get an error when you use the date format in a query.
curl -XPUT 'http://localhost:9200/rni-test/_mapping' -H'Content-Type: application/json' -d '{ "properties" : { "birth_date" : { "type" : "rni_date", "format" : "MM-yyyy-dd" }, "primary_name" : { "type" : "rni_name" } } }'
- Index documents containing a date field.
curl -XPUT 'http://localhost:9200/rni-test/_doc/1' -H'Content-Type: application/json' -d '{ "primary_name" : "Joe Schmoe", "birth_date" : "07-1955-24" }'
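When the mapping specifies a format, every date must adhere to it, so it can help to check values on the client before indexing. The sketch below assumes that the Elasticsearch pattern "MM-yyyy-dd" corresponds to the Python `strptime` pattern "%m-%Y-%d"; `fits_mapping_format` is a hypothetical helper. Python's parser is somewhat more lenient than a strict pattern (it accepts a one-digit month, for example), so treat this as a rough pre-check only.

```python
from datetime import datetime

# Assumption: the mapping format "MM-yyyy-dd" is approximated by this pattern.
PY_FORMAT = "%m-%Y-%d"


def fits_mapping_format(value: str, py_format: str = PY_FORMAT) -> bool:
    """Return True if the value parses as a complete date in the given format."""
    try:
        datetime.strptime(value, py_format)
        return True
    except ValueError:
        return False


for candidate in ["07-1955-24", "07-1955", "1955-07-24"]:
    status = "ok" if fits_mapping_format(candidate) else "rejected"
    print(candidate, "->", status)
```

A partial date such as 07-1955 is rejected here, which mirrors the warning above that specifying a format disables support for unspecified fields.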
There are many ways to incorporate date matching within your query. Here are two examples, one with date matching by itself, and one with date and name matching.
Base Query. The base query is a standard query against the date field. Refer to Query the Index.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"match" : {
"birth_date" : "08-1955-25"
}
}
}'
RNI Rescore with Dates. Refer to Rescoring with RNI Pairwise Name Match.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query" : {
"match" : { "birth_date" : "08-1955-25" }
},
"rescore" : {
"query" : {
"rescore_query" : {
"function_score" : {
"date_score" : {
"field" : "birth_date",
"query_date" : "08-1955-25"
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
The query returns a hit, with the RNI date match score.
"hits": {
"total": 1,
"max_score": 1.618923,
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "AVXMepnorGuybmuiQtQr",
"_score": 0.8120856,
"_source": {
"primary_name": "Joe Schmoe",
"birth_date": "07-1955-24"
}
}
]
}
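A client typically reads the score from each hit's `_score` field. The sketch below parses the sample response shown above, wrapped in an outer JSON object so that it is valid JSON; `extract_scores` is a hypothetical helper name.

```python
import json

# The sample response from above, wrapped in braces to form a complete JSON object.
response_text = """
{
  "hits": {
    "total": 1,
    "max_score": 1.618923,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "AVXMepnorGuybmuiQtQr",
        "_score": 0.8120856,
        "_source": {
          "primary_name": "Joe Schmoe",
          "birth_date": "07-1955-24"
        }
      }
    ]
  }
}
"""


def extract_scores(response: dict) -> list:
    """Return (birth_date, score) pairs for every hit in a search response."""
    return [(hit["_source"].get("birth_date"), hit["_score"])
            for hit in response["hits"]["hits"]]


for birth_date, score in extract_scores(json.loads(response_text)):
    print(birth_date, score)
```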
Base Query. The base query is a standard query against the date and name fields.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"query": {
"bool": {
"should": [
{
"match": {
"primary_name": "Joe S."
}
},
{
"match": {
"birth_date": "08-1955-25"
}
}
]
}
}
}'
RNI Rescore with Dates. Use the doc_score function in the rescore when matching a combination of Elasticsearch field types instead of the functions for a single type (name_score and date_score). The name field is also added to the rescore.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
"rescore": {
"query": {
"rescore_query": {
"function_score": {
"doc_score": {
"fields": {
"primary_name": {
"query_value": "Joe S."
},
"birth_date": {
"query_value": "08-1955-25"
}
}
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
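If you build these requests in code, the base query and the doc_score rescore can be generated from a single mapping of field names to query values. The sketch below only assembles the JSON body, following the structure of the two requests above; it does not send anything, `build_doc_score_request` is a hypothetical helper, and placing both parts in one request is an assumption modeled on the date-only rescore example.

```python
import json


def build_doc_score_request(field_values: dict) -> dict:
    """Build a search body: a bool/should base query plus a doc_score rescore."""
    should = [{"match": {field: value}} for field, value in field_values.items()]
    doc_fields = {field: {"query_value": value}
                  for field, value in field_values.items()}
    return {
        "query": {"bool": {"should": should}},
        "rescore": {
            "query": {
                "rescore_query": {
                    "function_score": {"doc_score": {"fields": doc_fields}}
                },
                # Same weights as the examples: only the rescore score counts.
                "query_weight": 0.0,
                "rescore_query_weight": 1.0,
            }
        },
    }


body = build_doc_score_request(
    {"primary_name": "Joe S.", "birth_date": "08-1955-25"}
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the curl command shown above.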
As with name matching, a set of parameters controls date matching. You can edit the parameter values in the plugins/rni/bt_root/rlpnc/data/etc/parameter_defs.yaml file.
Because dates are sometimes written month-day and other times day-month, swap tries matching the date fields as written as well as with the month and day fields switched. The best score is returned as the match score. For example, if the dates in question are 1970-3-5 and 1970-6-4, this feature will match the following four pairs: