RNI can match dates, returning a date match score that reflects how close the two dates are in time. Dates that are closer together are considered a stronger match and return a match score closer to 1.
For example, 11/05/1993 and 11/07/1993 have a high score, as they are very similar and just two days apart. However, 11/05/1993 and 11/05/1995 yield a low score as they differ by two years.
The process is similar to name matching:
- Index the dates in connection with the related names.
- Query the date and name, receiving back a match score.
The query will return separate match scores for the name and for the associated date of birth. You may decide that the name is more important than the birth date. Within your system, you can weight and combine the name and date match scores to determine the final match score.
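One way to combine the two scores in your own system is a weighted average. The sketch below is only an illustration: the function name `combine_scores` and the default weight of 0.7 are assumptions, not part of RNI, and you would choose the weight to suit your own data.

```python
def combine_scores(name_score: float, date_score: float, name_weight: float = 0.7) -> float:
    """Return a weighted average of a name score and a date score.

    Both scores are expected to lie in [0, 1]. The default weight is a
    hypothetical value that treats the name as more important than the date.
    """
    if not 0.0 <= name_weight <= 1.0:
        raise ValueError("name_weight must be between 0 and 1")
    for score in (name_score, date_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0 and 1")
    # The date receives whatever weight the name does not use.
    return name_weight * name_score + (1.0 - name_weight) * date_score
```

With the default weight, a strong name match and a middling date match still produce a fairly high combined score, which reflects the decision that the name matters more.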
A date contains a year, month, and day, but not all fields are required for matching. All common delimiters for English dates are supported, and dates can be expressed with various orderings. RNI will filter out some non-date related words. Formats that include time of day are not supported.
You can specify an Elasticsearch date format that includes time information in the mapping. The time component will be ignored.
RNI supports a wide variety of date formats. The best date format will always be the ISO standard of YYYY-MM-DD, where March 7, 1984 is written as 1984-03-07. RNI will attempt to interpret any date provided, but the less standard the format, the less certain it is that the interpretation will be the one you expect.
Dates can be represented as YYYY-MM-DD. When some fields are unspecified, the letters stand in for the unknown values. For example, March 7 is YYYY-03-07, since the year is unspecified. Two-digit years are assumed to have unknown centuries: 3/7/84 is interpreted as YY84-03-07, so March 7, 1984 is an equally good match as March 7, 2084 and March 7, 1884.
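The placeholder notation can be sketched in a few lines of Python. This only illustrates the notation described above; it is not RNI's parser, and the function name `to_partial_iso` is hypothetical. The text does not say how one- or three-digit years are written in this notation, so the sketch rejects them rather than guess.

```python
from typing import Optional


def to_partial_iso(year: Optional[str], month: Optional[int], day: Optional[int]) -> str:
    """Render a possibly partial date in YYYY-MM-DD placeholder notation.

    The year is passed as a string so that a two-digit year ("84") can be
    told apart from a four-digit one ("1984"). None means "unknown".
    """
    if year is None:
        year_part = "YYYY"
    elif year.isdigit() and len(year) == 4:
        year_part = year
    elif year.isdigit() and len(year) == 2:
        # Unknown century: keep the letters for the missing digits.
        year_part = "YY" + year
    else:
        raise ValueError("this sketch handles only 2- or 4-digit years")
    month_part = "MM" if month is None else f"{month:02d}"
    day_part = "DD" if day is None else f"{day:02d}"
    return f"{year_part}-{month_part}-{day_part}"
```

For example, `to_partial_iso(None, 3, 7)` gives `YYYY-03-07` and `to_partial_iso("84", 3, 7)` gives `YY84-03-07`.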
When a date is provided, RNI will attempt to identify the year, month, and day within it, leaving blank any fields it cannot determine. You can omit fields if you do not have the value for one or more fields. For example: 1955-12-30, 1955--03, 12/30, -12-, --30, 1955, 1955-12- are all valid dates.
If RNI encounters an invalid date in an acceptable format, such as March 38, 1984, it does not return an error. Instead, it treats the impossible value as unknown and interprets the date as March 1984.
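If you want to mirror this behavior when preparing data on the client side, a minimal sketch might look like the following. It is an approximation under stated assumptions, not RNI's implementation: the function name `drop_impossible_fields` is hypothetical, and the handling of February 29 when the year is unknown (kept as possible) is a choice made for this example.

```python
import calendar
from typing import Optional, Tuple

PartialDate = Tuple[Optional[int], Optional[int], Optional[int]]


def drop_impossible_fields(year: Optional[int], month: Optional[int], day: Optional[int]) -> PartialDate:
    """Replace impossible month or day values with None (unknown)."""
    if month is not None and not 1 <= month <= 12:
        month = None
    if day is not None:
        if month is None:
            # Without a month, the only check possible is the 1-31 range.
            max_day = 31
        else:
            # Assumption: with an unknown year, use a leap year so Feb 29 is kept.
            reference_year = year if year is not None else 2000
            max_day = calendar.monthrange(reference_year, month)[1]
        if not 1 <= day <= max_day:
            day = None
    return year, month, day
```

Applied to March 38, 1984, the day is dropped and the result is `(1984, 3, None)`, that is, March 1984.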
RNI supports a wide variety of date formats.
- Days can be represented by 1 or 2 digits.
- Months can be numeric (1 or 2 digits) or English words (full name or 3-character abbreviation).
- Years can be represented by 1, 2, 3, or 4 digits.
- Supported delimiters include , . - / as well as a space.
- Partial fields can be entered.
- At this time, only English month names and abbreviations are recognized.
- All words are case-insensitive; upper and lower case are interpreted the same.
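To make the rules above concrete, here is a rough Python sketch of tokenizing a date string and recognizing a month token. It is not RNI's parser, and the helper names `split_date_tokens` and `parse_month` are hypothetical. Note that this simple splitter collapses repeated delimiters, so it loses the positional information in inputs with empty fields such as 1955--03.

```python
import re
from typing import List, Optional

_MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def split_date_tokens(text: str) -> List[str]:
    """Split on the listed delimiters: comma, period, hyphen, slash, space."""
    return [token for token in re.split(r"[,.\-/ ]+", text.strip()) if token]


def parse_month(token: str) -> Optional[int]:
    """Return 1-12 for a numeric or English month token, else None."""
    lowered = token.lower()  # month words are case-insensitive
    if re.fullmatch(r"[0-9]{1,2}", lowered):
        value = int(lowered)
        return value if 1 <= value <= 12 else None
    for number, name in enumerate(_MONTH_NAMES, start=1):
        # Accept the full name or its 3-character abbreviation.
        if lowered == name or lowered == name[:3]:
            return number
    return None
```

For instance, `split_date_tokens("Mar. 7, 1984")` yields the three tokens `Mar`, `7`, and `1984`, and `parse_month("MAR")` returns 3.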
The following table shows different acceptable formats for the date March 7, 1984.
- Create an index.
curl -XPUT 'http://localhost:9200/rni-test'
- Define a mapping for fields that will contain dates. The type for a date field when matching is "rni_date".
curl -XPUT 'http://localhost:9200/rni-test/_mapping' -H'Content-Type: application/json' -d '{ "properties" : { "birth_date" : { "type" : "rni_date" }, "primary_name" : { "type" : "rni_name" } } }'
Optionally, in the mapping, you can specify an Elasticsearch date format. All dates must adhere to the specified format. If you specify a format that includes time information, RNI ignores the time component of the date.
Warning
Specifying an Elasticsearch format disables support for unspecified fields. If, for example, you select a format that does not include a day field ("MM-yyyy"), you will get an error when you use the date format in a query.
curl -XPUT 'http://localhost:9200/rni-test/_mapping' -H'Content-Type: application/json' -d '{ "properties" : { "birth_date" : { "type" : "rni_date", "format" : "MM-yyyy-dd" }, "primary_name" : { "type" : "rni_name" } } }'
- Index documents containing a date field.
curl -XPUT 'http://localhost:9200/rni-test/_doc/1' -H'Content-Type: application/json' -d '{ "primary_name" : "Joe Schmoe", "birth_date" : "07-1955-24" }'
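Because a mapping with an explicit format requires every date to adhere to it, it can be useful to check dates on the client before indexing. The sketch below uses Python's `strptime` pattern `%m-%Y-%d` as a stand-in for the Elasticsearch format "MM-yyyy-dd"; the two format languages are not identical, so treat this as an approximate pre-check. The function name `matches_format` is hypothetical.

```python
from datetime import datetime


def matches_format(value: str, strptime_format: str = "%m-%Y-%d") -> bool:
    """Return True if value parses completely under the given strptime format."""
    try:
        datetime.strptime(value, strptime_format)
    except ValueError:
        # Raised for partial dates, wrong field order, or impossible values.
        return False
    return True
```

Under this check, "07-1955-24" passes, while a partial date such as "07-1955" fails, which mirrors the warning above that unspecified fields are not supported once a format is set.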
There are many ways to incorporate date matching within your query. Here are two examples, one with date matching by itself, and one with date and name matching.
Base Query. The base query is a standard query against the date field. Refer to Query the Index.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
  "query" : {
    "match" : {
      "birth_date" : "08-1955-25"
    }
  }
}'
RNI Rescore with Dates. Refer to Rescoring with RNI Pairwise Name Match.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
  "query" : {
    "match" : { "birth_date" : "08-1955-25" }
  },
  "rescore" : {
    "query" : {
      "rescore_query" : {
        "function_score" : {
          "date_score" : {
            "field" : "birth_date",
            "query_date" : "08-1955-25"
          }
        }
      },
      "query_weight" : 0.0,
      "rescore_query_weight" : 1.0
    }
  }
}'
The query returns a hit, with the RNI date match score.
"hits": {
  "total": 1,
  "max_score": 1.618923,
  "hits": [
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "AVXMepnorGuybmuiQtQr",
      "_score": 0.8120856,
      "_source": {
        "primary_name": "Joe Schmoe",
        "birth_date": "07-1955-24"
      }
    }
  ]
}
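A client typically reads the `_score` of each hit from the response. The following sketch shows one way to do that in Python, assuming the fragment above is wrapped in the usual top-level JSON object of a search response. The function name `hits_above` and the idea of filtering by a threshold are illustrative assumptions; RNI does not prescribe a threshold here.

```python
import json
from typing import List, Tuple


def hits_above(response_json: str, threshold: float) -> List[Tuple[str, float]]:
    """Return (_id, _score) pairs for hits scoring at or above the threshold."""
    response = json.loads(response_json)
    results = []
    for hit in response["hits"]["hits"]:
        if hit["_score"] >= threshold:
            results.append((hit["_id"], hit["_score"]))
    return results
```

With the hit shown above and a hypothetical threshold of 0.8, the function would return the single document, since its score is 0.8120856.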
Base Query. The base query is a standard query against the date and name fields.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "primary_name": "Joe S."
          }
        },
        {
          "match": {
            "birth_date": "08-1955-25"
          }
        }
      ]
    }
  }
}'
RNI Rescore with Dates. Use the doc_score function in the rescore when matching a combination of Elasticsearch field types instead of the functions for a single type (name_score and date_score). The name field is also added to the rescore.
curl -XGET 'http://localhost:9200/rni-test/_search' -H'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "primary_name": "Joe S."
          }
        },
        {
          "match": {
            "birth_date": "08-1955-25"
          }
        }
      ]
    }
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "function_score": {
          "doc_score": {
            "fields": {
              "primary_name": {
                "query_value": "Joe S."
              },
              "birth_date": {
                "query_value": "08-1955-25"
              }
            }
          }
        }
      },
      "query_weight" : 0.0,
      "rescore_query_weight" : 1.0
    }
  }
}'
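Nested request bodies like the one above are easy to get wrong by hand (a missing brace is enough to break the request), so you may prefer to build them programmatically. The sketch below assembles the same base query and doc_score rescore as a Python dictionary; the function name `build_doc_score_request` is hypothetical, and the structure simply mirrors the request shown above.

```python
import json
from typing import Dict


def build_doc_score_request(query_values: Dict[str, str]) -> dict:
    """Build a bool/should base query plus a doc_score rescore for the given fields."""
    should = [{"match": {field: value}} for field, value in query_values.items()]
    fields = {field: {"query_value": value} for field, value in query_values.items()}
    return {
        "query": {"bool": {"should": should}},
        "rescore": {
            "query": {
                "rescore_query": {"function_score": {"doc_score": {"fields": fields}}},
                # Weights as in the example above: only the rescore counts.
                "query_weight": 0.0,
                "rescore_query_weight": 1.0,
            }
        },
    }


body = build_doc_score_request({"primary_name": "Joe S.", "birth_date": "08-1955-25"})
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d or sent with any HTTP client.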
As with name matching, there is a set of date matching parameters. The parameter values can be edited in the plugins/rni/bt_root/rlpnc/data/etc/parameter_defs.yaml file.