CONFLICT
|
The tokens do not match.
|
"william" and "johnson" are a CONFLICT.
|
DELETION
|
The token is unmatched.
|
When comparing "Richard William Smith" with "Richard Smith", "william" would be considered a DELETION.
|
EMBEDDING_MATCH
|
The tokens are semantically similar as determined by word-embedding vectors.
|
When comparing "boston building company" and "boston construction company", "building" and "construction" are an EMBEDDING_MATCH.
|
FIELD_BLOCKED
|
This field cannot be matched because of a cross-field match involving the same field in the other name.
|
When comparing "Bob|William|Smith" with "William||Smith", "bob" is FIELD_BLOCKED because the cross-field match on "william" prevents it from matching its corresponding field.
|
FIELD_CONFLICT
|
When comparing two names that are divided into fields, these fields do not match.
|
When comparing "Richard|William|Smith" with "Richard|Johnson|Smith", "william" and "johnson" would be considered a FIELD_CONFLICT.
|
FIELD_DELETION
|
When comparing two names that are divided into fields, this field is unmatched.
|
When comparing "Richard|William|Smith" with "Richard||Smith", "william" would be considered a FIELD_DELETION.
|
GIVEN_NAME_DELETION
|
When comparing two names that are divided into fields, the GIVEN_NAME field is unmatched.
|
When comparing "Richard|William|Smith" and "||William|Scott", "Richard" will be a GIVEN_NAME_DELETION if that field is marked as a GIVEN_NAME field in both names.
|
HANI_ABBREVIATION
|
One Hani token appears to be an abbreviation of another Hani token.
|
"北京大学" and "北大" are a HANI_ABBREVIATION match.
|
HMM_MATCH
|
The tokens are similar but not identical, and the match was determined by a hidden Markov model (HMM). This is a type of fuzzy match.
|
"richard" and "richerd" are an HMM_MATCH.
|
INITIALISM
|
One token is a name and the other token consists of the initials of the words that make up the name.
|
"john fitzgerald kennedy" and "JFK" are an INITIALISM.
"consumer value stores" and "CVS" are an INITIALISM.
|
INITIAL_MATCH
|
One token is the first initial of the other.
|
"w" and "william" are an INITIAL_MATCH.
|
LANGUAGE_SPECIFIC_MATCH
|
The match was determined by a language-specific matcher.
|
"laden" and "لادن" are a LANGUAGE_SPECIFIC_MATCH.
|
MATCH
|
The tokens are identical (after stop word elimination and normalization).
|
"john" and "john" are a MATCH.
|
NULL
|
The NULL phenomenon is listed in this table only for completeness. It is used only internally and is never returned in the SpanMatch object.
|
N/A
|
OUT_OF_ORDER_DELETION
|
The token is unmatched, and the remaining tokens are still out of order after it is removed.
|
When comparing "George Herbert Walker Bush" with "George Bush Walker", "herbert" would be considered an OUT_OF_ORDER_DELETION.
|
OVERRIDE
|
The tokens appear as a pair on the override list. This is often used for nicknames.
|
"john" and "jack" will be an OVERRIDE match if they appear as a pair on the override list.
|
OVERRIDE_WRAPPED_HMM
|
One token is similar to, but not identical to, the other token's counterpart on the override list.
|
"john" and "jakk" will be an OVERRIDE_WRAPPED_HMM match if "john" and "jack" appear as a pair on the override list.
|
PREFIX_INITIAL
|
One token is an initial that matches a prefix in the other token.
In practice, the PREFIX_INITIAL phenomenon is rare.
|
If the initialsScore parameter is set to 0.1, "E Silva" and "EduardoSil" will be a PREFIX_INITIAL match.
|
SEQ2SEQ_MATCH
|
The tokens are similar but not identical, and the match was determined by a neural sequence-to-sequence (seq2seq) model. This is a type of fuzzy match.
|
"バラック・オバマ" and "Barack Obama" will be a SEQ2SEQ_MATCH when that model is active.
|
STRING_SIMILARITY
|
The tokens are close in string edit distance (the number of insertions, deletions, and substitutions needed to turn one token into the other) but not similar enough to be a fuzzy match.
|
"akcd" and "xkcd" are a STRING_SIMILARITY match.
|
STUCK_INITIAL
|
One name appears to have an initial mistakenly attached to a preceding token.
|
"DavidK" and "David Keith" are a STUCK_INITIAL match.
|
SURNAME_DELETION
|
When comparing two names that are divided into fields, the SURNAME field is unmatched.
|
When comparing "Richard|William|Smith" and "Richard|William||", "Smith" will be a SURNAME_DELETION if that field is marked as a SURNAME field in both names.
|
TRAILING_PATRONYMIC_DELETION
|
The unmatched token is a patronymic which has been truncated in the other name.
|
When comparing "Faisal bin Fahd bin Abdullah" and "Faisal bin Fahd", "bin Abdullah" is considered a TRAILING_PATRONYMIC_DELETION.
|
TRUNCATED_EXACT_MATCH
|
The tokens are identical except that one has been slightly truncated.
|
"murgatroyd" and "murgatroy" are a TRUNCATED_EXACT_MATCH.
|
TRUNCATED_HMM_MATCH
|
The tokens are similar, but not identical, and one has been slightly truncated.
|
"gilpatrickz" and "gillpatrick" are a TRUNCATED_HMM_MATCH.
|
UNKNOWN_FIELD_MATCH
|
One of the tokens is part of an "unknown" field in a fielded name.
The UNKNOWN_FIELD_MATCH phenomenon is rare and usually requires use of the Java API.
|
When comparing "Richard|William|Smith" with "Richard|William|Scott", if the first field is an "unknown" field, "richard" and "richard" would be considered an UNKNOWN_FIELD_MATCH.
|
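To make the distinctions between a few of the simpler phenomena concrete, the following is a minimal illustrative sketch in Python. It is not the product's matching algorithm: the function names `levenshtein` and `classify_pair`, the thresholds, and the order of the checks are all assumptions made for this example. It covers only MATCH, DELETION, INITIAL_MATCH, TRUNCATED_EXACT_MATCH, STRING_SIMILARITY, and CONFLICT. Because it has no statistical model, it cannot tell a fuzzy match such as HMM_MATCH apart from STRING_SIMILARITY.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def classify_pair(left: Optional[str], right: Optional[str]) -> str:
    """Toy classifier for a pair of aligned tokens (hypothetical helper).

    A missing token (None or empty) on either side models an unmatched token.
    The thresholds below are arbitrary choices for illustration only.
    """
    if not left or not right:
        return "DELETION"
    l, r = left.lower(), right.lower()  # stand-in for real normalization
    if l == r:
        return "MATCH"
    short, long_ = sorted((l, r), key=len)
    if len(short) == 1 and long_.startswith(short):
        return "INITIAL_MATCH"
    if long_.startswith(short) and len(long_) - len(short) <= 2:
        return "TRUNCATED_EXACT_MATCH"
    if levenshtein(l, r) <= 1:
        return "STRING_SIMILARITY"
    return "CONFLICT"


if __name__ == "__main__":
    pairs = [("john", "john"), ("w", "william"), ("murgatroyd", "murgatroy"),
             ("akcd", "xkcd"), ("william", "johnson"), ("william", None)]
    for left, right in pairs:
        print(left, right, classify_pair(left, right))
```

The checks run from most specific to least specific, so a pair that satisfies several conditions is reported under the strongest one. A real matcher would also consult override lists, field markings, and trained models before falling back to edit distance.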