Entity linking provides a mechanism for disambiguating the identity of similarly named entities mentioned in a document. For example, “Rebecca Cole” is the name of both the second African-American woman to become a doctor in the United States and an Australian professional basketball player. Linking establishes the identity of an entity by disambiguating common names and by matching a variety of names, such as nicknames and formal titles, with a single entity ID.
To link entities to a knowledge base, REX uses a statistical disambiguation model trained on a knowledge base. The linker processor is delivered with a model based on a default Wikidata knowledge base. If the entity exists in Wikidata, then REX returns the Wikidata QID, such as Q1 for the Universe, in the entityId field. Once enabled, the linker can also return:
If the linker is disabled (the default), a random string is returned as the entityId. The string starts with a "T" followed by a random number, which is unique per document.
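As an illustration of the two kinds of entityId described above, the following minimal sketch classifies an ID string on the client side. It assumes that a QID is the letter "Q" followed by digits and that a temporary ID is the letter "T" followed by digits; the class name EntityIdSketch and its Kind enum are hypothetical and are not part of REX. IDs from a custom knowledge base would fall into the OTHER bucket here.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of REX: classifies an entityId string.
final class EntityIdSketch {
    enum Kind { WIKIDATA_QID, TEMPORARY, OTHER }

    // Assumption: QIDs look like "Q1", temporary IDs look like "T17".
    private static final Pattern QID = Pattern.compile("Q[0-9]+");
    private static final Pattern TEMP = Pattern.compile("T[0-9]+");

    static Kind classify(String entityId) {
        if (entityId == null) {
            return Kind.OTHER;
        }
        if (QID.matcher(entityId).matches()) {
            return Kind.WIKIDATA_QID;   // entity was linked to Wikidata
        }
        if (TEMP.matcher(entityId).matches()) {
            return Kind.TEMPORARY;      // unlinked, unique only within the document
        }
        return Kind.OTHER;              // e.g. an ID from a custom knowledge base
    }

    public static void main(String[] args) {
        System.out.println(classify("Q1"));        // WIKIDATA_QID
        System.out.println(classify("T17"));       // TEMPORARY
        System.out.println(classify("custom-42")); // OTHER
    }
}
```

Because temporary IDs are unique only per document, a check like this can help avoid joining unlinked entities across documents by ID.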
In addition to the default Wikidata knowledge base, you can train a disambiguation model for a custom knowledge base. The custom knowledge base model can replace or run in parallel with the default knowledge base.
Once the custom model has been trained, you can add new entries without retraining the model, as long as the new entries are similar to the ones used for training.
Linker Processor Files
The linker processor is packaged as part of the standard REX distribution. The linker files are in the subdirectory
The linker processor both extracts and links entities. These functions are separate from the default REX entity extraction performed by the statistical, pattern-matching, and exact-matching processors, so entities returned by the linker processor may differ from those returned by the other processors.
Enabling the Linker Processor
The linker processor can be enabled and disabled by setting
Entity linking is enabled by setting the EntityExtractor linkEntities(boolean lnk) to true, and disabled by setting it to false.
Selecting a Knowledge Base for Linking
By default, all knowledge bases under the data/flinx/data/kb directory inside the REX installation will automatically be used for linking.
The list of knowledge bases can be customized with EntityExtractor setKbDirs, which takes a list of paths to knowledge bases.
The list is in priority order: when more than one knowledge base contains a match, the match from the knowledge base highest on the list is returned.
Setting the list of knowledge bases completely overwrites the list of knowledge bases the linker uses. If you want the default Wikidata knowledge base to be included, it must be on the list of knowledge bases.
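The priority rule and the overwrite behavior described above can be pictured with the following minimal sketch. It is not the REX implementation: the real linker uses a statistical disambiguation model, whereas this sketch stands in each knowledge base with a plain in-memory map from mention to entity ID. The class name KbPrioritySketch and the ID CUSTOM-001 are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of "first match in priority order wins".
final class KbPrioritySketch {

    // Returns the ID from the first knowledge base in the list that knows the mention.
    static Optional<String> link(String mention, List<Map<String, String>> kbsInPriorityOrder) {
        for (Map<String, String> kb : kbsInPriorityOrder) {
            String id = kb.get(mention);
            if (id != null) {
                return Optional.of(id);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed toy data; a real knowledge base is a directory, not a map.
        Map<String, String> custom = Map.of("Universe", "CUSTOM-001");
        Map<String, String> wikidata = Map.of("Universe", "Q1", "guitar", "Q6607");

        // Custom KB listed first: its match takes priority over Wikidata.
        System.out.println(link("Universe", List.of(custom, wikidata)));
        // Only Wikidata knows "guitar", so the lower-priority KB answers.
        System.out.println(link("guitar", List.of(custom, wikidata)));
        // Wikidata left off the list: nothing is found for "guitar".
        System.out.println(link("guitar", List.of(custom)));
    }
}
```

The last call mirrors the overwrite rule: once a list is supplied, a knowledge base that is not on it contributes no matches.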
DBpedia Types for Linked Entities
The linker processor can associate entities with types drawn from the DBpedia ontology, which provides over 700 types at up to seven levels of granularity.
Providing both primary and secondary entity types increases the usability of the linker processor’s results for many NLP applications. For example, classifying Phoebe Buffay (QID: Q682396) as PERSON is a necessary first step towards effective pronominal resolution, whereas the secondary type path Agent/FictionalCharacter/SoapCharacter paves the way for identifying the relationship of Phoebe Buffay with Lisa Kudrow (Q179041).
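To show how an application might use a secondary type path such as the one above, the following minimal sketch splits a path into its levels of granularity. It assumes the path is available as a single slash-delimited string, as written in the example; how REX actually represents the path in its output is not specified here. The class name TypePathSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for slash-delimited DBpedia type paths.
final class TypePathSketch {

    // Levels from most general to most specific.
    static List<String> levels(String typePath) {
        return Arrays.asList(typePath.split("/"));
    }

    // The last (most specific) level of the path.
    static String mostSpecific(String typePath) {
        List<String> parts = levels(typePath);
        return parts.get(parts.size() - 1);
    }

    // True if the given type appears anywhere on the path.
    static boolean isA(String typePath, String type) {
        return levels(typePath).contains(type);
    }

    public static void main(String[] args) {
        String path = "Agent/FictionalCharacter/SoapCharacter";
        System.out.println(levels(path));                    // [Agent, FictionalCharacter, SoapCharacter]
        System.out.println(mostSpecific(path));              // SoapCharacter
        System.out.println(isA(path, "FictionalCharacter")); // true
    }
}
```

A check like isA lets downstream code treat fictional characters differently from other PERSON entities without hard-coding the most specific type.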
Turning on the includeDBpediaType flag increases the recall of the linker processor’s results. When the flag is enabled, the linker returns both non-named entities, such as "guitar" (Q6607), type MISC, and named entities of new types, such as "2018 World Cup" (Q170645), type EVENT.
By default, linking to DBpedia is turned off. To enable it, EntityExtractor includeDBpediaTypes(boolean includeDBpediaTypes) must be set to true.
The linker processor can return the Refinitiv PermID for a subset of entities which are identified with a QID. By default, linking to PermIDs is turned off.
Entity linking to PermIDs is enabled by setting the EntityExtractor includePermID(boolean includePermID) to true, and disabled by setting it to false. To activate PermID linking, both EntityExtractor includePermID(boolean includePermID) and EntityExtractor linkEntities(boolean lnk) must be set to true. When PermID linking isn't explicitly set, its default value is false.
This feature is currently in LABS and subject to change. We welcome your observations and feedback.
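The PermID rule described above (both switches must be on, and both are off by default) can be summarized with a minimal sketch. The class PermIdOptionsSketch is a hypothetical stand-in that only borrows the option names from the text; it is not the EntityExtractor class and does not call REX.

```java
// Hypothetical stand-in for the two linker switches; not the REX API.
final class PermIdOptionsSketch {
    private boolean linkEntities = false;   // linker is disabled by default
    private boolean includePermID = false;  // PermID linking is off by default

    PermIdOptionsSketch linkEntities(boolean value) {
        this.linkEntities = value;
        return this;
    }

    PermIdOptionsSketch includePermID(boolean value) {
        this.includePermID = value;
        return this;
    }

    // PermIDs are only returned when both switches are on.
    boolean permIdLinkingActive() {
        return linkEntities && includePermID;
    }

    public static void main(String[] args) {
        System.out.println(new PermIdOptionsSketch().permIdLinkingActive());                     // false
        System.out.println(new PermIdOptionsSketch().includePermID(true).permIdLinkingActive()); // false
        System.out.println(new PermIdOptionsSketch().linkEntities(true).includePermID(true)
                .permIdLinkingActive());                                                         // true
    }
}
```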