When searching text, we've all had the experience of "I know what I'm looking for, I just can't find it...". Variety and ambiguity of expression, differences in language, and a lack of subject matter expertise can prevent analysts from efficiently finding people, organizations, locations, and other important artifacts in unstructured content. Identity Resolver uses artificial intelligence to process multilingual content, capturing the expertise of experienced analysts and applying it at scale to all of your unstructured content. Analysts can then curate and make connections, using their knowledge more effectively.
With Identity Resolver, you can:
Let AI mine incoming data streams to automatically match entity mentions and their context to knowledge base entries.
Enable subject matter experts to use their expertise to easily curate and connect knowledge base entries and capture their institutional knowledge.
Discover “ghosts” (the entities unknown to your knowledge base) by tracking them across hundreds of sources before promoting them to authoritative identities.
Enrich and reveal the relationships and sentiments between your identities.
Eliminate the need for complex queries by letting AI address the variety, ambiguity, and language challenges in your data.
Entities are the key actors in your text data: the organizations, people, locations, and products mentioned in documents. Rosette uncovers these entities, delivering structure, clarity, and insight to your data with adaptability, easy deployment, and consistent accuracy and performance across a broad range of languages and text genres.
An identity is a cumulative profile of information about a unique real-world entity. While many identities are for person, organization, and location (POL) entities, Identity Resolver can also identify additional entities, such as planes, tanks, and any other object type that is important to your operation.
Identity Resolver disambiguates entities into identities. It compares the entities extracted from unstructured text to an existing knowledge base, matching the entity to an identity in the knowledge base. If a match is not found in the knowledge base, a ghost identity is created.
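The match-or-create-ghost flow described above can be illustrated with a minimal sketch. This is not Identity Resolver's API or its matching logic: the `Identity` and `KnowledgeBase` classes, the identifiers, and the exact-match lookup on a normalized name are all assumptions made for illustration. The real product uses the mention's context and AI models rather than a simple name lookup.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Identity:
    identity_id: str
    name: str
    entity_type: str            # e.g. "PERSON", "ORGANIZATION", "LOCATION", "OTHER"
    authoritative: bool = True  # ghosts are created with authoritative=False


class KnowledgeBase:
    """Hypothetical in-memory knowledge base keyed by (normalized name, type)."""

    def __init__(self):
        self._by_key = {}
        self._ghost_ids = count(1)

    @staticmethod
    def _key(name, entity_type):
        # Lowercase and collapse whitespace; a stand-in for real name matching.
        return (" ".join(name.lower().split()), entity_type)

    def add(self, identity):
        self._by_key[self._key(identity.name, identity.entity_type)] = identity

    def resolve(self, name, entity_type):
        """Return the matching identity, or create and return a ghost identity."""
        key = self._key(name, entity_type)
        match = self._by_key.get(key)
        if match is not None:
            return match
        ghost = Identity(f"ghost-{next(self._ghost_ids)}", name, entity_type,
                         authoritative=False)
        self._by_key[key] = ghost
        return ghost


kb = KnowledgeBase()
kb.add(Identity("kb-1", "Google", "ORGANIZATION"))

known = kb.resolve("google", "ORGANIZATION")           # matches the existing identity
unknown = kb.resolve("Acme Holdings", "ORGANIZATION")  # no match: a ghost is created
again = kb.resolve("ACME  Holdings", "ORGANIZATION")   # later mention reuses the ghost
```

In this sketch a second mention of the unknown entity resolves to the same ghost, which mirrors the idea of tracking a ghost across sources before deciding what to do with it.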
Identity Resolver supports four types of entities: person, organization, location, and other.
Other is any type which is not a person, organization, or location (POL). Other entities can include anything that is of interest to your organization.
A knowledge base is a collection of identities, where each identity has a unique identifier.
Examples of knowledge bases:
Wikidata is a free, collaborative, multilingual knowledge base containing structured data that is commonly used to identify people, locations, and organizations.
The UN Sanctions list is a list of individuals and entities that are currently subject to sanction measures.
A list of planes, tanks, and other objects of interest to your organization.
A knowledge base can include a combination of entity types and sources. For example, your knowledge base could include some Wikidata, along with people and objects that you've added to the knowledge base.
Authoritative and Ghost Identities
Once an entity is extracted, it becomes an identity entry in the knowledge base. But how do you distinguish between the entries you care about in your organization, and all the other entities you may encounter throughout the text being evaluated? An authoritative identity is one which is significant to your organization. When matching newly extracted entities with existing identities, authoritative identities are given a higher score than ghost identities.
When you load a knowledge base, whether it be from Wikidata, or a list of assets from your organization, Identity Resolver creates identities for each of the entities and marks them as authoritative. As entities are extracted from text, they are either matched to an identity in the knowledge base, or a ghost identity is created. A ghost identity is an identity in the knowledge base which is not marked as authoritative.
A ghost may be created because:
There is no matching entry in the knowledge base. In this case you can promote it, creating a new authoritative entry in the knowledge base.
There is a matching entry in the knowledge base, but Identity Resolver did not make the match. In this case you can merge the ghost identity with the existing identity.
You do not have to merge or promote ghost identities. You can leave them as non-authoritative identities in the knowledge base.
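The two curation actions for ghosts, promoting and merging, can be sketched as follows. The `Identity` class, the `promote` and `merge` functions, and the identifiers are hypothetical names chosen for this example; they are not the product's interface, and the sketch assumes that merging simply moves the ghost's references onto the existing identity.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    identity_id: str
    name: str
    authoritative: bool = False
    references: list = field(default_factory=list)  # source ids, for illustration


def promote(identity):
    """Turn a ghost into an authoritative identity."""
    identity.authoritative = True
    return identity


def merge(ghost, target, identities):
    """Move the ghost's references onto an existing identity and drop the ghost."""
    target.references.extend(ghost.references)
    del identities[ghost.identity_id]
    return target


identities = {
    "kb-1": Identity("kb-1", "Jane Doe", authoritative=True, references=["doc-1"]),
    "ghost-1": Identity("ghost-1", "J. Doe", references=["doc-2"]),
    "ghost-2": Identity("ghost-2", "Acme Holdings", references=["doc-3"]),
}

# "J. Doe" is really the existing "Jane Doe": merge the ghost into it.
merge(identities["ghost-1"], identities["kb-1"], identities)

# "Acme Holdings" is new and significant: promote the ghost.
promote(identities["ghost-2"])
```

A ghost that is neither merged nor promoted simply stays in the dictionary with `authoritative=False`.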
A reference is a piece of information about an identity derived from a single source. When Identity Resolver imports text, it extracts entities and disambiguates them to identities. References provide the details about an identity; an identity can have one or more references. When reviewing an identity, analysts can see where the identity was referenced, providing information about its provenance in the system.
Within a reference, there may be one or more mentions of the entity assigned to the identity. Each mention has offsets detailing the exact position of the mention within the text. A reference can contain multiple mentions of the same entity, as well as mentions of multiple entities.
An entire reference, or only specific mentions within it, can be reassigned as knowledge evolves. While reviewing and verifying the extracted data, analysts can move entire references or specific mentions to another identity.
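One way to picture the relationship between identities, references, and mentions is the sketch below. The class names, the `find_mentions` and `move_mention` helpers, and the sample text are assumptions for illustration only; in particular, the substring search stands in for real entity extraction, and the offsets are assumed to be character positions with an exclusive end.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    start: int  # offset of the first character in the source text
    end: int    # offset one past the last character
    text: str


@dataclass
class Reference:
    source_id: str                                # the single source document
    mentions: list = field(default_factory=list)


@dataclass
class Identity:
    identity_id: str
    references: list = field(default_factory=list)


def find_mentions(text, surface):
    """Return a Mention for each occurrence of `surface` in `text`."""
    mentions = []
    start = text.find(surface)
    while start != -1:
        mentions.append(Mention(start, start + len(surface), surface))
        start = text.find(surface, start + len(surface))
    return mentions


def move_mention(mention, source_id, from_identity, to_identity):
    """Reassign one mention to another identity, keeping its source document."""
    for ref in from_identity.references:
        if ref.source_id == source_id and mention in ref.mentions:
            ref.mentions.remove(mention)
            if not ref.mentions:                  # drop a reference left empty
                from_identity.references.remove(ref)
            break
    else:
        raise ValueError("mention not found on the source identity")
    target = next((r for r in to_identity.references
                   if r.source_id == source_id), None)
    if target is None:
        target = Reference(source_id)
        to_identity.references.append(target)
    target.mentions.append(mention)


text = "Paris signed the deal. Later, Paris left the hotel."
mentions = find_mentions(text, "Paris")

city = Identity("kb-city")
person = Identity("ghost-1")
city.references.append(Reference("doc-1", list(mentions)))

# An analyst decides the second mention refers to a person, not the city.
move_mention(mentions[1], "doc-1", city, person)
```

After the move, both identities hold a reference to the same source document, each with its own mention and offsets, so the provenance of every assignment stays visible.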
The major tasks of Identity Resolver are:
Create a knowledge base of authoritative data. The source could be Wikidata (provided) and any authoritative data available in your organization.
Capture and analyze new documents, extracting entities and disambiguating them into identities.
While creating the initial knowledge base is a one-time set-up task, the knowledge base is constantly growing and improving as new documents are added and analysts use their knowledge to curate and improve the information in the knowledge base.