Loading a custom KB, such as the UN Sanctions list, requires three steps:
Prepare the custom KB as a formatted CSV file
Map the KB fields to Rosette Identity field types
Import the CSV file
An example of a custom KB, the UN Sanctions list, can be found in the sample-data/un_sanctions subdirectory of the release folder. Currently, the system is limited to custom KBs that contain only one entity type (PERSON, ORGANIZATION, or LOCATION), so the un_sanctions folder contains two files:
Preparing a Custom Knowledge Base
You can prepare and upload a custom KB based on your business needs. Here we are using the example of the UN Sanctions list. The custom KB can be any list of people or organizations you want to match against.
First, prepare a CSV file containing at least these three fields:
ID: A unique identifier for the entry. You can load multiple custom KBs, so the ID must be unique across all entries in all KBs.
Name: The full name of the entity. This is used by the linking model to identify link candidates.
Description: A free-text description of the entity. The linking model also uses this to determine whether an extracted entity matches a KB entry.
There may be other fields in the CSV, but they will not be used by the system for linking.
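The requirements above can be checked before upload. The following is a minimal sketch, not part of the product: the function name check_kb is hypothetical, the column names are passed in as parameters because the actual CSV headers depend on your file, and the sample rows are made up for illustration rather than taken from the UN Sanctions list.

```python
import csv
import io


def check_kb(stream, id_col, name_col, desc_col):
    """Return a list of problems found in a custom KB CSV stream.

    Checks that the three required columns exist, that every ID is
    non-empty and unique within the file, and that every name is non-empty.
    """
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [c for c in (id_col, name_col, desc_col) if c not in header]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    seen = set()
    # Row numbers count data records, not including the header row.
    for row_no, row in enumerate(reader, start=1):
        kb_id = (row[id_col] or "").strip()
        name = (row[name_col] or "").strip()
        if not kb_id:
            problems.append(f"row {row_no}: empty ID")
        elif kb_id in seen:
            problems.append(f"row {row_no}: duplicate ID {kb_id}")
        seen.add(kb_id)
        if not name:
            problems.append(f"row {row_no}: empty name")
    return problems


# Illustrative data only; the header names here are an assumption.
sample = "KB ID,Full Name,KB Text\n1,Alice,desc\n1,Bob,desc2\n2,,x\n"
for problem in check_kb(io.StringIO(sample), "KB ID", "Full Name", "KB Text"):
    print(problem)
```

To check a real file, open it with open(path, newline="", encoding="utf-8") and pass the file object in place of the StringIO stream.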
Figure 1. Person section of the UN Sanctions List
Mapping Knowledge Base Fields
Once the CSV has been prepared, the next step is to map the three required KB fields (ID, Name, and Description) to their associated Rosette Identity field types. This happens as part of the KB upload.
Let's return to the UN Sanctions list example.
Select Upload Data
Select the Browse for Files link
Select the per.csv file, located in the Rosette Identity package
In the Items to Import section, select per.csv. The New Mapping window is displayed.
Enter a name for the mapping in the Name field.
Select the following values on the New Mapping window, as shown above:
Select PER from the Identity Type drop-down menu. You cannot upload multiple entity types from a single file, which is why the UN Sanctions list has been separated into two files.
For the ID, Name, and Description fields, choose the following corresponding Field Types. The Field Type names are the column headers from the CSV file. In this example we are using the field names provided in the per.csv UN Sanctions list:
KB ID: KB ID
Name: Full Name
KB Text: KB Text
Check the boxes next to the field types you've edited.
Once your custom KB has been prepared and the fields have been mapped, you can upload the KB into the system.
Click Import Source and wait while the file is processed.
The 513 entries in the per.csv file will take a few minutes to process.
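Because IDs must be unique across all loaded KBs, it may be worth checking a further file against one you have already imported. The sketch below is an illustration under stated assumptions: collect_ids and colliding_ids are hypothetical helper names, the ID column name is passed in by the caller, and the sample IDs are invented.

```python
import csv
import io


def collect_ids(stream, id_col):
    """Return the set of non-empty IDs found in the given column of a CSV stream."""
    reader = csv.DictReader(stream)
    return {(row.get(id_col) or "").strip() for row in reader} - {""}


def colliding_ids(ids_a, ids_b):
    """Return the IDs that appear in both sets, sorted for stable output."""
    return sorted(ids_a & ids_b)


# Illustrative data only; real files would be opened with open(path, newline="").
first = collect_ids(io.StringIO("KB ID,Full Name\nP-1,A\nP-2,B\n"), "KB ID")
second = collect_ids(io.StringIO("KB ID,Full Name\nP-2,C\nO-1,D\n"), "KB ID")
print(colliding_ids(first, second))
```

An empty result suggests the two files can be loaded side by side without ID conflicts.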
Reusing Custom Knowledge Base Mappings
You can reuse mappings when working with similar custom KBs. To demonstrate this functionality, try uploading the ORGANIZATION section of the UN Sanctions list (included in the sample data subdirectory as
When you begin the upload process, you'll notice a green arrow in the status field, which indicates that the field names identified in the file match a previously created mapping. However, because this file includes ORGANIZATION-type entities, not PERSON-types, you'll want to adjust the mapping to reflect the Identity type.
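To anticipate whether an existing mapping is likely to be offered for a new file, you can compare the column headers of the two CSV files yourself. This is only a sketch: how the system actually decides that field names match a saved mapping is not described here, so the order-insensitive comparison below is an assumption, and read_header and same_columns are hypothetical helper names.

```python
import csv
import io


def read_header(stream):
    """Return the column names from the first row of a CSV stream."""
    return next(csv.reader(stream), [])


def same_columns(header_a, header_b):
    """True when both headers hold the same column names.

    Assumption: order and surrounding whitespace are ignored.
    """
    def normalize(header):
        return sorted(name.strip() for name in header)

    return normalize(header_a) == normalize(header_b)


# Illustrative headers only; real files would be opened with open(path, newline="").
per_header = read_header(io.StringIO("KB ID,Full Name,KB Text\n"))
org_header = read_header(io.StringIO("Full Name,KB Text,KB ID\n"))
print(same_columns(per_header, org_header))
```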
To adjust the field mapping:
Select org.csv in the Items to Import list.
Select the Clone Mapping icon on the right-hand side of the mappings.
Change the name to reflect the new KB.
Change the Identity Type to ORG. In this example, the rest of the mapping configuration stays the same.
Select Import Source to import the file.
The file should take a couple of minutes to process.