Before using Rosette Match Studio for searching, you must create the index by uploading a recordset containing your searchable data. If you upload multiple data files, they are concatenated into a single index of data to be searched.
Each record must contain at least one name field. The name can be a person name, organization name, or location name.
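The requirement above can be checked before uploading. The following is a minimal sketch, not part of Rosette Match Studio: the function `records_missing_name`, the sample records, and the dictionary form of the mapping are all hypothetical, and it assumes that an empty value does not count as a name. The field type names are the predefined name types from the field types table.

```python
# Predefined field types whose data type is rni_name.
NAME_FIELD_TYPES = {"Person Name", "Company Name", "Location"}


def records_missing_name(records, mapping):
    """Return the positions of records that have no non-empty name field.

    `mapping` is a hypothetical dict of column header -> field type.
    """
    name_columns = [col for col, ftype in mapping.items()
                    if ftype in NAME_FIELD_TYPES]
    missing = []
    for position, record in enumerate(records):
        # Assumption: a blank or missing value does not count as a name.
        has_name = any(str(record.get(col) or "").strip()
                       for col in name_columns)
        if not has_name:
            missing.append(position)
    return missing


# Two hypothetical source files, concatenated into one recordset
# in the same way multiple uploads end up in a single index.
file_a = [{"Name": "Jane Doe", "Age": 41}]
file_b = [{"Name": "", "Age": 29}, {"Name": "John Smith", "Age": None}]
recordset = file_a + file_b

mapping = {"Name": "Person Name", "Age": "Age"}
print(records_missing_name(recordset, mapping))  # [1]
```

Running a check like this on the source data first makes it easier to find and fix records without a name before they reach the index.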
Rosette Match Studio imports structured data. Supported file formats are:
For .csv files, the first row must be a header row containing the names of the fields in the source file. For other file types, the key names are the field names. The field names must be unique. The RMS upload does not currently support Elasticsearch nested fields.
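The header rules for .csv files can be verified locally before upload. This is a small sketch using only the Python standard library; the function name `check_csv_header` and the sample data are hypothetical, and the check covers only what is stated above (a header row exists and its field names are unique).

```python
import csv
import io


def check_csv_header(text):
    """Return a list of problems found in the header row of CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty, so there is no header row"]

    problems = []
    names = [name.strip() for name in header]
    if any(name == "" for name in names):
        problems.append("header row contains an empty field name")

    # Collect each repeated field name once, in order of appearance.
    seen = set()
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    if duplicates:
        problems.append("duplicate field names: " + ", ".join(duplicates))
    return problems


good = "Name,DOB,Country\nJane Doe,1980-01-02,US\n"
bad = "Name,DOB,Name\nJane Doe,1980-01-02,J. Doe\n"
print(check_csv_header(good))  # []
print(check_csv_header(bad))   # ['duplicate field names: Name']
```

The sketch cannot tell whether the first row really is a header rather than data; that still has to be confirmed by looking at the file.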
Tip
Clear Data
The Reset System button is at the bottom of the Upload Recordset window. Selecting it clears all imported data from your application.
To upload data:
Select Upload Recordset in the Global Navigation bar.
Drag or browse for a source file to import.
The file name will appear in the Items to Import list.
The Mapping window is displayed.
Mapping is where you assign fields to the columns in your dataset.
Once you select the input file, Rosette Match Studio tries to find a mapping definition matching the fields in the file. If a matching mapping is not found, the field mappings must be defined. Only searchable fields need to have a mapping defined for them.
For each of the fields in the input file:
The Column Header name is taken from the input file. It is either the first row of the file (column headers) or the key values of the file, depending on the data file format.
Select the Field type from the drop-down for each column you want to import and search on. To not import a column, leave the default, Do Not Import.
Tip
If you want to use custom fields in your mapping, add them before importing your source files.
Save the mapping.
Select Import Source to import the source file into Rosette Match Studio.
If a mapping definition is found, you can Edit it if it doesn't fit the current file. For example, you could have 2 files with lists of names such that:
Both lists may have the column header Name. When you upload File 1, you select Person Name as the field type. Then, when you upload File 2, Rosette Match Studio tries to reuse the mapping. Select Edit and change the field type to Company Name. Save the changes and select Import Source.
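The reuse behavior in this example can be pictured with a small model. This is an illustrative sketch only: how Rosette Match Studio actually stores and matches mapping definitions is not described here, so the lookup by set of column headers, the `saved_mappings` store, and the functions `find_mapping` and `save_mapping` are all assumptions.

```python
# Hypothetical in-memory store: set of column headers -> saved mapping.
saved_mappings = {}

DO_NOT_IMPORT = "Do Not Import"  # default for columns that are not mapped


def find_mapping(headers):
    """Return a copy of the saved mapping for these headers, or None."""
    mapping = saved_mappings.get(frozenset(headers))
    return dict(mapping) if mapping is not None else None


def save_mapping(headers, mapping):
    """Save a mapping; columns without a field type are not imported."""
    full = {h: mapping.get(h, DO_NOT_IMPORT) for h in headers}
    saved_mappings[frozenset(headers)] = full
    return full


# File 1 has the column header Name, mapped to Person Name.
file1_headers = ["Name"]
save_mapping(file1_headers, {"Name": "Person Name"})

# File 2 has the same column header, so the saved mapping is found again.
file2_headers = ["Name"]
reused = find_mapping(file2_headers)
print(reused)  # {'Name': 'Person Name'}

# Edit the reused mapping so that it fits File 2, then save it.
reused["Name"] = "Company Name"
file2_mapping = save_mapping(file2_headers, reused)
print(file2_mapping)  # {'Name': 'Company Name'}
```

Under this assumption, two files with identical headers always share a mapping, which is why the mapping needs to be reviewed whenever the same header means different things in different files.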
The following field types are predefined in Rosette Match Studio and can be selected in the mapping definition.
Table 2. Field Types

| Field Type | Data Type | Entity Type | Description | Match Score Algorithm |
|---|---|---|---|---|
| Person Name | rni_name | PERSON | The name, nickname, or alias of an individual. | Name matching algorithms |
| Company Name | rni_name | ORGANIZATION | The name of a corporation, institution, government agency, or other group of people defined by an established organizational structure. | Name matching algorithms |
| Location | rni_name | LOCATION | The name of a geographic location such as a city, state, country, region, mountain, park, lake, or address. | Name matching algorithms |
| Date | rni_date | | A date contains a year, month, and day. All common delimiters for English dates are supported. Dates can be expressed in various orderings, and months can be written as a numeral, their full English name, or the common three-letter abbreviation. | Mathematical difference between two dates |
| Age | integer | | Age in years | Mathematical difference between the numbers |
| Gender | keyword | | The search string is either an exact match to a value in the index, or doesn't match at all. | 0: no match; 1: exact match |
| Id | text | | Unstructured text content. Can be used for phone number, social security number, passport, etc. | Edit distance |
| Description | text | | Unstructured text field | Not used in matching |
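To make the non-name match score algorithms more concrete, the sketch below shows one possible reading of each. It is not Rosette Match Studio's implementation: the function names are hypothetical, Levenshtein distance is an assumed choice of edit distance, and the way the product turns a raw difference or distance into a final score is not covered here.

```python
from datetime import date


def keyword_score(query, indexed):
    """Keyword fields such as Gender: 1 for an exact match, 0 otherwise."""
    return 1 if query == indexed else 0


def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def day_difference(d1, d2):
    """Date fields: absolute difference between two dates, in days."""
    return abs((d1 - d2).days)


def age_difference(a1, a2):
    """Age fields: absolute difference between two ages, in years."""
    return abs(a1 - a2)


print(keyword_score("F", "F"))                               # 1
print(keyword_score("F", "f"))                               # 0
print(edit_distance("555-1234", "555-1243"))                 # 2
print(day_difference(date(1980, 1, 2), date(1980, 1, 12)))   # 10
print(age_difference(34, 37))                                # 3
```

The examples show why the field type matters: a one-character case difference gives no match at all for a keyword field, while the same kind of small difference in an Id field only increases the edit distance.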
The following additional data types can be used to define new fields.
Table 3. Additional Data Types

| Data Type | Description |
|---|---|
| long | A signed 64-bit integer. |
| short | A signed 16-bit integer. |
| double | A double-precision 64-bit IEEE 754 floating point number, restricted to finite values. |
| float | A single-precision 32-bit IEEE 754 floating point number, restricted to finite values. |
| boolean | true or false |
| geo_point | latitude-longitude pair |
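When choosing a data type for a custom field, it can help to check that the source values fit the type. The sketch below is an assumption-laden illustration: the function `fits_type` is hypothetical, a geo_point is modeled as a Python `(latitude, longitude)` tuple, and the latitude and longitude bounds are the usual geographic ranges rather than something stated in the table.

```python
import math

# Largest finite single-precision IEEE 754 value.
FLOAT32_MAX = (2 - 2 ** -23) * 2.0 ** 127


def is_number(value):
    """True for int or float, but not bool (a subclass of int in Python)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def fits_type(value, data_type):
    """Return True if value is representable in the named data type."""
    if data_type == "boolean":
        return isinstance(value, bool)
    if data_type in ("long", "short"):
        bits = 64 if data_type == "long" else 16
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1
    if data_type == "double":
        return isinstance(value, float) and math.isfinite(value)
    if data_type == "float":
        return (isinstance(value, float) and math.isfinite(value)
                and abs(value) <= FLOAT32_MAX)
    if data_type == "geo_point":
        # Assumption: a (latitude, longitude) tuple in degrees.
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        if not all(is_number(v) for v in value):
            return False
        lat, lon = value
        return -90 <= lat <= 90 and -180 <= lon <= 180
    raise ValueError("unknown data type: " + data_type)


print(fits_type(32767, "short"))               # True
print(fits_type(32768, "short"))               # False
print(fits_type(1e39, "float"))                # False
print(fits_type(1e39, "double"))               # True
print(fits_type((42.36, -71.06), "geo_point")) # True
```

A value that fails the `short` or `float` check but passes `long` or `double` is a sign that the wider type is the safer choice for that field.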