Select the Search tab from the navigation bar to view a list of all uploaded indices and the options for each index. You can initiate a search against any uploaded index from this page.
The options for each index are:
-
Search: Returns all matching records from the index for a single query. The query can include one or more fields.
-
Batch Search: Performs multiple queries based on an uploaded file, using each record in the file as the search terms against the index.
-
Configure: Allows you to edit the match configuration, window size, search pane display, and results pane display for the index.
-
Delete: Removes the index from Match Studio.
Select New Index to import data for a new recordset.
Each Rosette Match Studio query is processed in two passes to provide the best combination of speed and accuracy.
-
The first pass is designed to quickly generate a set of candidates for the second pass to consider.
-
The second pass compares every value returned by the first pass against the value in the query and computes a similarity score. Multiple scorers are applied in the second pass, to generate the best possible score.
The first pass gives the system the speed necessary for high-transaction environments by eliminating values in the index from consideration. The slower second pass re-compares each selected value directly in its original script using enhanced scoring algorithms.
The scores from the first pass are discarded and the match candidates are re-ranked according to the similarity scores returned by the second pass. The similarity scores for all search terms are combined to generate a single match score for each entry. All entries with a match score equal to or greater than the display threshold are displayed in a list. Entries whose match score is equal to or greater than the match threshold are highlighted.
The number of candidates passed to the second pass is determined by the Window Size setting. See Configure for more information on adjusting this setting.
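The two-pass flow described above can be sketched in a few lines of Python. This is only an illustration, not how Match Studio is implemented: the bigram-overlap first pass, the `difflib` similarity used as the second-pass scorer, and the names `first_pass`, `second_pass`, and `two_pass_search` are all assumptions chosen to keep the example self-contained.

```python
from difflib import SequenceMatcher


def bigrams(text):
    """Return the set of lowercase character bigrams in text."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def first_pass(query, index, window_size):
    """Cheap candidate generation: keep the window_size values sharing the most bigrams."""
    query_bigrams = bigrams(query)
    ranked = sorted(index, key=lambda value: len(query_bigrams & bigrams(value)), reverse=True)
    return ranked[:window_size]


def second_pass(query, candidates):
    """Slower rescoring: compare each candidate to the query and re-rank by similarity."""
    scored = [(value, SequenceMatcher(None, query.lower(), value.lower()).ratio())
              for value in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def two_pass_search(query, index, window_size):
    # First-pass ranks are discarded; only the second-pass scores are returned.
    return second_pass(query, first_pass(query, index, window_size))


# Hypothetical index values, used only for illustration.
index = ["John Smith", "Jon Smyth", "Jane Smithers", "Maria Garcia", "Jonathan Smith"]
results = two_pass_search("John Smith", index, window_size=3)
for value, score in results:
    print(f"{value}: {score:.2f}")
```

With `window_size=3`, only three candidates reach the slower second pass. A larger window lets more of the index be rescored, at the cost of more second-pass comparisons.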
Before searching with Rosette Match Studio, you must create an index by uploading a recordset containing your searchable data.
Rosette Match Studio imports structured data. Supported file formats are:
-
csv
-
tsv
-
xml
-
JSON
-
delimited text file
For .csv files, the first row must be a header row containing the names of the fields in the source file. For other file types, the key names are the field names. The field names must be unique. The RMS upload does not currently support the nested fields available in Elasticsearch.
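Before uploading a .csv file, it can help to confirm that the header row exists and that its field names are unique. The following is a minimal sketch using Python's standard `csv` module; the function name `read_field_names` and the sample data are hypothetical, and this check is independent of whatever validation Match Studio performs itself.

```python
import csv
import io


def read_field_names(csv_text):
    """Return the header row of a CSV document, raising if a field name is repeated."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        raise ValueError("file is empty: a header row is required")
    names = [name.strip() for name in header]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate field names: {duplicates}")
    return names


# Hypothetical sample file contents.
sample = "id,name,dob\n1,John Smith,1955-12-30\n"
field_names = read_field_names(sample)
print(field_names)
```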
To create a new index:
-
Select the Search tab from the navigation bar.
-
Select New Index.
-
Follow the instructions on the New Index Wizard.
As part of the process of creating a new index, you must map fields to the columns in the recordset. For more information, see Mapping.
Mapping is the process of assigning data types, or fields, to the columns in your dataset. Each column must have a field type assigned to it.
For each column in the input file:
-
The Column Name is taken from either the first row of the file (column headers) or the key values of the file, depending on the data file format.
-
Choose the Display Name, which is how the field will appear in the search pane and results pane.
-
Select the Data Type from the drop-down. If you do not want to import a column, leave the default Do Not Import.
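One way to picture a mapping is as one entry per input column, each holding the Column Name, Display Name, and Data Type. The sketch below is purely illustrative: the dictionary keys, the helper names, and the choice to start the display name as a copy of the column name are assumptions, not the format Match Studio stores internally.

```python
DO_NOT_IMPORT = "Do Not Import"  # the default choice in the Data Type drop-down


def default_mapping(column_names):
    """Start every column as not imported, with the column name as a placeholder display name."""
    return [
        {"column_name": name, "display_name": name, "data_type": DO_NOT_IMPORT}
        for name in column_names
    ]


def imported_fields(mapping):
    """Return only the columns that were assigned a data type."""
    return [entry for entry in mapping if entry["data_type"] != DO_NOT_IMPORT]


# Hypothetical columns from an input file.
mapping = default_mapping(["id", "full_name", "dob", "notes"])
mapping[0].update(display_name="ID", data_type="KEYWORD")
mapping[1].update(display_name="Name", data_type="RNI_PERSON_NAME")
mapping[2].update(display_name="Date of Birth", data_type="RNI_DATE")
# "notes" keeps the default and is skipped on import.
fields = imported_fields(mapping)
print([entry["display_name"] for entry in fields])
```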
The following data types are predefined in RMS and can be selected in the mapping definition.
Table 1. Data Types
| Data Type | Description |
|---|---|
| RNI_PERSON_NAME | The name, nickname, or alias of an individual. |
| RNI_ORGANIZATION_NAME | The name of a corporation, institution, government agency, or other group of people defined by an established organizational structure. |
| RNI_LOCATION_NAME | The name of a geographic location such as a city, state, country, region, mountain, park, lake, or address. |
| RNI_DATE | A date contains a year, month, and day. All common delimiters for English dates are supported. Dates can be expressed in various orderings, and months can be written as a numeral, their full English name, or the common three-letter abbreviation. |
| RNI_ADDRESS | A postal address of a location. |
| KEYWORD | Structured content such as an ID, email address, or zip code. |
| TEXT | Unstructured full-text content such as a description. |
| INTEGER | A signed 32-bit integer. |
| DOUBLE | A double-precision 64-bit IEEE 754 floating point number, restricted to finite values. |
| FLOAT | A single-precision 32-bit IEEE 754 floating point number, restricted to finite values. |
| BOOL | Boolean, true or false. |
| LONG | A signed 64-bit integer. |
| SHORT | A signed 16-bit integer. |
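When choosing between the numeric types, it can be useful to check whether the values in a column fit the ranges in the table. The sketch below applies the stated bit widths and the finite-value restriction; the function name `fits_data_type` is hypothetical, the accepted spellings of Boolean values are an assumption, and how RMS itself parses or rejects values is not described here.

```python
import math

# Inclusive ranges for the signed integer types listed in the table.
INTEGER_RANGES = {
    "SHORT": (-2**15, 2**15 - 1),
    "INTEGER": (-2**31, 2**31 - 1),
    "LONG": (-2**63, 2**63 - 1),
}
FLOAT32_MAX = 3.4028234663852886e38  # largest finite single-precision value


def fits_data_type(raw, data_type):
    """Return True if the string raw can be stored as the given numeric or Boolean type."""
    if data_type in INTEGER_RANGES:
        try:
            value = int(raw)
        except ValueError:
            return False
        low, high = INTEGER_RANGES[data_type]
        return low <= value <= high
    if data_type in ("FLOAT", "DOUBLE"):
        try:
            value = float(raw)
        except ValueError:
            return False
        if not math.isfinite(value):
            return False  # NaN and infinities are excluded
        return data_type == "DOUBLE" or abs(value) <= FLOAT32_MAX
    if data_type == "BOOL":
        # Assumption: only the literals "true" and "false" are accepted.
        return raw.strip().lower() in ("true", "false")
    raise ValueError(f"unsupported data type: {data_type}")


print(fits_data_type("32768", "SHORT"))    # too large for 16 bits
print(fits_data_type("32768", "INTEGER"))  # fits in 32 bits
```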
Note
Before searching, you must have created an index by importing data to search against.
Search returns a list of records from your index which are potential matches for a query, as determined by the calculated match score.
To perform a search:
-
Select the Search tab from the navigation bar.
-
Select Search for the index you want to search in.
-
Enter one or more values into the search fields.
-
You can enter partial names or initials.
-
You can enter partial dates in date fields. 1955-12-30, 1955--03, 12/30, -12, 1955 are all supported date formats.
-
Select Run Search.
Results with match scores greater than or equal to the display threshold are shown, listed in descending order by match score. Results with match scores greater than or equal to the match threshold are highlighted in green.
For each result:
-
Select the plus icon to expand a result for more detail.
-
Select the icon under Action to go to the Compare page, which shows how the match scores for a result were calculated. The name fields are preloaded with the searched and selected names. You can also use advanced settings to modify match parameters and see the resulting change in match scores.
Note
Before searching, you must have created an index by importing data to search against.
You must also have a query data file, similar to an index, that includes the data for each record (represented by rows in .csv) and field (represented by columns in .csv) you want to search.
Batch search performs multiple searches in a single task, using each record in a file as the query value against the index.
To perform a batch search:
-
Select the Search tab from the navigation bar.
-
Select Batch Search for the index you want to search in.
-
Select Upload New Batch.
-
Select or drag the file with your query data on the Upload Query Data page. This file should be a dataset of records similar to the index records.
-
Assign field types to columns on the Map Data Fields page. You can also configure the window size, match threshold, display threshold, and match configuration on this page.
-
Select Next. Match Studio displays the batch search results.
For each value in the batch file, Batch Search displays the search term from the file. To see the matching records from the index, select the plus sign next to a name. This shows the top results with a match score at or above the display threshold. All match scores at or above the match threshold are highlighted in green.
To initiate a new batch search with another dataset, select Upload New Batch.
To view the results from another set, select that set's box in the Query Data list.
Tip
Get the best performance by following a few simple guidelines:
-
Smaller batches return results faster. You can process a larger dataset, but it will take longer.
-
Select Export above the results list to download a .csv file with the match results for each record in the batch file. This allows you to easily review and analyze the results of a batch search.
The Configure section allows you to change how your data can be searched and how the results are displayed.
To access the Configure section:
-
Select the Search tab from the navigation bar.
-
Select Configure next to the desired index.
You can also access this section by selecting the gear icon beneath the search pane.
The Data tab displays information about the index, such as when it was created and which file was used to create it. This can be useful if you need more information to distinguish similar indices.
To rename the index, select the pencil icon to the right of the index name.
This function allows you to add new entries to an existing index. The source file for the new data must contain all the imported columns from the original index, with identical field headers. Additional columns will be ignored.
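The column requirement for added data can be checked before upload. The following is a minimal sketch under the assumption that header names are compared exactly, including case; `check_append_columns` and the sample column names are hypothetical.

```python
def check_append_columns(imported_columns, new_file_columns):
    """Return (missing, ignored) column names for a file being added to an existing index."""
    imported = set(imported_columns)
    new = set(new_file_columns)
    missing = sorted(imported - new)  # must be empty for the new file to be usable
    ignored = sorted(new - imported)  # additional columns are ignored
    return missing, ignored


# Hypothetical columns: the index was built from id, name, and dob.
missing, ignored = check_append_columns(
    ["id", "name", "dob"],
    ["id", "name", "dob", "comments"],
)
print("missing:", missing)
print("ignored:", ignored)
```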
The search pane is the area to the left of the results list that allows you to enter search terms for each field. This tab allows you to control which search fields are displayed, how they are displayed, and how much they weigh when match scores are calculated.
Control whether the field appears in the search pane. When display is disabled for a field, that field's weight is reduced to 0 for both search and batch search.
Choose how the field name appears in both the search pane and results pane. Changing this setting via the Search Pane tab also changes it in the Results Pane tab, and vice versa.
Click and drag the icon in this column to rearrange the order in which the fields are displayed.
A field's weight value represents the magnitude of its impact on the final match score. When determining a match, some fields are more important than others. For example, the person name is likely more important in determining a match between two people than the location name. Adjust the weight slider for each field based on its relative importance.
Weight is distributed equally among all fields by default. If a field is missing from a record, that field is ignored and its weight is evenly distributed across the other fields.
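The effect of weights and missing fields can be illustrated with a small calculation. This sketch assumes the final match score is a weighted average of per-field scores between 0 and 1, which is an assumption made for illustration; the exact formula Match Studio uses is not given here, and `combine_scores` is a hypothetical name.

```python
def combine_scores(field_scores, weights):
    """Combine per-field scores into one match score.

    field_scores maps field name -> score in [0, 1], or None if the field is missing.
    weights maps field name -> non-negative weight.
    """
    present = [field for field in weights if field_scores.get(field) is not None]
    if not present:
        return 0.0
    # The weight of missing fields is split evenly across the fields that are present.
    missing_weight = sum(w for field, w in weights.items() if field not in present)
    share = missing_weight / len(present)
    effective = {field: weights[field] + share for field in present}
    total = sum(effective.values())
    if total == 0:
        return 0.0
    return sum(effective[field] * field_scores[field] for field in present) / total


weights = {"name": 0.5, "dob": 0.3, "location": 0.2}
full = combine_scores({"name": 0.9, "dob": 1.0, "location": 0.5}, weights)
partial = combine_scores({"name": 0.9, "dob": 1.0, "location": None}, weights)
print(round(full, 2), round(partial, 2))
```

In the second call the location is missing, so its weight of 0.2 is split evenly, giving the name an effective weight of 0.6 and the date of birth 0.4.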
Individual name tokens are scored by a number of algorithms. These algorithms can be optimized by modifying configuration parameters, thus changing the final match score.
A match configuration contains a set of parameters. Each named match configuration contains parameter values for a specific language pair and entity type. A single named match configuration can contain multiple language pairs and entity types.
Use the Match Configuration dropdown menu to set the default match configuration for search and batch search. You can use the default configuration (RMS-<version> Default) or create a new match configuration. See New Match Configuration for more information on creating a new match configuration.
New match configurations can also be selected directly from the search pane. Display of this value on the search pane is disabled by default. Use the toggle in the Display column to enable display.
Each Rosette Match Studio query is processed in two passes to provide the best combination of speed and accuracy.
-
The first pass is designed to quickly generate a set of candidates for the second pass to consider.
-
The second pass compares every value returned by the first pass against the value in the query and computes a similarity score. Multiple scorers are applied in the second pass, to generate the best possible score.
Window size determines the number of candidates passed to the second pass. Increasing window size improves recall, but results in a slower search.
Window size can also be adjusted directly from the search pane. Display of this value on the search pane is disabled by default. Use the toggle in the Display column to enable display.
The results pane is where you see the results of your search or batch search. Use this tab to control which information, and how much of it, you see when you perform a search. This is particularly useful when working with large indices that have many fields. These settings also apply to exported batch search results.
Control whether the field appears in the search results. This does not affect field weight. Match score is always displayed.
Choose how the field name appears in both the search pane and results pane. Changing this setting via the Search Pane tab also changes it in the Results Pane tab, and vice versa.
This setting controls how Match Studio displays long search results in the results pane. There are three options:
-
TRUNCATE: Shortens the text at the end so that it fits the field. Hover over the ellipsis at the end of a truncated result to see the full text.
-
WRAP: Wraps long text by displaying it on multiple lines.
-
EXPAND: Increases the width of the field so that it fits the longest displayed search result.
Click and drag the icon in this column to rearrange the order in which the fields are displayed.
Match score is always displayed first.
Once the match score is calculated for all values in the index, only those with scores greater than or equal to the display threshold are returned in the search results. If you aren't seeing results you expect, try lowering the display threshold value to return more results.
Once the match score is calculated for all values in the index, those with scores greater than or equal to the match threshold are highlighted in the search results.
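The two thresholds act as successive filters on the scored results: the display threshold decides what is returned, and the match threshold decides what is highlighted. The sketch below shows this with made-up scores on a 0 to 1 scale; the scale, the function name `visible_results`, and the sample values are assumptions for illustration.

```python
def visible_results(scored, display_threshold, match_threshold):
    """Return (value, score, highlighted) tuples for scores at or above the display threshold."""
    shown = [
        (value, score, score >= match_threshold)
        for value, score in scored
        if score >= display_threshold
    ]
    # Results are listed in descending order by match score.
    return sorted(shown, key=lambda row: row[1], reverse=True)


# Hypothetical match scores.
scored = [("Jon Smyth", 0.74), ("John Smith", 1.0), ("Jane Smithers", 0.61), ("Maria Garcia", 0.18)]
rows = visible_results(scored, display_threshold=0.6, match_threshold=0.8)
for value, score, highlighted in rows:
    print(value, score, "highlighted" if highlighted else "")
```

Lowering `display_threshold` in this example to 0.1 would also return the last record, which mirrors the advice to lower the display threshold when expected results are missing.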