Rosette Server can support multiple profiles, each with different data domains (such as user dictionaries, regular expression files, and custom models) as well as different parameter and configuration settings. Each profile is defined by its own root directory; thus, any data or configuration files that live in the root directory of an endpoint can be part of a custom profile.
Using custom profiles, a single endpoint can simultaneously support users with different processing requirements within a single instance of Rosette Server. For example, one user may work with product reviews and have a custom sentiment analysis model they want to use, while another user works with news articles and wants to use the default sentiment analysis model.
Each unique profile in Rosette Server is identified by a string, profileId. The profile is specified when calling the API by adding the profileId parameter, which indicates the set of configuration and data files to be used for that call.
Custom profiles and their associated data are contained in a <profile-data-root> directory. This directory can be anywhere in your environment; it does not have to be in the Rosette Server install directory.
Table 6. Examples of types of customizable data by endpoint
| Endpoint | Applicable data files for custom profile |
|---|---|
| /categories | Custom models |
| /entities | Gazetteers, regular expression files, custom models, linking knowledge base |
| /morphology | User dictionaries |
| /sentiment | Custom models |
| /tokens | Custom tokenization dictionaries |
Note
Custom profiles are not currently supported for the address-similarity, name-deduplication, name-similarity, and name-translation endpoints.
Setting up custom profiles
-
Create a directory to contain the configuration and data files for the custom profile.
The directory name must be 1 to 80 characters long and can contain only the characters 0-9, A-Z, a-z, underscore (_), and hyphen (-); it cannot contain spaces. The directory can be anywhere on your server; it does not have to be in the Rosette Server directory structure. This directory is the profile-data-root.
-
Create a subdirectory for each profile, identified by a profileId.
For each profile, create a subdirectory named after its profileId in the profile-data-root. The profile-path for a profile is profile-data-root/profileId.
-
Edit the Rosette Server configuration files to look for the profile directories.
The configuration files are in the launcher/config/ directory. Set the profile-data-root value in these files:
# profile data root folder that may contain profile-id/{rex,tcat} etc
profile-data-root=file:///Users/rosette-users
-
Add the customization files for each profile. They may be configuration and/or data files.
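The setup steps above can be sketched in Python for a local filesystem. This is a minimal sketch, not a Rosette tool: is_valid_dir_name and create_profile are hypothetical helpers, and the check applies the directory-name rule from step 1 (1 to 80 characters drawn from 0-9, A-Z, a-z, underscore, and hyphen) to the profile-data-root. It does not edit the configuration files in launcher/config/ or add any profile data.

```python
import re
import tempfile
from pathlib import Path

# Directory-name rule from step 1: 1 to 80 characters drawn from
# 0-9, A-Z, a-z, underscore, or hyphen (which also rules out spaces).
_DIR_NAME = re.compile(r"[0-9A-Za-z_-]{1,80}")


def is_valid_dir_name(name: str) -> bool:
    """Return True if `name` satisfies the documented directory-name rule."""
    return _DIR_NAME.fullmatch(name) is not None


def create_profile(profile_data_root: str, profile_id: str) -> Path:
    """Create <profile-data-root>/<profileId> and return the profile path.

    Hypothetical helper: it only creates the directories. The configuration
    and data files for the profile still have to be added afterwards.
    """
    root = Path(profile_data_root)
    if not is_valid_dir_name(root.name):
        raise ValueError(f"invalid profile-data-root name: {root.name!r}")
    profile_path = root / profile_id
    profile_path.mkdir(parents=True, exist_ok=True)
    return profile_path


if __name__ == "__main__":
    # Demo in a throwaway directory instead of /Users/rosette-users.
    with tempfile.TemporaryDirectory() as tmp:
        path = create_profile(str(Path(tmp) / "rosette-users"), "group1")
        print(path.relative_to(tmp).as_posix())  # rosette-users/group1
```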
When you call the API, add "profileId": "myProfileId" to the body of the call. For example:
{"content": "The black bear fought the white tiger at London Zoo.",
"profileId": "group1"
}
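The same request body can be built with Python's standard library when curl is not convenient. This sketch assumes the server address http://localhost:8181/rest/v1 used by the curl examples in this section; build_request is a hypothetical helper, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8181/rest/v1"  # assumed local Rosette Server


def build_request(endpoint: str, content: str,
                  profile_id: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for an endpoint such as "entities".

    When profile_id is given, it is added to the JSON body as "profileId",
    which selects that profile's configuration and data files for this call.
    """
    body = {"content": content}
    if profile_id is not None:
        body["profileId"] = profile_id
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(
        "entities",
        "The black bear fought the white tiger at London Zoo.",
        profile_id="group1",
    )
    # Only inspect the request here; sending it needs a running server.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Calling urllib.request.urlopen(req) sends the request and requires a running Rosette Server at the assumed address. Leaving profile_id unset omits the parameter, so the call uses the default configuration and data.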
http://localhost:8181/rest/v1/custom-profiles
The /custom-profiles endpoint returns a list of all custom profiles on the server.
curl -s http://localhost:8181/rest/v1/custom-profiles
If the call includes an app-id in the request header, the /custom-profiles endpoint returns all profiles under the specified app-id.
curl -s http://localhost:8181/rest/v1/custom-profiles -H "X-RosetteAPI-App-Id: app-id"
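A Python equivalent of the two /custom-profiles calls above, as a sketch under the same localhost:8181 assumption. custom_profiles_request is a hypothetical helper; it only builds the GET request, adding the X-RosetteAPI-App-Id header when an app-id is supplied.

```python
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8181/rest/v1"  # assumed local Rosette Server


def custom_profiles_request(app_id: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /custom-profiles.

    Without app_id the server lists all custom profiles; with app_id the
    X-RosetteAPI-App-Id header limits the list to profiles under that app-id.
    """
    headers = {}
    if app_id is not None:
        headers["X-RosetteAPI-App-Id"] = app_id
    return urllib.request.Request(f"{BASE_URL}/custom-profiles", headers=headers)


if __name__ == "__main__":
    req = custom_profiles_request("app-id")
    print(req.get_method(), req.full_url, req.header_items())
    # To send it (requires a running server):
    #   urllib.request.urlopen(req).read()
```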
New profiles are automatically loaded in Rosette Server. You do not have to bring down or restart the instance to add new models or data to Rosette Server.
When editing an existing profile, you may need to restart Rosette Server. If the profile has been called since Rosette Server was started, the Server must be restarted for the changes to take effect. If the profile has not been called since Rosette Server was started, there is no need to restart.
To add or update models or data, use one of the following approaches. These examples assume the custom profile root rosette-users and the profiles group1 and group2:
-
Add a new profile with the new models or new data, for example group3.
-
Delete the profile and re-add it. For example, delete group1 and then recreate the group1 directory with the new models and/or data.
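The second approach can be scripted. This minimal sketch assumes the profiles live on a local filesystem and that the updated models and/or data have first been staged in a separate directory; replace_profile is a hypothetical helper. Depending on whether the profile has been called since Rosette Server started, a restart may still be needed, as described above.

```python
import shutil
import tempfile
from pathlib import Path


def replace_profile(profile_data_root: str, profile_id: str, new_contents: str) -> Path:
    """Delete <profile-data-root>/<profileId> and recreate it from new_contents.

    Hypothetical helper for the "delete the profile and re-add it" approach.
    new_contents is a staged directory holding the updated models and/or data.
    """
    staged = Path(new_contents)
    if not staged.is_dir():
        # Check first so the old profile is not deleted with nothing to replace it.
        raise FileNotFoundError(f"staged profile directory not found: {staged}")
    profile_path = Path(profile_data_root) / profile_id
    if profile_path.exists():
        shutil.rmtree(profile_path)           # delete the old profile
    shutil.copytree(staged, profile_path)     # re-add it with the new files
    return profile_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "rosette-users"
        (root / "group1").mkdir(parents=True)
        staged = Path(tmp) / "group1-staged"
        staged.mkdir()
        (staged / "README.txt").write_text("updated data goes here\n")
        print(sorted(p.name for p in replace_profile(str(root), "group1", str(staged)).iterdir()))
```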
The configurations for each endpoint are contained in the factory configuration files. The worker-config.yaml file describes which factory configuration files are used by each endpoint, as well as the pipelines for each endpoint. To modify parameter values or any other configuration values, copy the factory configuration file into the profile path and modify the values.
Example 3. Modifying entities parameters default values
Let's go back to our example with the profileIds group1 and group2. Group1 wants to modify the default entities parameters, setting entity linking to true and case sensitivity to false. These parameters are set in the rex-factory-config.yaml file.
-
Copy the file /launcher/config/rosapi/rex-factory-config.yaml to rosette-users/group1/config/rosapi/rex-factory-config.yaml.
-
Edit the new rex-factory-config.yaml file as needed. This is an excerpt from a sample file:
# rootDirectory is the location of the rex root
rootDirectory: ${rex-root}
# startingWithDefaultConfiguration sets whether to fill in the defaults with CreateDefaultExtractor
startingWithDefaultConfiguration: true
# calculateConfidence turns on confidence calculation
# values: true | false
calculateConfidence: true
# resolvePronouns turns on pronoun resolution
# values: true | false
resolvePronouns: true
# rblRootDirectory is the location of the rbl root
rblRootDirectory: ${rex-root}/rbl-je
# case sensitivity model defaults to auto
caseSensitivity: false
# linkEntities is default true for the Cloud
linkEntities: true
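Steps 1 and 2 can be combined into a small script. This is a sketch under stated assumptions, not a Rosette utility: customize_factory_config is hypothetical, it treats the file as flat, top-level key: value lines like the excerpt above (it does not parse YAML), and the demo uses a small stand-in file rather than the real factory configuration.

```python
import re
import shutil
import tempfile
from pathlib import Path


def customize_factory_config(factory_file: str, profile_path: str, overrides: dict) -> Path:
    """Copy a factory configuration file into <profile-path>/config/rosapi/
    and set top-level values in the copy; the original file is untouched.

    Assumption: values are unindented "key: value" lines, as in the excerpt.
    This is plain text substitution, not a YAML parser. Keys that are not
    already present are appended to the end of the copy.
    """
    target = Path(profile_path) / "config" / "rosapi" / Path(factory_file).name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(factory_file, target)

    text = target.read_text(encoding="utf-8")
    for key, value in overrides.items():
        new_line = f"{key}: {value}"
        # Match an unindented "key:" line; comment lines start with "#".
        pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
        if pattern.search(text):
            text = pattern.sub(lambda _m: new_line, text)
        else:
            text = text.rstrip("\n") + "\n" + new_line + "\n"
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for /launcher/config/rosapi/rex-factory-config.yaml.
        factory = Path(tmp) / "rex-factory-config.yaml"
        factory.write_text("rootDirectory: ${rex-root}\nlinkEntities: false\n", encoding="utf-8")
        copy = customize_factory_config(
            str(factory),
            str(Path(tmp) / "rosette-users" / "group1"),
            {"linkEntities": "true", "caseSensitivity": "false"},
        )
        print(copy.read_text(encoding="utf-8"))
```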
Each profile can include custom data sets. For example, the entities endpoint uses multiple types of data files, including regex files and gazetteers. These files can be put into their own directory for entities, known as an overlay directory. This is an additional data directory that takes priority over the default entities data directory.
Note
If the data overlay directory is named rex, the contents of the overlay directory will completely replace all supplied REX data files, including models, regex, and gazetteer files.
-
If your custom data sets are intended to supplement the shipped files, the directory name must not be rex.
-
If your custom data sets are intended to completely replace the shipped files, use the directory name rex.
Example 4. Custom Gazetteer for the Entities Endpoint
We will create a custom gazetteer file called custom_gaz.txt specifying "John Doe" as an ENGINEER entity type. Full details on how to create custom gazetteer files are in the section Creating a Custom Gazetteer in the Rosette Entity Extractor Application Developer Guide.
-
Create the custom gazetteer file in /Users/rosette-users/group1/custom-rex/data/gazetteer/eng/accept/custom_gaz.txt.
It should consist of just two lines:
ENGINEER
John Doe
-
Copy the file /launcher/config/rosapi/rex-factory-config.yaml to /Users/rosette-users/group1/config/rosapi/rex-factory-config.yaml.
-
Edit the new rex-factory-config.yaml file, setting the dataOverlayDirectory:
# rootDirectory is the location of the rex root
rootDirectory: ${rex-root}
dataOverlayDirectory: "/Users/rosette-users/group1/custom-rex/data"
-
Call the entities endpoint with the profileId set to group1:
curl -s -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Cache-Control: no-cache" \
-d '{"content": "John Doe is employed by Basis Technology", "profileId": "group1"}' \
"http://localhost:8181/rest/v1/entities"
You will see "John Doe" extracted as type ENGINEER from the custom gazetteer.
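Creating the gazetteer file can also be scripted. This minimal sketch reproduces the layout from this example (<overlay>/gazetteer/<lang>/accept/<file>, with the entity type on the first line and one name on each following line); write_gazetteer is a hypothetical helper, and writing the file as UTF-8 is an assumption.

```python
import tempfile
from pathlib import Path


def write_gazetteer(overlay_data_dir: str, lang: str, file_name: str,
                    entity_type: str, entries: list) -> Path:
    """Write an accept gazetteer under <overlay>/gazetteer/<lang>/accept/.

    Layout as in this example: the first line is the entity type and each
    following line is one name to extract as that type.
    """
    path = Path(overlay_data_dir) / "gazetteer" / lang / "accept" / file_name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join([entity_type, *entries]) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for /Users/rosette-users/group1/custom-rex/data.
        overlay = Path(tmp) / "rosette-users" / "group1" / "custom-rex" / "data"
        gaz = write_gazetteer(str(overlay), "eng", "custom_gaz.txt", "ENGINEER", ["John Doe"])
        print(gaz.read_text(encoding="utf-8"))
```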
You can train a custom model and deploy it to the entities endpoint for entity extraction. To deploy the model, do one of the following:
-
Copy the model file to the default data directory in the REX root folder.
<RosetteServerInstallDir>/roots/rex/<version>/data/statistical/<lang>/<modelfile>
where <lang> is the three-letter language code for the model.
-
Copy the model to the data directory of a custom profile.
<profile-data-root>/<profileId>/data/statistical/<lang>/<modelfile>
where <lang> is the three-letter language code for the model.
The custom profile must be set up as described in Setting up custom profiles.
Tip
Model Naming Convention
The prefix must be "model." and the suffix must be "-LE.bin". Any alphanumeric ASCII characters are allowed in between.
Example valid model names:
-
model.fruit-LE.bin
-
model.customer4-LE.bin
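The naming convention and the custom-profile location can be checked before copying a model. This sketch is hypothetical: is_valid_model_name encodes the rule from the tip above, profile_model_path builds the <profile-data-root>/<profileId>/data/statistical/<lang>/<modelfile> path, and requiring a lowercase three-letter code such as eng is an assumption based on the eng directories used in this section.

```python
import re
from pathlib import Path

# Naming convention from the tip: "model." prefix, "-LE.bin" suffix,
# and one or more alphanumeric ASCII characters in between.
_MODEL_NAME = re.compile(r"model\.[A-Za-z0-9]+-LE\.bin")


def is_valid_model_name(file_name: str) -> bool:
    return _MODEL_NAME.fullmatch(file_name) is not None


def profile_model_path(profile_data_root: str, profile_id: str,
                       lang: str, model_file: str) -> Path:
    """Return <profile-data-root>/<profileId>/data/statistical/<lang>/<modelfile>."""
    if not is_valid_model_name(model_file):
        raise ValueError(f"model file name does not follow the convention: {model_file!r}")
    # Assumption: lowercase three-letter codes, like the "eng" used in this section.
    if re.fullmatch(r"[a-z]{3}", lang) is None:
        raise ValueError(f"expected a three-letter language code, got {lang!r}")
    return Path(profile_data_root) / profile_id / "data" / "statistical" / lang / model_file


if __name__ == "__main__":
    for name in ("model.fruit-LE.bin", "model.customer4-LE.bin", "fruit.bin"):
        print(name, is_valid_model_name(name))
    print(profile_model_path("/Users/rosette-users", "group1", "eng", "model.fruit-LE.bin"))
```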
In this example, we're going to add the entity types COLOR and ANIMAL to the entities endpoint, using a regex file.
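Because the new entity types come from regular expressions, it can help to preview what the patterns match before wiring them into a profile. This sketch embeds the two patterns from the custom-regexes.xml file used in this example and applies them with Python's re module. It assumes Python's regex engine treats these patterns the same way the entities endpoint does, which is plausible for simple alternations like these but is not guaranteed in general; preview_matches is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Same content as custom-regexes.xml in this example.
REGEX_XML = """<regexps>
<regexp type="COLOR">(?i)red|white|blue|black</regexp>
<regexp type="ANIMAL">(?i)bear|tiger|whale</regexp>
</regexps>"""


def preview_matches(regex_xml: str, text: str) -> list:
    """Return (type, mention, startOffset, endOffset) tuples sorted by offset.

    End offsets are exclusive, like Python slice indices.
    """
    matches = []
    for element in ET.fromstring(regex_xml).iter("regexp"):
        pattern = re.compile(element.text.strip())
        for m in pattern.finditer(text):
            matches.append((element.get("type"), m.group(), m.start(), m.end()))
    return sorted(matches, key=lambda t: t[2])


if __name__ == "__main__":
    sentence = "The black bear fought the white tiger at London Zoo."
    for match in preview_matches(REGEX_XML, sentence):
        print(match)
```

For the sample sentence, the reported offsets (black 4 to 9, bear 10 to 14, white 26 to 31, tiger 32 to 37) line up with the startOffset and endOffset values the entities endpoint returns for the COLOR and ANIMAL entities when the group1 profile is used.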
-
Create a profile-data-root called rosette-users in the Users directory.
-
Create a profile with the profileId group1. The new profile-path is:
/Users/rosette-users/group1
-
Edit the Rosette Server configuration files, adding the profile-data-root:
# profile data root folder that may contain app-id/profile-id/{rex,tcat} etc
profile-data-root=file:///Users/rosette-users
-
Copy the rex-factory-config.yaml file from /launcher/config/rosapi into the new directory:
/Users/rosette-users/group1/config/rosapi/rex-factory-config.yaml
-
Edit the copied file, setting the dataOverlayDirectory parameter and adding the path for the new regex file. The overlay directory has the same structure as the data directory. The entities endpoint looks for files in both locations, preferring the version in the overlay directory.
dataOverlayDirectory: "/Users/rosette-users/group1/custom-rex/data"
supplementalRegularExpressionPaths:
- "/Users/rosette-users/group1/custom-rex/data/regex/eng/accept/supplemental/custom-regexes.xml"
-
Create the file custom-regexes.xml in the /Users/rosette-users/group1/custom-rex/data/regex/eng/accept/supplemental directory:
<regexps>
<regexp type="COLOR">(?i)red|white|blue|black</regexp>
<regexp type="ANIMAL">(?i)bear|tiger|whale</regexp>
</regexps>
-
Call the entities endpoint without using the custom profile:
curl -s -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Cache-Control: no-cache" \
-d '{"content": "The black bear fought the white tiger at London Zoo." }' \
"http://localhost:8181/rest/v1/entities"
The only entity returned is London Zoo:
{
"entities": [
{
"type": "LOCATION",
"mention": "London Zoo",
"normalized": "London Zoo",
"count": 1,
"mentionOffsets": [
{
"startOffset": 41,
"endOffset": 51
}
],
"entityId": "T0"
}
]
}
-
Call the entities endpoint, adding the profileId to the call:
curl -s -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Cache-Control: no-cache" \
-d '{"content": "The black bear fought the white tiger at London Zoo.", "profileId": "group1"}' \
"http://localhost:8181/rest/v1/entities"
The new colors and animals are also returned:
{
"entities": [
{
"type": "COLOR",
"mention": "black",
"normalized": "black",
"count": 1,
"mentionOffsets": [
{
"startOffset": 4,
"endOffset": 9
}
],
"entityId": "T0"
},
{
"type": "ANIMAL",
"mention": "bear",
"normalized": "bear",
"count": 1,
"mentionOffsets": [
{
"startOffset": 10,
"endOffset": 14
}
],
"entityId": "T1"
},
{
"type": "COLOR",
"mention": "white",
"normalized": "white",
"count": 1,
"mentionOffsets": [
{
"startOffset": 26,
"endOffset": 31
}
],
"entityId": "T2"
},
{
"type": "ANIMAL",
"mention": "tiger",
"normalized": "tiger",
"count": 1,
"mentionOffsets": [
{
"startOffset": 32,
"endOffset": 37
}
],
"entityId": "T3"
},
{
"type": "LOCATION",
"mention": "London Zoo",
"normalized": "London Zoo",
"count": 1,
"mentionOffsets": [
{
"startOffset": 41,
"endOffset": 51
}
],
"entityId": "T4"
}