Rosette Server uses ISO 639-3 codes to specify languages.
Rosette Server can support multiple profiles, each with different data domains (such as user dictionaries, regular expressions files, and custom models) as well as different parameter and configuration settings. Each profile is defined by its own root directory, thus any data or configuration files that live in the root directory of an endpoint can be part of a custom profile.
Using custom profiles, a single endpoint can simultaneously support users with different processing requirements within a single instance of Rosette Server. For example, one user may work with product reviews and have a custom sentiment analysis model they want to use, while another user works with news articles and wants to use the default sentiment analysis model.
Each unique profile in Rosette Server is identified by a string, profileId. The profile is specified when calling the API by adding the profileId parameter, which indicates the set of configuration and data files to be used for that call.
Custom profiles and their associated data are contained in a <profile-data-root> directory. This directory can be anywhere in your environment; it does not have to be in the Rosette Server install directory.
Table 11. Examples of types of customizable data by endpoint
Endpoint | Applicable data files for custom profile |
/categories | Custom models |
/entities | Gazetteers, regular expression files, custom models, linking knowledge base |
/morphology | User dictionaries |
/sentiment | Custom models |
/tokens | Custom tokenization dictionaries |
Note
Custom profiles are not currently supported for the address-similarity, name-deduplication, name-similarity, and name-translation endpoints.
Setting up Custom Profiles
-
Create a directory to contain the configuration and data files for the custom profile.
The directory can have any name and can be anywhere on your server; it does not have to be in the Rosette Server directory structure. This is the profile-data-root.
-
Create a subdirectory for each profile, identified by a profileId.
For each profile, create a subdirectory named profileId in the profile-data-root. The profile-path for a profile is profile-data-root/profileId.
-
Edit the Rosette Server configuration files to look for the profile directories.
The configuration files are in the launcher/config/ directory. Set the profile-data-root value in these files:
# profile data root folder that may contain profile-id/{rex,tcat} etc
profile-data-root=file:///Users/rosette-users
-
Add the customization files for each profile. They may be configuration and/or data files.
When you call the API, add "profileId": "myProfileId" to the body of the call.
{"content": "The black bear fought the white tiger at London Zoo.",
"profileId": "group1"
}
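As a minimal sketch of the same call from Python, the helper below builds (but does not send) a POST request whose body carries the profileId. The function name build_profile_request is hypothetical, and the base URL http://localhost:8181 is an assumption taken from the curl examples elsewhere in this guide.

```python
import json
import urllib.request


def build_profile_request(base_url, endpoint, content, profile_id=None):
    """Build (but do not send) a POST request for a Rosette Server endpoint."""
    body = {"content": content}
    if profile_id is not None:
        # profileId selects the configuration and data files used for this call.
        body["profileId"] = profile_id
    return urllib.request.Request(
        url=f"{base_url}/rest/v1/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_profile_request(
    "http://localhost:8181",  # assumed server location
    "entities",
    "The black bear fought the white tiger at London Zoo.",
    profile_id="group1",
)
# To send it against a running server: urllib.request.urlopen(request)
```

Omitting profile_id leaves the body without a profileId, so the call would use the default configuration.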
New profiles are automatically loaded in Rosette Server. You do not have to bring down or restart the instance to add new models or data to Rosette Server.
To add or update models or data, assuming the custom profile root is rosette-users and the existing profiles are group1 and group2, do one of the following:
Add a new profile with the new models or new data, for example group3.
Delete the profile and re-add it. Delete group1 and then recreate the group1 directory with the new models and/or data.
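The delete-and-recreate option can be sketched as a small script. This is an illustration only: replace_profile is a hypothetical helper, the file content is a placeholder, and the demonstration runs in a temporary directory that stands in for rosette-users.

```python
import shutil
import tempfile
from pathlib import Path


def replace_profile(profile_data_root, profile_id, new_files):
    """Delete a profile directory, then recreate it with new files.

    new_files maps a path relative to the profile directory to file content.
    """
    profile_path = Path(profile_data_root) / profile_id
    if profile_path.exists():
        shutil.rmtree(profile_path)  # delete the old profile
    profile_path.mkdir(parents=True, exist_ok=True)  # recreate it
    for relative_path, text in new_files.items():
        target = profile_path / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
    return profile_path


# Demonstration in a temporary directory standing in for rosette-users.
root = Path(tempfile.mkdtemp()) / "rosette-users"
config = "config/rosapi/rex-factory-config.yaml"
replace_profile(root, "group1", {config: "linkEntities: false\n"})
updated = replace_profile(root, "group1", {config: "linkEntities: true\n"})
```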
The configurations for each endpoint are contained in the factory configuration files. The worker-config.yaml file describes which factory configuration files are used by each endpoint as well as the pipelines for each endpoint. To modify default parameter values or any other configuration values, copy the factory configuration file into the profile path and modify the values.
Example 2. Modifying entities parameters default values
Let's go back to our example with profile-ids of group1 and group2. Group1 wants to modify the default entities parameters, setting entity linking to true and case sensitivity to false. These parameters are set in the rex-factory-config.yaml file.
Copy the file /config/rosapi/rex-factory-config.yaml to rosette-users/group1/config/rosapi/rex-factory-config.yaml.
-
Edit the new rex-factory-config.yaml file as needed. This is an excerpt from a sample file.
# rootDirectory is the location of the rex root
rootDirectory: ${rex-root}
# startingWithDefaultConfiguration sets whether to fill in the defaults with CreateDefaultExtractor
startingWithDefaultConfiguration: true
# calculateConfidence turns on confidence calculation
# values: true | false
calculateConfidence: true
# resolvePronouns turns on pronoun resolution
# values: true | false
resolvePronouns: true
# rblRootDirectory is the location of the rbl root
rblRootDirectory: ${rex-root}/rbl-je
# case sensitivity model defaults to auto
caseSensitivity: automatic
# linkEntities is default true for the Cloud
linkEntities: true
Each profile can include custom data sets. For example, the entities endpoint includes multiple types of data files including regex and gazetteers.
Example 3. Custom regex for the Entities Endpoint
The custom regex file used in this example is named custom-regexes.xml. It is assumed that you have already created the custom regex file as described in the section Supplemental Regexes in the Rosette Entity Extractor Application Developer Guide.
Copy the file /config/rosapi/rex-factory-config.yaml to rosette-users/group1/config/rosapi/rex-factory-config.yaml.
-
Edit the new rex-factory-config.yaml file, setting the dataOverlayDirectory and adding a supplemental regex.
# rootDirectory is the location of the rex root
rootDirectory: ${rex-root}
dataOverlayDirectory: "/Users/rosette-users/group1/rex/data"
supplementalRegularExpressionPaths:
- "/Users/rosette-users/group1/rex/data/regex/eng/accept/supplemental/custom-regexes.xml"
Add the file custom-regexes.xml to the directory /Users/rosette-users/group1/rex/data/regex/eng/accept/supplemental. This file contains the new regular expressions.
In this example we're going to add the entity types COLOR and ANIMAL to the entities endpoint, using a regex file.
Create a profile-data-root called rosette-users in the Users directory.
-
Create a user with the profileId of user1. The new profile-path is:
/Users/rosette-users/user1
-
Edit the Rosette Server configuration files, adding the profile-data-root.
# profile data root folder that may contain app-id/profile-id/{rex,tcat} etc
profile-data-root=file:///Users/rosette-users
-
Copy the rex-factory-config.yaml file from /config/rosapi into the new directory:
/Users/rosette-users/user1/config/rosapi/rex-factory-config.yaml
-
Edit the copied file, setting the dataOverlayDirectory parameter and adding the path for the new regex file. The overlay directory is a directory shaped like the data directory. The entities endpoint will look for files in both locations, preferring the version in the overlay directory.
dataOverlayDirectory: "/Users/rosette-users/user1/custom-rex/data"
supplementalRegularExpressionPaths:
- "/Users/rosette-users/user1/custom-rex/data/regex/eng/accept/supplemental/custom-regexes.xml"
-
Create the file custom-regexes.xml in the /Users/rosette-users/user1/custom-rex/data/regex/eng/accept/supplemental directory.
<regexps>
<regexp type="COLOR">(?i)red|white|blue|black</regexp>
<regexp type="ANIMAL">(?i)bear|tiger|whale</regexp>
</regexps>
-
Call the entities endpoint without using the custom profile:
curl -s -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Cache-Control: no-cache" \
-d '{"content": "The black bear fought the white tiger at London Zoo." }' \
"http://localhost:8181/rest/v1/entities"
The only entity returned is London Zoo:
{
"entities": [
{
"type": "LOCATION",
"mention": "London Zoo",
"normalized": "London Zoo",
"count": 1,
"mentionOffsets": [
{
"startOffset": 41,
"endOffset": 51
}
],
"entityId": "T0"
}
]
}
-
Call the entities endpoint, adding the profileId to the call:
curl -s -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Cache-Control: no-cache" \
-d '{"content": "The black bear fought the white tiger at London Zoo.",
"profileId": "user1"}' \
"http://localhost:8181/rest/v1/entities"
The new colors and animals are also returned:
"entities": [
{
"type": "COLOR",
"mention": "black",
"normalized": "black",
"count": 1,
"mentionOffsets": [
{
"startOffset": 4,
"endOffset": 9
}
],
"entityId": "T0"
},
{
"type": "ANIMAL",
"mention": "bear",
"normalized": "bear",
"count": 1,
"mentionOffsets": [
{
"startOffset": 10,
"endOffset": 14
}
],
"entityId": "T1"
},
{
"type": "COLOR",
"mention": "white",
"normalized": "white",
"count": 1,
"mentionOffsets": [
{
"startOffset": 26,
"endOffset": 31
}
],
"entityId": "T2"
},
{
"type": "ANIMAL",
"mention": "tiger",
"normalized": "tiger",
"count": 1,
"mentionOffsets": [
{
"startOffset": 32,
"endOffset": 37
}
],
"entityId": "T3"
},
{
"type": "LOCATION",
"mention": "London Zoo",
"normalized": "London Zoo",
"count": 1,
"mentionOffsets": [
{
"startOffset": 41,
"endOffset": 51
}
],
"entityId": "T4"
}
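Before deploying a regex file, it can help to check the patterns against sample text. The sketch below applies the two patterns from custom-regexes.xml with Python's re module and prints each match with its character offsets. Python's regex engine is an assumption here and is not the engine Rosette uses, so treat this only as a quick sanity check; find_mentions is a hypothetical helper.

```python
import re

# Patterns copied from the custom-regexes.xml file in this example.
PATTERNS = {
    "COLOR": re.compile(r"(?i)red|white|blue|black"),
    "ANIMAL": re.compile(r"(?i)bear|tiger|whale"),
}


def find_mentions(text):
    """Return (type, mention, start_offset, end_offset) tuples in text order."""
    mentions = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            mentions.append((entity_type, match.group(), match.start(), match.end()))
    return sorted(mentions, key=lambda mention: mention[2])


mentions = find_mentions("The black bear fought the white tiger at London Zoo.")
for mention in mentions:
    print(mention)
```

For this sentence the offsets agree with the startOffset and endOffset values in the response above. Note that the patterns are unanchored, so under Python's matching rules they would also match inside longer words (for example, "red" in "hundred").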
Important
This feature is in LABS and is subject to change.
Usage tracking provides metrics on all Rosette Server calls. Call counts are provided by app-id, profileId, endpoint, and language.
Application ids (app-id) are an optional way to identify the application or group making the call. The app-id is the value of X-RosetteAPI-App-ID in the call header. If no application id is provided in the call header, the calls are allocated to the no-app-id group.
Profile ids (profileId) are an optional way of identifying a custom profile. Each profile can have its own data domain, parameter, and configuration settings. If no profile id is provided in the call, the calls are allocated to the no-profile-id group.
Language is identified by the 3-letter ISO 639-3 language code. xxx indicates the language was unknown.
Calls made to the endpoints /rest/v1/info, /rest/v1/ping, and /rest/v1/custom are not included in the statistics.
Call statistics are kept in the file launcher/config/rosette-usage.yaml. The statistics are cumulative from the file creation date. The file is created when the server is started. If the file already exists when the server is started, new statistics are added to the existing file. The file is not deleted when the server is stopped.
Usage
To access the statistics, call the usage endpoint:
curl http://localhost:8181/rest/usage
where localhost:8181 is the location of the Rosette installation.
Sample Response
{"no-app-id": {
"no-profile-id": {
"/rest/v1/tokens": {
"eng": {
"calls": 1
},
"zho": {
"calls": 1
}
},
"/rest/v1/categories": {
"eng": {
"calls": 1
}
},
"/rest/v1/language": {
"xxx": {
"calls": 1
}
}
}
}
}
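The response nests counts by app-id, then profile id, then endpoint, then language. As a rough sketch of how a client might total calls per endpoint, the code below walks that nesting; flatten_usage and calls_by_endpoint are hypothetical helper names, and the input is the sample response shown above.

```python
import json
from collections import Counter


def flatten_usage(usage):
    """Yield (app_id, profile_id, endpoint, language, calls) from the nested response."""
    for app_id, profiles in usage.items():
        for profile_id, endpoints in profiles.items():
            for endpoint, languages in endpoints.items():
                for language, stats in languages.items():
                    yield app_id, profile_id, endpoint, language, stats["calls"]


def calls_by_endpoint(usage):
    """Sum call counts per endpoint across all app-ids, profiles, and languages."""
    totals = Counter()
    for _app_id, _profile_id, endpoint, _language, calls in flatten_usage(usage):
        totals[endpoint] += calls
    return dict(totals)


sample = json.loads("""
{"no-app-id": {"no-profile-id": {
  "/rest/v1/tokens": {"eng": {"calls": 1}, "zho": {"calls": 1}},
  "/rest/v1/categories": {"eng": {"calls": 1}},
  "/rest/v1/language": {"xxx": {"calls": 1}}
}}}
""")
totals = calls_by_endpoint(sample)
```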
You can also aggregate usage data by having Prometheus pull metrics from multiple instances using the usage/metrics endpoint. A single call returns all endpoints.
curl http://localhost:8181/rest/usage/metrics
where localhost:8181 is the location of the Rosette installation.
Sample Response
# HELP rosette_http_requests_total Total number of Rosette Enterprise requests processed.
# TYPE rosette_http_requests_total counter
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/tokens",lang="zho",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/semantics/vector",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/morphology/compound-components",lang="deu",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/syntax/dependencies",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/morphology/lemmas",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/topics",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/transliteration",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/sentences",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/address-similarity",lang="xxx",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/name-deduplication",lang="xxx",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/morphology/complete",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/entities",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/name-translation",lang="xxx",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/morphology/parts-of-speech",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/semantics/similar",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/language",lang="xxx",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/sentiment",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/categories",lang="eng",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/name-similarity",lang="xxx",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/morphology/han-readings",lang="zho",} 1.0
rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",
endpoint="/rest/v1/relationships",lang="eng",} 1.0
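If you want to inspect this output without a Prometheus server, a small parser is enough. The sketch below assumes each sample occupies a single line in the real response (the sample above appears wrapped for display) and that label values contain no escaped quotes or closing braces; parse_metrics is a hypothetical helper, not part of Rosette.

```python
import re

# One sample per line: name{label="value",...} value
SAMPLE_LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (labels, value) pairs for rosette_http_requests_total samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE_RE.match(line)
        if match and match.group("name") == "rosette_http_requests_total":
            labels = dict(LABEL_RE.findall(match.group("labels")))
            samples.append((labels, float(match.group("value"))))
    return samples


text = (
    '# TYPE rosette_http_requests_total counter\n'
    'rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",'
    'endpoint="/rest/v1/tokens",lang="zho",} 1.0\n'
    'rosette_http_requests_total{app_id="no-app-id",profile_id="no-profile-id",'
    'endpoint="/rest/v1/entities",lang="eng",} 1.0\n'
)
samples = parse_metrics(text)
```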
The configuration parameters for usage tracking are in the file launcher/config/com.basistech.ws.local.usage.tracker.cfg.
-
Disable Tracking: By default, usage tracking is turned on. To disable tracking, uncomment the enabled parameter and change the value to false:
enabled: false
-
Report Interval: To set the reporting interval in minutes, change the reportInterval parameter. The default is 1 minute.
reportInterval: 2
-
File Location: To set the location for the rosette-usage.yaml file, set the usage-tracker-root parameter. The default location is <rosette>/server/launch/config. Uncomment the line and change it to your preferred location. This example changes it to the /var/log directory:
usage-tracker-root: /var/log
To reset the counter:
Stop the server
-
Remove the following files:
Restart the server
Identifying an Application
No authorization is required when using an on-premise installation of Rosette. You may, however, want to track Rosette calls by groups within your organization. To do this, include an application id (app-id) in the request header of all Rosette calls. This allows Rosette to track usage by app-id.
An application id is:
A user-defined string.
It is defined in the call.
There is no validation or authorization on the value.
Used for usage tracking only.
If no app-id is included in the header, calls are allocated to the no-app-id group.
Example:
curl -s -X POST \
-H "X-RosetteAPI-App-Id: usergroup1" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Cache-Control: no-cache" \
-d '{"content": "Por favor Señorita, says the man." }' \
"http://localhost:8181/rest/v1/language"
Advanced Configuration Options
The following sections describe custom installation configurations and will not apply to all installs.
Configure Worker Threads for HTTP Transport
Multiple worker threads allow you to implement parallel request processing. Generally, we recommend that the number of threads should be less than the number of physical cores or less than the total number of hyperthreads, if enabled.
You can experiment with 2-4 worker threads per core. More worker threads may improve throughput a bit, but generally won't improve latency. The default value of worker threads is 2.
On macOS/Linux or Windows:
Edit the file /config/com.basistech.ws.worker.cfg
Modify the value of workerThreadCount
With Docker:
Edit the file docker-compose.yaml
Modify the value of ROSETTE_WORKER_THREADS
For local: transport, modify the value of workerThreadCount in the file /config/com.basistech.ws.transport.embedded.cfg.
To speed up first call response time, Rosette can be pre-warmed by loading data files at startup at the cost of a larger memory footprint.
Most components load their data lazily, meaning that the data required for processing is only loaded into memory when an actual call arrives. This is particularly true for language-specific data. As a result, when the very first call with text in a given language arrives at a worker, the worker can take quite a bit of time loading data before it can process the request.
Pre-warming is Rosette's attempt to address the first-call penalty by hitting the worker with text in every licensed language it supports at boot time. Then, when an actual customer request comes in, all data will have already been memory mapped and you won't experience a first-call delay as the data is loaded. Only languages licensed for your installation will be pre-warmed.
The default is false; pre-warm is not enabled.
To set Rosette to warm up the worker upon activation:
On macOS/Linux or Windows:
Edit the file /com.basistech.ws.worker.cfg
set warmUpWorker=true
Tip
When installing on macOS or Linux, Rosette can be set to pre-warm in the installation. Select Y when asked Pre-warm Rosette at startup? You can always change the option by editing the com.basistech.ws.worker.cfg file.
With Docker:
Edit the file docker-compose.yaml
Set ROSETTE_PRE_WARM=true
Modify the Input Constraints
The limits for the input parameters are in the file /rosapi/constraints.yaml. Modify the values in this file to increase the limits on the maximum input character count and maximum input payload per call. You can also increase the number of names per list for each call to the name deduplication endpoint.
The default values were determined as optimal during early rounds of performance tests targeting < 2 second response times. Larger values may cause degradation of system performance.
Table 12. constraints.yaml
Parameter | Minimum | Maximum | Default Value | Description |
maxInputRawByteSize | 1 | 10,000,000 | 614400 | The maximum number of input bytes per raw doc |
maxInputRawTextSize | 1 | 1,000,000 | 50000 | The maximum number of input characters per submission |
maxNameDedupeListSize | 1 | 100,000 | 1000 | The maximum number of names to be deduplicated |
To modify the input constraints:
Edit the file /rosapi/constraints.yaml
Modify the value for one or more parameters
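A client can check its input against these limits before making a call. The following is a minimal sketch using the default values from Table 12; the helper names are hypothetical, and it assumes the character limit counts Unicode code points, which this guide does not specify.

```python
# Default limits copied from Table 12 (constraints.yaml).
DEFAULT_MAX_INPUT_RAW_BYTE_SIZE = 614400
DEFAULT_MAX_INPUT_RAW_TEXT_SIZE = 50000


def text_within_limit(text, max_chars=DEFAULT_MAX_INPUT_RAW_TEXT_SIZE):
    """Return True if a text submission is within the character limit.

    Assumption: characters are counted as Unicode code points (Python's len).
    """
    return len(text) <= max_chars


def raw_doc_within_limit(data, max_bytes=DEFAULT_MAX_INPUT_RAW_BYTE_SIZE):
    """Return True if a raw document (bytes) is within the byte limit."""
    return len(data) <= max_bytes


ok_text = text_within_limit("a" * 50000)
too_long_text = text_within_limit("a" * 50001)
```

If you raise the limits in constraints.yaml, pass the new values as max_chars or max_bytes so the client-side check stays in step with the server.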
Enable Passing Files to Endpoints
Most endpoints can take either a text block, a file, or a link to a webpage as the input text. The webpage link is in the form of a URI. To enable passing a URI to an endpoint, the enableDte flag must be set in the file com.basistech.ws.worker.cfg.
By default, the flag is set to true; URI passing is enabled.
#download and text extractor
enableDte=true