The full training installation consists of the following components:
- Rosette Server (RS), including Rosette Entity Extractor (REX)
- Rosette Adaptation Studio (RAS)
- REX Training Server (RTS)
- Event Training Server (ETS)
An installation of Rosette Model Training Suite may include only one of the training servers.
The components can be installed on separate machines or all together on a single machine. One machine is adequate for light loads and configuration testing. For production work, large projects, or multiple projects, we recommend installing on multiple machines.
For any install, you will need to know the fully qualified host name of each machine where a component is installed. The training servers (RTS and ETS) can be installed on the same server. For a three-machine install, you will need all three host names; for a single-machine install, you only need the one name.
Important
For all Docker installations, localhost is not an acceptable name; the hostname must be addressable from within the Docker containers. Rosette Server is not installed as a Docker container.
To find the host name for a machine, run the command hostname -f on the machine.
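For example (output is illustrative; your host name will differ):
hostname -f
rts1.example.com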
Docker compose configuration
When you extract the zip files, each server directory will contain the following two files for Docker:
- docker-compose.yml
- .env
Tip
The .env file is a hidden file; all file names that start with a . are hidden. Type ls -a to list the hidden files along with the other files in the directory.
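For example, listing a server's Docker directory might show (illustrative output):
ls -a
.  ..  .env  docker-compose.yml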
The directories used to connect the components, as shown in the figure below, are defined in the .env file for each product. To view or change a value, edit the .env file, not the docker-compose.yml file.
Example .env file for RTS:
RTS_PORT=9080
# Default /basis/rts/workspaces
WORKSPACE_ROOT=$RTS_WORKSPACE_ROOT
# Default /basis/rts
# Wordclasses need to go into this directory
ASSETS_ROOT=$RTS_ASSETS
# Default /basis/rts/config
# File is mongodal_config.yaml
DAL_CONNECTOR_CONFIG_DIR=$DAL_CONNECTOR_CONFIG_DIR
# The release script will update this variable.
REX_TRAINING_SERVER_IMAGE=rex-training-server:0.4.2
# See https://www.ibm.com/support/knowledgecenter/SSD28V_liberty/com.ibm.websphere.wlp.core.doc/ae/twlp_admin_customvars.html
# for details on the contents of this file.
JVM_OPTIONS=/basis/rts/config/jvm.options
# See https://www.ibm.com/support/knowledgecenter/SSEQTP_liberty/com.ibm.websphere.wlp.doc/ae/cwlp_config.html# for details on the contents of this file.
SERVER_XML=/basis/rts/config/server.xml
# Where to store RTS logs
RTS_LOGS=$RTS_LOGS
# The maximum number of training threads at any one time
RTS_CONCURRENT_TRAIN_THREADS=2
# The maximum number of threads serializing models at any one time
RTS_CONCURRENT_SERIALIZE_THREADS=1
# The maximum number of threads creating wordclasses at any one time
RTS_CONCURRENT_WORDCLASS_THREADS=2
The variable values set in the .env file are used in the docker-compose.yml file:
version: '3'
services:
  rex-training-server:
    # https://docs.docker.com/compose/compose-file/#restart
    # no, the default, does not restart a container under any circumstance
    # always, the container always restarts
    # on-failure, restarts a container if the exit code indicates an on-failure error
    # unless-stopped, always restarts a container, except when the container is stopped
    # https://github.com/docker/compose/issues/3672 no must be in quotes
    restart: "no"
    image: ${REX_TRAINING_SERVER_IMAGE}
    volumes:
      - ${WORKSPACE_ROOT}:/basis/rts/workspaces
      - ${ASSETS_ROOT}:/basis/rts
      - ${REXJE_ROOT}:/basis/rts/root
      # The file mongodal_config.yaml must exist in this directory
      - ${DAL_CONNECTOR_CONFIG_DIR}:/basis/rts/config
      - ${RTS_LOGS}:/logs
      # Optionally override JVM settings here, default -Xms8G -Xmx16G
      # - ${JVM_OPTIONS}:/config/jvm.options
      # Optionally override Server settings here
      # - ${SERVER_XML}:/config/server.xml
    environment:
      - AS_MONGO_DAL_CONNECTOR_CONFIG_DIR=/basis/rts/config
      - rexje_root=/basis/rts/root
      - RTS_CONCURRENT_TRAIN_THREADS=${RTS_CONCURRENT_TRAIN_THREADS}
      - RTS_CONCURRENT_SERIALIZE_THREADS=${RTS_CONCURRENT_SERIALIZE_THREADS}
      - RTS_CONCURRENT_WORDCLASS_THREADS=${RTS_CONCURRENT_WORDCLASS_THREADS}
    ports:
      - ${RTS_PORT}:9080
Specifying Service Restart Policy
The service restart policy for each service can be specified in the docker-compose.yml files by setting the restart parameter. This allows containers to be restarted on server reboot, Docker service restart, etc. The restart value can be one of no, always, on-failure, or unless-stopped. The default is no if not specified.
Example for RTS docker-compose.yml:
version: '3'
services:
  rex-training-server:
    restart: "no"
    ...
These prerequisites are for the training environment.
Important
Recommended Operating System: 64-bit Linux or macOS.
Windows deployment (including Docker Desktop for Windows) is not tested or supported at this time. Users on Windows 10 Pro or Windows Server 2016 or 2019 should run Rosette Adaptation Studio in a Linux virtual machine under Hyper-V or VMware Workstation.
Note
Chrome and Firefox are the supported browsers for Rosette Adaptation Studio.
Note
To import models into Rosette Adaptation Studio from the command line, the utility jq must be installed on your system.
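A minimal sketch for installing jq, with package names assumed for the distributions in the tested-versions table below (on CentOS 7, jq comes from the EPEL repository):
sudo yum install -y jq        # CentOS / Rocky
sudo apt-get install -y jq    # Ubuntu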
- You must install the files for Rosette Server, REX Training Server, Events Training Server, and Rosette Adaptation Studio in different directories or on different computers. We recommend installing the REX and Events training servers on the same machine.
- The machines for Rosette Adaptation Studio, REX Training Server, and Events Training Server must have Docker and docker compose installed.
- Before installing any components, create the top-level directory for all components with proper permissions on each machine. In this example, the install directory (<installDir>) is /basis.
sudo mkdir /basis
sudo chmod 2777 /basis
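The leading 2 in mode 2777 sets the setgid bit, so files created under /basis inherit the directory's group; 777 grants read, write, and execute to all users. To confirm the mode (assuming GNU coreutils; output is illustrative):
stat -c '%a %U %G %n' /basis
2777 root root /basis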
Table 2. Tested Versions
Component | Version
Docker | 20.10.0
docker-compose | 1.26.0
CentOS | 7.8.2003 and 8
Rocky | 9
Ubuntu | 20.04
macOS (Intel) |
Table 3. Rosette Server System Requirements
Resource | Requirement
CPU | 4 virtual CPU cores
Memory | 32 GB
Disk Space | 100 GB recommended for multiple small/medium projects. The actual amount required is determined by the size and number of active projects.
Table 4. REX and Events Training Server System Requirements
Resource | Requirement
CPU | 4 virtual CPU cores
Memory | 32 GB
Disk Space | 500 GB recommended for multiple small/medium projects. The actual amount required is determined by the size and number of active projects.
Table 5. Rosette Adaptation Studio System Requirements
Resource | Requirement
CPU | 4 virtual CPU cores
Memory | 16 GB
Disk Space | 500 GB recommended for multiple small/medium projects. The actual amount required is determined by the size and number of active projects.
Single System Installation Prerequisites
On a single system, the following disk space is required for installation only. More space is needed to run the system.
Tip
If you choose auto-partitioning when installing the operating system, you may need to override the default install to ensure that /root gets enough space. For example, some Linux installs default to 70 GB for /root, which is not enough to install the entire system in /basis.
The training shipment contains the following files:
- rs-installation-<version>.zip: Files for Rosette Server. The size of the file depends on the number of languages included. This file may be shipped separately.
- ets-installation-<version>.zip: Files for Events Training Server.
- rts-installation-<version>.zip: Files for REX Training Server.
- Files for Rosette Adaptation Studio. The file in the shipment will be one of the following, depending on the configuration shipped:
  - ras-ets-<version>.zip: Files for Rosette Adaptation Studio for event model training.
  - ras-rts-<version>.zip: Files for Rosette Adaptation Studio for entity model training.
  - ras-ets-rts-<version>.zip: Files for Rosette Adaptation Studio for event and entity model training.
- Model-Training-Suite-Documentation-<version>.zip: Documentation files.
  - System_Administrator_Guide-en.pdf: This guide.
  - Developing_Models-en.pdf: A guide for system architects and model administrators to aid in defining the modeling strategy and understanding the theory of model training.
  - Adaptation_Studio_User_Guide-en.pdf: A guide for the managers and annotators using Rosette Adaptation Studio.
- eventTest.etsmodel: Sample ETS project.
- export_sample.zip: Exported sample project.
- Rosette_Adaptation_Studio_Events_tutorial_1_0_x.zip: A complete tutorial for events, including sample documents.
You will need the license file during installation. The license file may be shipped separately.
A log file is created as each server is installed. All install questions and responses are logged, along with all actions taken to install the server. Actions taken while enabling or disabling SSL are also logged. The files are created in the install directory with the name:
install-<scriptname>.sh.<date>_<time>.log
where <scriptname> is rs, rts, ets, or ras.
For example, an installation of Rosette Server (rs) installed on 10/12/21 at 7:59 am would create the file:
install-rs.sh.10-12-21_07-59.log
Tip
We recommend installing Rosette Server stand-alone; however, Rosette Model Training Suite can also support a containerized version.
Both REX Training Server (RTS) and Events Training Server (ETS) require specific Rosette Server configurations and custom profiles. After installing each component, run the provided scripts to update Rosette Server and install the required custom profiles.
The following sections include instructions for installing stand-alone or as a Docker container.
The headless installer installs RS with Docker and without human interaction. Instead of prompting the user, the installer takes its parameters from the properties file.
The installer prompts are defined in the file install-rs-headless.properties.
Start the installation:
./install-rs-headless.sh
The properties file is in the same directory as the script.
Use the --dry-run flag to validate the properties file, print the settings, and exit without changing anything.
- You must have a recent version of Docker Engine installed.
- The Docker disk image size must be increased to 120 GB (from the default of 60 GB) to install the full Rosette Server package.
- If installing on Windows, Docker for Windows must be installed (not Docker Toolbox or Docker Machine) with Hyper-V enabled.
The Docker memory must be set to at least 16 GB if all endpoints are licensed and activated, and may need to be higher depending on your application.
At a minimum, the Docker maximum memory should be equal to or greater than the Rosette JVM heap size. Otherwise, when running in a Docker container, Rosette Server may receive a SIGKILL when the JVM asks for more memory than Docker allocates.
Once the Rosette files are downloaded and installed, Docker can be run without being connected to the internet. To install offline:
- Download the component file tarballs, along with the Docker files and license, from the email containing the Rosette Server files.
- Run import_docker_images.sh to create the Docker volumes.
- Run the Docker container (docker compose) as described below.
Install and run Docker container
To download the volumes directly, you must have an internet connection.
Tip
In certain circumstances, such as an installation involving all endpoints and languages, docker compose may time out. To avoid repeatedly executing the command until it succeeds, you can increase the default timeout (60 seconds).
In the directory containing the file docker-compose.yml, run:
echo "COMPOSE_HTTP_TIMEOUT=300" >> .env
- Download the Docker file docker-compose.yml and license file rosette-license.xml. Note the location of the license file (path-to-license).
- To run the Docker container (and download the volumes if they haven't already been downloaded):
ROSAPI_LICENSE_PATH=<path-to-license>/rosette-license.xml docker compose up
You can also provide a stack name:
ROSAPI_LICENSE_PATH=<path-to-license>/rosette-license.xml docker compose -p <stack-name> up
- The Rosette RESTful server will be accessible on the Docker host on the port defined in the docker-compose.yml file.
Note
If your installation includes the entity extraction component (rex-root), you may see failed to open ... warning messages for data files in languages not installed on your system. These can safely be ignored.
Modifying Rosette parameters in Docker
Important
To modify Rosette parameters, edit the docker-compose.yml file.
The following configuration options can be changed by editing the environment section of the file. Uncomment the line for the variable and change the value.
Note
To run the entity extraction and linking, sentiment analysis, and topic extraction endpoints, the recommended value for ROSETTE_JVM_MAX_HEAP is 16 GB. The default value in the file is 4 GB.
environment:
  # - ROSETTE_JVM_MAX_HEAP=4    # max Java heap size in GB, default is 4, must be >= 4;
  #                             # to run all endpoints the recommended minimum is 16
  # - ROSETTE_WORKER_THREADS=2  # number of worker threads, default is 2, must be >= 1
  # - ROSETTE_PRE_WARM=false    # pre-warm the server on startup, default is false,
  #                             # valid values are true|false
  # - ROSETTE_DOC_HOST_PORT=localhost:8181  # hostname should be accessible on the network,
  #                                         # port value should match the mapped port above
You can specify your own volume, for example one backed by a different volume driver.
volumes:
  # if a local volume is not desirable, change this to suit your needs
  rosette-roots-vol:
The default Docker configuration uses port 8181 for the Rosette endpoints. To change this, modify the ports section.
ports:
  - "8181:8181"
Only the first value in the port statement should be changed. The port statement and the ROSETTE_DOC_HOST_PORT value must match.
ports:
  - "4444:8181"
environment:
  - ROSETTE_DOC_HOST_PORT=localhost:4444
If you're accessing the documentation from a different machine, change localhost to the network-accessible host name of the documentation machine.
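To confirm the server is reachable on the new port, you can call the ping endpoint (assuming the standard /rest/v1/ping endpoint is available in your deployment; the response shown is illustrative):
curl http://localhost:4444/rest/v1/ping
{"message":"Rosette API at your service","time":1467912784915}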
Installing stand-alone on macOS and Linux
Before installing, you must download the install script (install_rosette.sh) and the license file (rosette-license.xml). The remaining files will be downloaded into the same directory as the install script.
Note
You must run the installer while connected to the internet to download the files. Once the files are downloaded, re-run the installer locally, without a connection, to resume the install.
To start the install, execute the install script from the directory in which it resides.
bash install_rosette.sh
Warning
Set the java executable in your PATH before starting the install. PATH takes precedence over JAVA_HOME.
The install will guide you through downloading the files and configuring your system. Each question has a default; you can accept the default or enter a different value.
- The Rosette Server minimum system requirements are displayed. You can choose to continue or, if your system does not meet the minimum requirements, exit at this point.
- If you have already downloaded the install files, there is no need to download them again. Depending on the number of roots included, this step can be very time-consuming.
- If the license file is not in the same directory as the install script, you will be prompted for the location of the license file.
- Enter an install directory or press Enter to use the default install directory. The download and install directories cannot be the same. The install directory must be completely empty before installing.
Both the headless and interactive installers will update Rosette Server as required. If you have an existing installation of Rosette Server, or you chose not to update it during the install, you can run the scripts manually to update Rosette Server.
Update Rosette Server for REX Training Server
- If you are using a standalone (non-Docker) version of Rosette Server, copy the file ./scripts/update-rs-for-rts.sh to the Rosette Server machine or directory.
- Run the script from the Rosette Server directory.
./basis/rts/update-rs-for-rts.sh
The script modifies the Rosette Server installation to install custom profiles and update environment variables.
- Custom profiles are enabled if they are not already enabled. You will be prompted for where the custom profiles should be stored. The default location is /basis/rosette/server/launcher/config/custom-profiles.
- If custom profiles are already enabled, the ad-suggestions and ad-base-linguistics profiles are copied out to the custom profile subdirectory.
- If the ad-suggestions and ad-base-linguistics profiles are already installed, they are overwritten.
- The wrapper.conf file of Rosette Server is updated to include the following environment variables. If the file already has the variables defined, they are overwritten.
set.RTS_URL=http://localhost:9080
set.RTSSOURCE=statistical
- Each time the update script is run, a log file with a time stamp is created. Example: update-rs-for-rts.sh.01-04-22_13-22.
- All modified files are backed up to the directory where they were changed, with a timestamp.
The script will prompt you for the following information:
Table 6. Rosette Server RTS Update Prompts
Prompt | Purpose | Options | Notes
Update Rosette Server for REX Training Server? | RTS requires special configuration files. | Y to continue; N to cancel |
Fully qualified host name where REX Training Server is installed | | The suggested value will be the host name of your current machine |
Port REX Training Server is listening on | | Default: 9080 |
Location of Rosette Server installation | | Default: /basis/rosette |
Directory to store custom profiles | Custom profiles can be in any directory | Default: /basis/rosette/server/launcher/config/custom-profiles |
If the custom profiles are not installed correctly, you will receive a RosetteException from Rosette Adaptation Studio. Example:
ras_server_1 | raise RosetteException(code, message, url)
ras_server_1 | rosette.api.RosetteException: unauthorizedProfile: Access to profile 'ad-suggestions' not granted:
ras_server_1 | http://ras_proxy:42/rs/rest/v1/entities
Once you have run the update script for Rosette Server, verify the install.
- Start Rosette Server, if it's not already running.
- Verify the custom profiles were deployed through the custom-profiles endpoint:
curl http://<rosette-host>:<port>/rest/v1/custom-profiles
or, in a browser, open:
http://<rosette-host>:<port>/rest/v1/custom-profiles
At a minimum, the following two profiles should be returned by the endpoint:
[ad-base-linguistics,ad-suggestions]
If your installation has other custom profiles installed, they will also be returned.
- Verify the REX Training Server configuration.
- Start REX Training Server.
- Call the entities endpoint using the profileId ad-suggestions and an existing RTS workspace.
curl --location --request POST 'http://<rosette-host>:<port>/rest/v1/entities' \
--header 'Content-Type: application/json' --data-raw \
'{ "content": "The Securities and Exchange Commission today announced the leadership of the agency'\''s trial unit.", "profileId": "ad-suggestions", "options": {"rtsDecoder": "6224dd36897e684a81935558"}}'
If the value for rtsDecoder is a valid RTS workspace, an HTTP 200 response should be returned.
If this is a new install and there are no RTS workspaces with the provided string, an HTTP 404 response should be returned. Any other value indicates a misconfiguration.
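To see only the status code for this check, you can have curl print it (the workspace ID shown is the sample value from above):
curl -s -o /dev/null -w '%{http_code}\n' --request POST 'http://<rosette-host>:<port>/rest/v1/entities' \
--header 'Content-Type: application/json' --data-raw \
'{ "content": "test", "profileId": "ad-suggestions", "options": {"rtsDecoder": "6224dd36897e684a81935558"}}'
A 200 indicates a valid workspace; a 404 is expected on a new install with no workspaces.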
Update Rosette Server for Events Training Server
- Copy the file /basis/ets/scripts/update-rs-for-ets.sh to the Rosette Server machine or directory.
- Run the script from the Rosette Server directory.
./update-rs-for-ets.sh
Update for legacy schemas
The update script updates Rosette Server to support legacy events schemas that used the TIME and MONEY entity types instead of the current TEMPORAL:TIME and IDENTIFIER:MONEY entity types. To apply these updates, copy the file EntityTypePatcher.jar along with the update-rs-for-ets.sh script to the Rosette Server machine or directory.
Note
If the legacy schema patch is to be applied, the machine running the patch must have Java installed (minimum Java 11).
The update script will back up all changed files to the directory <current working directory>/regex-backup-<timestamp>. To roll back the changes, copy the files back to the Rosette Entity Extractor root directory.
The script will prompt you for the following information:
Table 7. Rosette Server Events Update Prompts
Prompt | Purpose | Options | Notes
Should Rosette Server be updated to communicate with Events Training Server? | Rosette Server only communicates with ETS in production. | N for the training server; Y for the production server |
Fully qualified host name where Events Training Server is installed | | The suggested value will be the host name of your current machine | Can only be localhost or 127.0.0.1 if Rosette Server is standalone (not running in a Docker container).
Port Events Training Server is listening on | | Default: 9999 |
Location of Rosette Server configuration | This directory will be mounted as a volume. | Default: /basis/rs/config | The configuration file to customize Rosette Server.
Location of Rosette Server roots | This directory will be mounted as a volume. | Default: /basis/rs/roots |
Rosette Server memory management
There is no single one-size-fits-all number here. The best value for max heap size depends on a number of factors:
- activated endpoints and features
- usage pattern
- data characteristics such as size (both character and token lengths), language, and genre
- the Java garbage collector and its settings
Note that setting the max heap to the amount of physical RAM in the system is not recommended. More heap doesn't always translate to better performance, especially depending on your garbage collection settings.
Rosette Server's data files are loaded into virtual memory. Some endpoints, such as /entities, involve a large amount of data. In order for Rosette to operate at its peak performance, we recommend that you reserve enough free memory to allow memory mapping of all our data files so that page misses are minimized at runtime.
To modify the JVM heap for a standalone install, edit the file server/conf/wrapper.conf and modify the value of wrapper.java.maxmemory.
# Maximum JVM heap in GB
ROSETTE_JVM_MAX_HEAP=32
# Minimum JVM heap in GB
ROSETTE_JVM_MIN_HEAP=32
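The ROSETTE_JVM_* variables above are the Docker form of these settings. For a standalone install, the corresponding wrapper.conf entry might look like the following hypothetical excerpt (standard Java Service Wrapper configurations express wrapper.java.maxmemory in MB; confirm the unit documented in your file before editing):
# Maximum JVM heap (MB in standard wrapper configurations)
wrapper.java.maxmemory=32768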
We also recommend increasing the worker threads to 4, as described in Configuring worker threads for HTTP transport.
Install REX Training Server (RTS)
You must have Docker, docker-compose, and unzip installed.
The product can be installed interactively or with a headless installer.
To install interactively:
- Unzip the file rts-installation-<version>.zip.
- From the directory rts-installation-<version>, run the installation script:
./install-rts.sh
To run the headless version of the script:
./install-rts-headless.sh
The properties file is in the same directory as the script.
Use the --dry-run flag to validate the properties file, print the settings, and exit without changing anything.
The REX Training Server installer will prompt you for the following information.
Table 8. REX Training Server Installer Prompts
Prompt | Purpose | Options | Notes
Installation directory | Installation directory for REX Training Server files | Default: /<installDir>/rts | This is now the <RTSinstallDir>
Installation directory for docker files | Directory where REX Training Server docker compose files will be installed. | Default: /<RTSinstallDir>/rts-docker | The disk requirements for the docker compose files are minimal (< 1 MB). However, other parts of the install require greater disk space.
Should REX Training Server docker image be loaded? | Load the Docker images so they are available on the local machine | Otherwise, load them to a Docker registry shared by all machines. |
Port REX Training Server should listen on | | Default: 9080 | This port and hostname will be required when installing the other servers.
Directory to store the REX Training Server assets and wordclass files | This directory will be mounted as a volume. | Default: /<RTSinstallDir>/assets | This directory holds files needed for training, including static wordclass files. The wordclass files can be manually installed later but must exist prior to starting RTS.
Directory to hold configuration information for the data access layer (DAL) connector, and whether the configuration file should be copied in now. If the file is copied at install time, you will be prompted for the host of the Rosette Adaptation Studio instance. | This directory will be mounted as a volume. | Default: /<RTSinstallDir>/config | The DAL connects to the mongo database on the Rosette Adaptation Studio component to access samples. If port 27017 is NOT exposed on the RAS server, the mongodal_config.yaml file must be manually updated with the correct port number before starting the REX Training Server.
Fully qualified host name where Rosette Adaptation Studio is installed | | The suggested value will be the host name of your current machine | Cannot be empty, localhost, or 127.0.0.1
Location of REX Training Server logs | | Default: <RTSinstallDir>/logs |
REX Training Server workspaces root directory | This directory will be mounted as a volume. | Default: <RTSinstallDir>/workspaces |
REX Training Server memory management
The number of models that can be simultaneously trained depends on the size of the models and the memory available.
Once the model is written to disk, it consumes relatively little memory (~2 GB) for runtime requirements. The training and writing phases are much more memory intensive, each consuming approximately three times more memory. Typically, a model actively training will require approximately 10 GB of RAM.
Total memory consumption depends on the number of models being trained simultaneously, as well as the size of the models. The training server is a Java application and all operations use the JVM heap. To allow more simultaneous annotations on more projects, increase the RAM allocated to the JVM in REX Training Server.
To modify the JVM heap:
- Create a file jvm.options in the /basis/rts/config directory. In this file, set the initial and maximum heap sizes. They should be set to the same value. The values must be less than the physical RAM installed on the machine.
# Set the initial and minimum heap size to 16GB
-Xms16G
# Set the maximum heap size to 16GB
-Xmx16G
- Edit the file /basis/rts/rts-docker/docker-compose.yml and uncomment the line ${JVM_OPTIONS}:/config/jvm.options.
# Optionally override JVM settings here, default -Xms8G -Xmx16G
  - ${JVM_OPTIONS}:/config/jvm.options
- Edit the file /basis/rts/rts-docker/.env and set JVM_OPTIONS to point to the jvm.options file.
JVM_OPTIONS=/basis/rts/config/jvm.options
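For the new JVM options to take effect, restart the RTS containers (assuming the docker compose CLI used elsewhere in this guide):
cd /basis/rts/rts-docker
docker compose down
docker compose up -d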
Install Event Training Server (ETS)
The Events Training Server must be installed on both the training instance and the Rosette Server production (extraction) instance. The same ETS file is installed on each, in either training or extraction mode.
You must have Docker, docker-compose, and unzip installed.
The product can be installed interactively or with a headless installer.
To install interactively:
- Unzip the file ets-installation-<version>.zip.
- Start the installation:
./install-ets.sh
To run the headless install, use the --headless flag. The .properties file is in the same directory as the installation script.
Use the --dry-run flag to validate the properties file, print the settings, and exit without changing anything.
The Event Training Server installer will prompt you for the following information:
Table 9. Event Training Server Installer Prompts
Prompt | Purpose | Options | Notes
ETS mode | Determine if the installation is for training or extraction (production) mode | 1) Training; 2) Extraction; 3) Exit Installer | Sets the mode. Training mode prompts for the location of Rosette Server; extraction mode does not.
Installation directory | Installation directory for Events Training Server files | Default: /<installDir>/ets | This is now the <ETSinstallDir>
Port Event Training Server should listen on | | Default: 9999 | This port and hostname will be required when installing the other servers.
Directory for ETS workspaces | This directory will be mounted as a volume. | Default: /<ETSinstallDir>/workspaces | This directory holds the events models.
Fully qualified host name where Rosette Server is installed | Not asked when installing in extraction mode (production server) | The suggested value will be the host name of your current machine | Cannot be empty, localhost, or 127.0.0.1
Port Rosette Server is listening on | Not asked when installing in extraction mode (production server) | Default: 8181 |
Table 10. .env File Parameters
Parameter | Note | Default
RS_URL | Only needed in training mode. Users are prompted during install for the value if performing a training mode installation. | None
ETS_MODE | ETS is either in training or extraction mode | Training
ETS_PORT | The port ETS will listen on. Users are prompted during install for the value | 9999
ETS_IMAGE | The container image of the ETS front end. |
ETS_CONFIG_FILE | The location of the application.yml configuration file. | {InstallDir}/config/application.yml
ENABLE_OUTGOING_SSL | true if ETS should use SSL when connecting to Rosette Server (and P-ETS workers if they are on remote hosts); false otherwise. Note: ETS_KEYSTORE_PW, ETS_KEYSTORE_FILE, ETS_TRUSTSTORE_PW, and ETS_TRUSTSTORE_FILE must be specified if ENABLE_OUTGOING_SSL=true | false
ETS_KEYSTORE_PW | The password of the JKS keystore file. | None
ETS_KEYSTORE_FILE | The location of the JKS keystore. | None
ETS_TRUSTSTORE_PW | The password of the JKS truststore file. | None
ETS_TRUSTSTORE_FILE | The location of the JKS truststore. | None
ETS_LOGGING_LEVEL | Controls the granularity (verbosity) of the logging. Options include ERROR, WARN, INFO, DEBUG, and TRACE. | INFO
PETS_IMAGE | The container image of the P-ETS worker | Release dependent
PETS_WORKSPACES | The location to store the ETS models. | {InstallDir}/workspaces
NGINX_IMAGE | The container image of the nginx proxy in use. | nginx:1.20.2-alpine
NGINX_CERT_PEM_FILE | The host certificate in PEM file format. Used to enable incoming SSL connections. | None
NGINX_KEY_PEM_FILE | The host key in PEM file format. Used to enable incoming SSL connections. | None
NGINX_TRUSTED_PEM_FILE | The CA certificate in PEM file format. Used to enable incoming SSL connections. | None
NGINX_CONF_FILE | The location of the nginx configuration file. Either nginx-ssl.conf or nginx-not-ssl.conf, depending on whether SSL is enabled. | nginx-not-ssl.conf
ETS application.yml configuration file
The application.yml file controls the configuration of the ETS application. The values in this file rarely change and are relative to the container; that is, they are only used within the container and have no relevance to the machine running the container.
Server
This section contains the basic server setup. context-path is the part of the URL prepended to all ETS URLs, for example /ets/info. In the container, ETS listens on port 9999.
server:
  servlet:
    context-path: /ets
  port: 9999
Logging
This section contains the log setup. The default log level is info and can be changed by setting the ETS_LOGGING_LEVEL value in the .env file. com.netflix.eureka.cluster is set to ERROR because, by default, it fills the log with unneeded log messages; com.netflix.eureka.registry is raised to WARN for the same reason. If you would like to log everything, the two lines referencing com.netflix.* can be commented out with a #.
logging:
  level:
    ROOT: ${ETS_LOGGING_LEVEL:info}
    com.netflix.eureka.cluster: ERROR
    com.netflix.eureka.registry: WARN
Management
This section controls the management services, including health and metrics. These services can be run on a different port so that they are not on the same interface as the ETS API. Note: enabling this management port will require changes to the docker-compose.yml file to expose the port.
The health endpoint has been customized to disable disk space, ping, and refresh reporting, as they cluttered the response. In addition, the health endpoint is configured to always show details of the dependent services (P-ETS and, in training mode, RS). To change this behavior and get a simple UP/DOWN response, set show-details to never.
The following management endpoints are enabled: info, health, metrics, and prometheus. There are approximately 20 additional management endpoints that can be enabled.
Metrics is enabled to expose runtime information about the ETS process: memory consumption, threads, and CPU usage.
Prometheus is enabled so that ETS can be used as a data source for monitoring applications such as Grafana.
Endpoint timing information is enabled and available through the /ets/prometheus endpoint.
management:
  # Management can be on a separate port
  # server:
  #   port: 9888
  health:
    diskspace:
      enabled: false
    ping:
      enabled: false
    refresh:
      enabled: false
  endpoint:
    health:
      show-details: always
  endpoints:
    web:
      base-path: /
      exposure:
        include: "prometheus, metrics, health, info"
  metrics:
    web:
      server:
        auto-time-requests: true
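For example, once ETS is running, the exposed endpoints can be queried from the host (paths are relative to the /ets context; piping through jq is optional formatting):
curl -s http://<ets-host>:9999/ets/metrics | jq .
curl -s http://<ets-host>:9999/ets/prometheus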
Eureka
ETS_HOST is only used when ETS is running remotely from P-ETS.
eureka:
  dashboard:
    path: /eurekadashboard
    enabled: false
  instance:
    appname: JETS
    hostname: ${ETS_HOST:ets-server}
    leaseRenewalIntervalInSeconds: 30
    leaseExpirationDurationInSeconds: 120
    status-page-url: /ets/info
    health-check-url: /ets/health
  server:
    enableSelfPreservation: false
  client:
    healthcheck:
      enabled: false
    # As the server we don't want to register with ourselves
    registerWithEureka: false
    fetchRegistry: false
    serviceUrl:
      defaultZone: http://${eureka.instance.hostname}:${server.port}/ets/eureka/
Info
This section determines the ETS operating mode (training or extraction). The ETS_TRUSTSTORE_FILENAME and ETS_KEYSTORE_FILENAME variables are only defined when running outside a container.
info:
  app:
    name: "events-training-server"
    description: "Rosette Event Extraction and Training Server"
    version: "1.0.0"
    build: "071221145647"
ets:
  operating-mode: "${ETS_MODE:training}"
rs:
  # rsUrl is only required in training configuration
  rsUrl: ${RS_URL:}
  rsConnectTimeoutMS: 30000
ssl:
  enable-outgoing-ssl: ${ENABLE_OUTGOING_SSL:false}
  key-store: ${ETS_KEYSTORE_FILENAME:/certs/keystore.jks}
  key-store-password: ${ETS_KEYSTORE_PW:}
  trust-store: ${ETS_TRUSTSTORE_FILENAME:/certs/truststore.jks}
  trust-store-password: ${ETS_TRUSTSTORE_PW:}
pets:
  minimumVersion: v1.0.0
  connectTimeoutMS: 60000
  readTimeoutMS: 60000
  writeBufferSizeKB: 1000
Springdoc
springdoc:
  show-actuator: true
  # Enable/disable swagger documentation
  api-docs:
    enabled: true
spring:
  banner:
    location: classpath:ets-banner.txt
  resources:
    add-mappings: false
  cloud:
    discovery:
      client:
        composite-indicator:
          enabled: false
        health-indicator:
          enabled: false
This process describes how to capture the logs for the ETS frontend process (the J-ETS server). Logs for the backend worker (P-ETS) processes are handled through the Docker subsystem.
Configuring the Log Files
- On the host machine, create a directory for the logs and set the permissions.
mkdir /basis/ets/logs
chmod 777 /basis/ets/logs
The container must have sufficient permissions to write to the directory (uid = 1000, user = ets, group = ets).
- Edit the file /basis/ets/ets-docker/.env, adding a variable to set the logs directory.
Add:
ETS_LOG_DIR=/basis/ets/logs
- Edit the file /basis/ets/ets-docker/docker-compose.yml to mount the logs directory.
In the ets-server: section, add a new volume definition using the new logs directory. The last line in the sample below is the added line.
volumes:
  - ${ETS_CONFIG_FILE}:/application.yml
  - ${ETS_KEYSTORE_FILE}:/certs/keystore.jks
  - ${ETS_TRUSTSTORE_FILE}:/certs/truststore.jks
  - ${ETS_LOG_DIR}:/logs
This will create the /logs directory in the container.
- Tell ETS to use the ETS_LOG_DIR by editing the file /basis/ets/config/application.yml and adding the file: path: /logs statements to the logging section.
logging:
  file:
    path: /logs
  level:
    ROOT: ${ETS_LOGGING_LEVEL:info}
    com.netflix.eureka.cluster: ERROR
    com.netflix.eureka.registry: WARN
Note that the values in the application.yml file refer to paths in the container, not the host. The path specified in logging.file.path should be /logs, or whatever the volume was set to in the docker-compose.yml file.
Log File Naming
The default log file name is spring.log. If you prefer a different name, you can change the log file name.
- Edit the file /basis/ets/config/application.yml and set the log file name by adding the name parameter and removing the path parameter from the logging section. If path and name are both present, path takes precedence and the default log file name will be used.
logging:
  file:
    name: /logs/ets-server.log
  level:
    ROOT: ${ETS_LOGGING_LEVEL:info}
    com.netflix.eureka.cluster: ERROR
    com.netflix.eureka.registry: WARN
Log Rotation
By default, once logs reach 10 MB they are archived; that is, they are compressed with a date stamp and sequence number, such as ets-server.log.2022-03-04.0.gz. The file size at which this occurs can be changed by setting max-size in the file /basis/ets/config/application.yml.
logging:
  file:
    name: /logs/ets-server.log
    max-size: 20MB
Supported values for max-size include MB and KB.
Install Rosette Adaptation Studio (RAS)
You must have Docker, docker-compose, and unzip installed.
- Unzip the file rosette-adaptation-studio-<version>.zip.
- From the directory rosette-installation-<version>, run the installation script:
./install-ras.sh
To run the headless version of the script:
./install-ras-headless.sh
The properties file is in the same directory as the script.
Use the --dry-run flag to validate the properties file, print the settings, and exit without changing anything.
Note
SSL for the front end browser (the connection from the web client to the Rosette Adaptation Studio server) can be enabled when RAS is installed. After installation of all three servers is complete, you can enable SSL between the servers.
To enable SSL for the front end browser, answer Yes to the question "Enable SSL for NGINX?". The certificate should already be on the server before beginning the installation.
Enabling front end SSL support is independent of enabling SSL between the servers.
The RAS installer will prompt you for the following information:
Table 11. RAS Installer Prompts
Prompt | Purpose | Options | Notes
Installation directory | Directory for docker compose files and helper scripts. | Default: /basis/ras | The disk requirements for the docker compose files are minimal (< 1 MB). However, other parts of the install require greater disk space.
Enter location of Adaptation Studio logs | Directory for log files | Default: /basis/ras/logs |
Load the Rosette Adaptation Studio Docker image? | Load the Docker images so they are available on the local machine | Otherwise, load them to a Docker registry shared by all machines. |
Enable SSL for NGINX? | To enable SSL for the connection from the web client to the RAS server | | In a closed network this may not be required; however, passwords from the client to the server are initially sent in clear text, so enabling SSL is recommended.
Target directory for SSL certificate files | Directory that will contain the SSL certificate files | Default: /basis/ras/certs | For information on SSL certificate files: http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_certificate
Location of the certificate key file | Where to find the certificate key file | | The certificate key must be in PEM format
Location of the certificate file | Where to find the certificate file | | The certificate must be in PEM format
HTTPS port to expose | Required if enabling SSL | Default: 443 |
HTTP port to expose | Required if not enabling SSL | Default: 80 |
Fully qualified host name where REX Training Server is installed | Used by Rosette Adaptation Studio to perform training for entity extraction | | The REX Training Server does not need to be installed before Rosette Adaptation Studio, but you must know where it will be installed. Cannot be empty, localhost, or 127.0.0.1
Port REX Training Server is listening on | | Default: 9080 |
Fully qualified host name where Events Training Server is installed | Used by Rosette Adaptation Studio to perform training for events extraction | | The Events Training Server does not need to be installed before Rosette Adaptation Studio, but you must know where it will be installed. Cannot be empty, localhost, or 127.0.0.1
Port Events Training Server is listening on | | Default: 9999 |
Fully qualified host name where Rosette Server is installed | Used internally by Rosette Adaptation Studio | | Rosette Server does not need to be installed before Rosette Adaptation Studio, but it should be installed and started before starting the studio. Liveness checks are performed on startup. Cannot be empty, localhost, or 127.0.0.1
Port Rosette Server is listening on | | Default: 8181 |
Data directory for Rosette Adaptation Studio database | Directory where the Adaptation Studio data will be stored. | Default: /basis/ras/mongo_data_db | This can be an NFS mount.
Directory for database backups | Directory where data should be stored when backed up from the RAS client. | Default: /basis/ras/mongo_data_dump | This can be an NFS mount.
Port to expose for the database | This port will be used by RTS to connect to the RAS database instance to retrieve samples. | Default: 27017 |
Rosette Adaptation Studio has scripts on each server to monitor the health and status of the system. Run them at startup to verify the system is ready to go.
The scripts are:
- /basis/rs/scripts/rs-healthcheck.sh
- /basis/rts/scripts/rts-healthcheck.sh
- /basis/ras/scripts/ras-healthcheck.sh
where /basis is the default installation directory. If you've installed in a different directory, replace /basis with your install directory.
Each script verifies that the Docker containers have loaded and all components are running.
To check the status of ETS, open a browser and go to http://{host}:{port}/ets/health. The default port is 9999.
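You can run the same check from the command line; jq (already required for model imports) pretty-prints the response:
curl -s http://{host}:9999/ets/health | jq .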
A response listing the workers indicates that ETS is available and Rosette Server can communicate with it:
{"status": "UP",
"components": {
"PETS-Workers": {
"status": "UP",
"details": {
"P-ETS Workers": "1 Worker(s) Available"
}
},
"RosetteServer": {
"status": "UP",
"details": {
"Rosette Server": "Available at http://memento:8181/rest/v1"
}
}
}
}
Model Training Suite is shipped with a sample events model that can be used to verify the installation for events. Use the import_project.sh script to import the project.
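A hypothetical invocation, assuming the script takes the exported project file as its argument (check the script's usage output for the actual syntax):
./import_project.sh export_sample.zip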