This guide provides instructions for installing and maintaining the training and production environments for Rosette Model Training Suite.
The training section contains installation instructions for the complete Rosette Model Training Suite. Included components are Rosette Server, Adaptation Studio, REX Training Server, and Event Training Server. Your installation may include one or both training servers.
The production section contains installation instructions for a production environment and explains how to perform event and entity extraction. It also covers moving trained models from the training environment into the production environment.
This document is one component of the complete Rosette Model Training Suite documentation set. The full set includes:
- System Administrator Guide: A guide for installing and maintaining both the training and production environments of the Rosette Model Training Suite. Included are instructions for moving trained models from the training environment into the production environment, as well as the documentation for the API calls for entity and event extraction.
- Developing Models: A guide for system architects and model administrators to aid in defining the modeling strategy and understanding the theory of model training. It includes an explanation of event modeling and how to design an event schema in preparation for training event extraction models, as well as guidelines for gathering and preparing data for model training.
- Adaptation Studio User Guide: A guide for managers, adjudicators, and annotators that describes how to use Rosette Adaptation Studio to create and maintain projects, annotate and train entity and event extraction models, and create event schemas.
Rosette Model Training Suite
Rosette Adaptation Studio is an interactive tool for annotating data. It is part of the Rosette Model Training Suite for training models for Rosette entity and event extraction. The tool uses active learning to guide the annotation process, providing suggestions and choosing samples that will ensure the model converges as rapidly as possible towards the highest quality results. As data is annotated, the model is trained. The trained models are uploaded into your production instance of Rosette Server to perform custom entity and event extraction.
Features of Rosette Model Training Suite include:
Reduced training data requirements
Optimized annotator and project manager experiences
Modular templates supporting different types of projects
Integration with the Rosette linguistic framework
A robust data store capable of managing multiple simultaneous multi-user annotation efforts
Display and search features providing both high-level and deep-dive views of each project’s progress
Accuracy metrics
Automatic model training
Trained custom models for deployment in production installations of Rosette Server
Templates currently available:
The following languages are supported by Model Training Suite for model training and extraction.
Table 1. Supported Languages by Task
| Language | NER | Events |
|---|---|---|
| Arabic (ara) | ✓ | ✓ |
| Chinese (zho) | ✓ | ✓ |
| Dutch (nld) | ✓ | |
| English (eng) | ✓ | ✓ |
| French (fra) | ✓ | |
| German (deu) | ✓ | ✓ |
| Hebrew (heb) | ✓ | |
| Hungarian (hun) | ✓ | ✓ |
| Indonesian (ind) | ✓ | |
| Italian (ita) | ✓ | |
| Japanese (jpn) | ✓ | ✓ |
| Korean (kor) | ✓ | ✓ |
| Malay, Standard (zsm) | ✓ | |
| Persian (fas) | ✓ | |
| Portuguese (por) | ✓ | |
| Pashto (pus) | ✓ | |
| Russian (rus) | ✓ | ✓ |
| Spanish (spa) | ✓ | |
| Swedish (swe) | ✓ | |
| Tagalog (tgl) | ✓ | |
| Urdu (urd) | ✓ | |
| Vietnamese (vie) | ✓ | |
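A project setup script could check a language against Table 1 before training starts. The following is a minimal Python sketch of such a lookup. The names `NER_LANGUAGES`, `EVENT_LANGUAGES`, and `supported_model_types` are hypothetical and are not part of any Rosette API. The sets only transcribe the table above.

```python
# Illustrative lookup transcribed from Table 1; not part of any Rosette API.
# Languages that Table 1 marks for both NER and Events.
EVENT_LANGUAGES = {"ara", "zho", "eng", "deu", "hun", "jpn", "kor", "rus"}

# Every language in Table 1 is marked for NER.
NER_LANGUAGES = EVENT_LANGUAGES | {
    "nld", "fra", "heb", "ind", "ita", "zsm", "fas",
    "por", "pus", "spa", "swe", "tgl", "urd", "vie",
}


def supported_model_types(language_code):
    """Return the set of model types Table 1 lists for a three-letter code."""
    code = language_code.strip().lower()
    types = set()
    if code in NER_LANGUAGES:
        types.add("NER")
    if code in EVENT_LANGUAGES:
        types.add("Events")
    return types
```

For example, `supported_model_types("fra")` contains only `"NER"`, so a French events project would be rejected by a check built on this lookup.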
Model Training Architecture
A complete Rosette Adaptation Studio system installation includes the following major components. All installations must include Rosette Adaptation Studio and Rosette Server. An installation may include one or both of the training servers: REX Training Server and Events Training Server.
- Rosette Adaptation Studio: Provides annotation and project management features, as well as user and role management and the project database.
- Rosette Server: An on-premises package that provides access to the Rosette text analytics endpoints. Your license determines which endpoints and languages are active in your installation. The entities endpoint is part of the Rosette Entity Extractor (REX), which is deployed through Rosette Server. The suggestions provided for annotation labels are generated by the entities and morphology endpoints.
- REX Training Server: Trains entity extraction models and stores the models while training.
- Event Training Server: Trains event extraction models and stores event models for training and event extraction in production.
Required Rosette Endpoints
Rosette Model Training Suite uses features of Rosette Server to prepare input text and identify candidates for annotations. The following endpoints must be installed and licensed in your installation of Rosette Server for training and extraction.
Table 2. Rosette Server Required Endpoints
| Endpoint | NER Training | Event Training | Event Extraction |
|---|---|---|---|
| /entities | ✓ | ✓ | ✓ |
| /events | | | ✓ |
| /language | ✓ | ✓ | ✓ |
| /morphology | | ✓ | ✓ |
| /semantics | | ✓ | |
| /sentences | ✓ | ✓ | |
| /tokens | ✓ | ✓ | |
| /info | ✓ | ✓ | ✓ |
| /ping | ✓ | ✓ | ✓ |
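The requirements in Table 2 lend themselves to an automated pre-installation check. The sketch below is a minimal Python example. The names `REQUIRED_ENDPOINTS` and `missing_endpoints`, and the task keys, are hypothetical. How you obtain the list of endpoints available in your Rosette Server installation is left open here, since it depends on your license and deployment.

```python
# Required endpoints per task, transcribed from Table 2.
# Task keys and function names are illustrative, not a Rosette API.
COMMON = {"/entities", "/language", "/info", "/ping"}

REQUIRED_ENDPOINTS = {
    "ner_training": COMMON | {"/sentences", "/tokens"},
    "event_training": COMMON
    | {"/morphology", "/semantics", "/sentences", "/tokens"},
    "event_extraction": COMMON | {"/events", "/morphology"},
}


def missing_endpoints(task, available):
    """Return a sorted list of endpoints required for `task`
    that do not appear in `available`."""
    return sorted(REQUIRED_ENDPOINTS[task] - set(available))
```

An empty result means the installation satisfies Table 2 for that task. A non-empty result names the endpoints that still need to be installed and licensed.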
The components of the Rosette Model Training Suite are used for two different purposes: training and production.
- Training: Annotation and training of entity and event models. The training environment includes Rosette Adaptation Studio, Rosette Server, and one or both training servers (REX Training Server and Events Training Server).
- Production: Using previously trained models to perform entity and event extraction. The production environment includes:
  - Rosette Server
  - Events Training Server
The training and production environments can use the same instance of Rosette Server, or the two environments can be completely separate. You determine how many physical machines are required based on the size of your models and your organization's requirements. The following diagram shows two possible implementations.
When deploying the Rosette Model Training Suite, you must secure both the training environment and the deployed models. There are multiple ways to secure a model training and deployment environment.
Control who has access to the training system and prevent malicious actors from logging into the system.
Assign user access through the user management facility of Rosette Adaptation Studio. Control the level of access each user has to the system and limit users to the access they require to complete their work. For example, annotators can only annotate the documents assigned to them, limiting the impact they can have on the models.
Use the project management reports to reduce risks from insider threats as well as risks from non-malicious human errors. By ensuring that multiple users are training models in a similar and consistent way, an administrator can ensure that malicious actors are not corrupting the model creation process.
Control who has access to model files to prevent them from being altered by malicious parties after they are exported from MTS and deployed to Rosette Server.
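Beyond file permissions, one common way to detect tampering is to record a checksum when a model is exported and verify it before deployment. The following Python sketch illustrates that idea using only the standard library. It is an assumption about how you might implement such a check, not a feature of MTS or Rosette Server, and the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so that large model files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def model_is_unchanged(path, expected_digest):
    """Return True if the file's digest matches the digest recorded
    at export time (comparison is case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_digest.lower())
```

The recorded digest must itself be stored where only trusted administrators can change it. Otherwise an attacker who alters the model can also replace the digest.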