This guide provides a methodology and processes for gathering data and training custom entity and event extraction models for use with Rosette Server.
This document is one component of the complete Rosette Model Training Suite documentation set. The full set includes:
- Developing Models
  A guide for the system architects and model administrators to aid in defining the modeling strategy and understanding the theory of model training. It includes an explanation of event modeling and how to design an event schema in preparation for training event extraction models, as well as guidelines for gathering and preparing data for model training.
- System Administrator Guide
  A guide for installing and maintaining both the training and production environments of the Rosette Model Training Suite. Included are instructions for moving trained models from the training environment into the production environment, as well as the documentation for the API calls for entity and event extraction.
- Adaptation Studio User Guide
  A guide for the managers, adjudicators, and annotators using Rosette Adaptation Studio describing how to use the tool to create and maintain projects, annotate and train entity and event extraction models, and create event schemas.
Rosette Model Training Suite
Rosette Adaptation Studio is an interactive tool for annotating data. It is part of the Rosette Model Training Suite for training models for Rosette entity and event extraction. The tool uses active learning to guide the annotation process, providing suggestions and choosing samples that will ensure the model converges as rapidly as possible towards the highest quality results. As data is annotated, the model is trained. The trained models are uploaded into your production instance of Rosette Server to perform custom entity and event extraction.
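The sample selection described above can be illustrated with uncertainty sampling, a common active learning technique. This is a minimal sketch only: the selection strategy actually used by Rosette Adaptation Studio is not described in this guide, and the function names, sample ids, and probabilities below are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a label distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, batch_size=2):
    """Return the ids of the samples the model is least certain about.

    predictions maps a sample id to the model's label probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model output for four unlabeled samples (illustration only).
predictions = {
    "doc-1": [0.98, 0.01, 0.01],
    "doc-2": [0.40, 0.35, 0.25],
    "doc-3": [0.70, 0.20, 0.10],
    "doc-4": [0.34, 0.33, 0.33],
}

# The two samples with the flattest distributions are offered to annotators first.
next_batch = select_for_annotation(predictions)
```

Annotating the samples the model is least sure about tends to add more information per label than annotating samples it already classifies confidently, which is the intuition behind the reduced training data requirement.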
Features of Rosette Model Training Suite include:
Reduced training data requirements
Optimized annotator and project manager experiences
Modular templates supporting different types of projects
Integration with the Rosette linguistic framework
A robust data store capable of managing multiple simultaneous multi-user annotation efforts
Display and search features providing both high-level and deep-dive views of each project’s progress
Accuracy metrics
Automatic model training
Trained custom models for deployment in production installations of Rosette Server
Templates currently available:
The following languages are supported by Model Training Suite for model training and extraction.
Table 1. Supported Languages by Task
| Language | Model Type: NER | Model Type: Events |
|---|---|---|
| Arabic (ara) | ✓ | ✓ |
| Chinese (zho) | ✓ | ✓ |
| Dutch (nld) | ✓ | |
| English (eng) | ✓ | ✓ |
| French (fra) | ✓ | |
| German (deu) | ✓ | ✓ |
| Hebrew (heb) | ✓ | |
| Hungarian (hun) | ✓ | ✓ |
| Indonesian (ind) | ✓ | |
| Italian (ita) | ✓ | |
| Japanese (jpn) | ✓ | ✓ |
| Korean (kor) | ✓ | ✓ |
| Malay, Standard (zsm) | ✓ | |
| Persian (fas) | ✓ | |
| Portuguese (por) | ✓ | |
| Pashto (pus) | ✓ | |
| Russian (rus) | ✓ | ✓ |
| Spanish (spa) | ✓ | |
| Swedish (swe) | ✓ | |
| Tagalog (tgl) | ✓ | |
| Urdu (urd) | ✓ | |
| Vietnamese (vie) | ✓ | |
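When planning a project, it can help to check language support programmatically before collecting data. The sketch below simply encodes Table 1 as two sets of language codes; the `supports` helper and the set names are illustrative and are not part of any Rosette API.

```python
# Language codes from Table 1. Every listed language supports NER models.
NER_LANGUAGES = {
    "ara", "zho", "nld", "eng", "fra", "deu", "heb", "hun", "ind", "ita", "jpn",
    "kor", "zsm", "fas", "por", "pus", "rus", "spa", "swe", "tgl", "urd", "vie",
}

# Only a subset of those languages also supports events models.
EVENT_LANGUAGES = {"ara", "zho", "eng", "deu", "hun", "jpn", "kor", "rus"}


def supports(language_code, model_type):
    """Return True if Table 1 lists the language for the given model type.

    model_type is "ner" or "events" (hypothetical labels for this sketch).
    """
    if model_type == "ner":
        return language_code in NER_LANGUAGES
    if model_type == "events":
        return language_code in EVENT_LANGUAGES
    raise ValueError(f"unknown model type: {model_type!r}")


french_events = supports("fra", "events")   # French is listed for NER only
english_events = supports("eng", "events")  # English is listed for both
```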
Model Training Architecture
A complete Rosette Adaptation Studio system installation includes the following major components. All installations must include Rosette Adaptation Studio and Rosette Server. An installation may include one or both of the training servers: REX Training Server and Events Training Server.
Rosette Adaptation Studio: Provides annotation and project management features, as well as user and role management and the project database.
Rosette Server: An on-premises package that provides access to the Rosette text analytics endpoints. Your license determines which endpoints and languages are active in your installation. The entities endpoint is part of the Rosette Entity Extractor (REX), which is deployed through Rosette Server. The suggestions provided for annotation labels are generated by the entities and morphology endpoints.
REX Training Server: Trains entity extraction models and stores the models while training.
Events Training Server: Trains event extraction models and stores event models for training and event extraction in production.
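To make the role of the entities endpoint concrete, the sketch below builds (but does not send) a request to it. The base URL, port, and path are assumptions about a default local Rosette Server installation, and `build_entities_request` is a hypothetical helper; the System Administrator Guide documents the actual API calls for your deployment.

```python
import json
import urllib.request


def build_entities_request(text, language=None,
                           base_url="http://localhost:8181/rest/v1"):
    """Build a POST request for the entities endpoint.

    base_url is an assumed default; adjust it to match your installation.
    """
    payload = {"content": text}
    if language:
        # Optional language code, for example "eng" from Table 1.
        payload["language"] = language
    return urllib.request.Request(
        url=f"{base_url}/entities",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


request = build_entities_request(
    "Rosette Server was installed in Boston.", language="eng"
)

# Sending requires a running server, so it is left commented out here:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```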