The Rosette Server endpoints are configured by the files found in the /launcher/config/rosapi directory. Edit these files with care: an endpoint will not work if it is not configured properly.
The endpoint-specific parameters and settings are in files of the form cat-factory-config.yaml and sent-factory-config.yaml.
The worker-config.yaml file configures the pipeline for each endpoint. The entries in this file are highly dependent on the backend code.
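As a sketch, the layout of the configuration directory for the files mentioned above looks like this (your installation may contain additional files):

```
/launcher/config/rosapi/
    cat-factory-config.yaml    # /categories endpoint settings
    sent-factory-config.yaml   # /sentiment endpoint settings
    worker-config.yaml         # pipeline definitions for each endpoint
```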
Adding new models for categorization and sentiment
The Rosette Classification Field Training Kit allows users to train their own classification models for the /categories and /sentiment endpoints. Reasons for training a new model include:
- Supporting a language that Rosette does not currently support
- Increasing accuracy on your particular input data
- Supporting a specific categorization taxonomy for your data or task
See the Training Classification Models with Rosette publication for more information.
Integrating Your Custom Model with Rosette Server
To deploy your custom-trained model, integrate it into Rosette Server as follows:
- Ensure that, for the language you are targeting, the following directory exists: ${tcat-root}/models/<lang>/combined-iab-qag
- Move any existing model files in the target directory to an unused directory, e.g.
> mkdir ${tcat-root}/models/<lang>/unused
> mv ${tcat-root}/models/<lang>/combined-iab-qag/* ${tcat-root}/models/<lang>/unused
- Copy all the model files from your newly trained model into the target directory, ${tcat-root}/models/<lang>/combined-iab-qag
- Relaunch Rosette Server
After relaunching Rosette Server, the categorization endpoint will use the models in the combined-iab-qag directory, so your new model is used for its language.
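The swap procedure above can be sketched as a shell session. This example runs against a throwaway sandbox directory standing in for ${tcat-root}, with "eng" as an assumed language code and model.bin as a placeholder model file; substitute your real installation path, language, and model files.

```shell
# Sandbox standing in for ${tcat-root}; replace with your real path.
TCAT_ROOT="$(mktemp -d)"
LANG_CODE="eng"    # assumed ISO 639-3 code for illustration
TARGET="$TCAT_ROOT/models/$LANG_CODE/combined-iab-qag"

# Simulate a shipped model and a newly trained one (placeholder files).
mkdir -p "$TARGET"
echo "old" > "$TARGET/model.bin"
NEW_MODEL="$(mktemp -d)"
echo "new" > "$NEW_MODEL/model.bin"

# Move the existing model files to an unused directory.
mkdir -p "$TCAT_ROOT/models/$LANG_CODE/unused"
mv "$TARGET"/* "$TCAT_ROOT/models/$LANG_CODE/unused/"

# Copy the newly trained model files into the target directory.
cp "$NEW_MODEL"/* "$TARGET/"

cat "$TARGET/model.bin"    # prints "new"
```

After this, relaunching Rosette Server would pick up the files now in combined-iab-qag.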
For sentiment model integration, the process is similar: move all existing files for the model out of ${sentiment-root}/data/svm/<lang>/ into a backup directory, then copy your new model files into that directory. The /sentiment endpoint will then use the new model.
Note
Depending on your specific FTK version, your newly created model may have a lexicon_filtered file while the existing model has lexicon.filtered instead. Rosette supports both naming schemes for backwards compatibility. Regardless of which naming scheme you see, remove the existing filtered lexicon file before adding the one from your new model. If both lexicon.filtered and lexicon_filtered files are in the same model directory, lexicon.filtered takes precedence.
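The cleanup described in the note can be sketched as follows. The example uses a throwaway directory in place of ${sentiment-root}/data/svm/<lang>/ and empty placeholder files; the two file names are the naming schemes the note describes.

```shell
# Sandbox standing in for ${sentiment-root}/data/svm/<lang>/.
MODEL_DIR="$(mktemp -d)"
touch "$MODEL_DIR/lexicon.filtered"    # existing model's filtered lexicon

# Remove whichever naming scheme is present before adding the new file;
# rm -f covers both names and does not fail if one is absent.
rm -f "$MODEL_DIR/lexicon.filtered" "$MODEL_DIR/lexicon_filtered"

touch "$MODEL_DIR/lexicon_filtered"    # new FTK model's filtered lexicon
ls "$MODEL_DIR"                        # only lexicon_filtered remains
```

Removing both names first guarantees the stale lexicon.filtered cannot shadow your new model's lexicon_filtered.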
Adding new language models
Out of the box, the /sentiment and /categories endpoints support only the languages of the models that ship with the distribution. Once you have trained a model in a new language, you must add the new language to the transport-rules.tsv and worker-config.yaml files in Rosette Enterprise.
- For both endpoints, edit the transport-rules.tsv file. Each endpoint is listed with a lang= statement listing the supported languages for the endpoint. Add the three-letter ISO 639-3 language code for each new model language.
/categories lang=eng
/sentiment lang=ara|eng|fas|fra|jpn|spa
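For example, if you had trained new categorization and sentiment models for German (ISO 639-3 code deu, used here purely as an assumed example), the edited lines would read:

```
/categories lang=deu|eng
/sentiment lang=ara|deu|eng|fas|fra|jpn|spa
```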
- For the /sentiment endpoint only, edit the worker-config.yaml file. Go to the section labeled textPipelines. Each endpoint is listed with a languages: statement listing the supported languages for the endpoint. Add the three-letter ISO 639-3 language code for each new model language.
# sentiment
- endpoint: /sentiment
  languages: [ 'ara', 'eng', 'fas', 'fra', 'jpn', 'spa' ]
  steps:
    - componentName: entity-extraction
    - componentName: sentiment
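For example, adding an assumed German (deu) sentiment model changes only the languages: line of the /sentiment pipeline entry:

```yaml
# sentiment
- endpoint: /sentiment
  languages: [ 'ara', 'deu', 'eng', 'fas', 'fra', 'jpn', 'spa' ]
  steps:
    - componentName: entity-extraction
    - componentName: sentiment
```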
Configuring the sentiment endpoint for document-level analysis
The sentiment analysis endpoint can be configured to return document-level sentiment analysis only, by turning off entity-level sentiment analysis. This requires modifying the worker-config.yaml file to remove the entity extraction step from the pipeline, which speeds up document-level sentiment analysis.
The following edits are made to the worker-config.yaml file. The shipped version of the file is:
# sentiment
- endpoint: /sentiment
  languages: [ 'ara', 'eng', 'fas', 'fra', 'jpn', 'spa' ]
  steps:
    - componentName: entity-extraction
    - componentName: sentiment
Change the above block to:
# sentiment
- endpoint: /sentiment
  languages: [ 'ara', 'eng', 'fas', 'fra', 'jpn', 'spa' ]
  steps:
    - componentName: base-linguistics
      factoryName: tokenize
    - componentName: sentiment
In the pipeline, the entity extraction component must be replaced by the base-linguistics tokenization step; it cannot simply be removed.