This document guides users of Rosette through the process of training custom classification models, for categorization or sentiment analysis, using the Rosette Classification Field Training Kit (FTK). The resulting models may be used with the categories or sentiment endpoints of Rosette.
Reasons for training a custom model include:
Supporting a language that Rosette does not currently support
Increasing accuracy on your particular input data
Supporting a specific categorization taxonomy for your data or task
The FTK can build two kinds of classification models. If you have training data, you can build a machine-learned model. If you don’t have training data, you can build a keyword-based model by supplying a few keywords for each class. The keyword-based model requires significantly less labor, but it does not work well for categories that lack clear Wikipedia-based concepts or for very fine-grained categories, such as iPhone6 vs. iPhone7. At this time, keyword-based models support English only.
The FTK includes command-line programs that run the classifiers for quick testing and evaluation, but it does not expose an API. To integrate the models into a real application, users must configure the Rosette Enterprise runtime API. The FTK can be used to create custom categorization models for the categories endpoint or to develop sentiment analysis models for the sentiment endpoint.
This document begins by describing how to build a corpus for developing a customized document classifier for categories. It then describes best practices for testing the categorization models you've developed. Since sentiment is a very specific type of classification, the document also includes a section on developing custom sentiment analysis models. Feel free to jump to the section that covers the task you need to accomplish.