The categorization function identifies the category or categories associated with a document.
Scores Returned
Two scores are returned with each category:
score: A value between -1 and 1. All raw score values are independent of each other.
confidence: A value between 0 and 1. The raw scores are normalized so that the confidence scores for a given document sum to 1.
Confidence score values for the Categorization function reflect the likelihood that a given category label is accurate relative to all other possible category labels. For every document, all twenty-one possible category labels are assigned a confidence score, and these scores together sum to one. Because that total is shared among the labels, confidence scores can be relatively low when multiple labels are relevant to a given document.
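The effect of normalization on confidence values can be illustrated with a small sketch. The source does not say how Rosette normalizes raw scores, so the softmax used here is only an assumed stand-in, and the labels and raw scores are hypothetical (three labels instead of twenty-one).

```python
import math

def normalize(raw_scores):
    """Map raw scores to confidences that sum to 1.

    Softmax is an assumption for illustration; the actual
    normalization used by the service is not documented here.
    """
    exps = {label: math.exp(score) for label, score in raw_scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Hypothetical raw scores: one clearly relevant label.
one_topic = normalize({"SPORTS": 0.9, "TRAVEL": -0.6, "SCIENCE": -0.7})

# Hypothetical raw scores: two equally relevant labels.
two_topics = normalize({"SPORTS": 0.8, "TRAVEL": 0.8, "SCIENCE": -0.7})

print(one_topic)
print(two_topics)
```

In both cases the confidences sum to 1, but the highest confidence is lower in the second case because two relevant labels share the total.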
Rosette supports both single-label and multilabel categorization. By default, the endpoint runs in multilabel mode and returns all relevant category labels whose raw score is above scoreThreshold. The default threshold is 0.25; you can override it by setting scoreThreshold to any value. In general, a negative raw score for a category indicates that the content probably doesn't fall under that category.
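The threshold behavior in multilabel mode can be mimicked on the client side with a short sketch. The category dictionaries below are hypothetical sample data, the function name is invented for illustration, and treating "above" as a strict comparison is an assumption.

```python
DEFAULT_SCORE_THRESHOLD = 0.25  # default value stated in this article

def filter_multilabel(categories, score_threshold=DEFAULT_SCORE_THRESHOLD):
    """Keep categories whose raw score is above the threshold.

    Strict ">" is an assumption; the article only says "above".
    """
    return [c for c in categories if c["score"] > score_threshold]

# Hypothetical results for one document (abbreviated to three labels).
categories = [
    {"label": "SPORTS", "score": 0.62, "confidence": 0.48},
    {"label": "TRAVEL", "score": 0.31, "confidence": 0.35},
    {"label": "SCIENCE", "score": -0.40, "confidence": 0.17},
]

default_labels = [c["label"] for c in filter_multilabel(categories)]
strict_labels = [c["label"] for c in filter_multilabel(categories, score_threshold=0.5)]

print(default_labels)
print(strict_labels)
```

Raising the threshold returns fewer labels; the label with the negative raw score is excluded at any non-negative threshold.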
In single-label mode, Rosette returns the category with the highest confidence score. Both the raw score (score) and the confidence score (confidence) are returned.
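Single-label selection amounts to taking the category with the highest confidence. A minimal sketch, using hypothetical sample data and an invented helper name:

```python
def pick_single_label(categories):
    """Return the category with the highest confidence score."""
    return max(categories, key=lambda c: c["confidence"])

# Hypothetical results for one document (abbreviated to three labels).
categories = [
    {"label": "TRAVEL", "score": 0.31, "confidence": 0.35},
    {"label": "SPORTS", "score": 0.62, "confidence": 0.48},
    {"label": "SCIENCE", "score": -0.40, "confidence": 0.17},
]

best = pick_single_label(categories)
print(best["label"], best["score"], best["confidence"])
```

The selected entry still carries both values, so the raw score remains available for comparison against a threshold of your own.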
Options
Option            Valid Values       Default
singleLabel       true, false        false
scoreThreshold    any real number    0.25
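As a rough illustration of how these options fit into a request, the sketch below assembles a request body and serializes it to JSON. The helper name is invented, and the "content" field for the document text is an assumption; only the "options" object and its two keys come from this article. No network call is made.

```python
import json

def build_request_body(text, single_label=False, score_threshold=0.25):
    """Assemble a categorization request body as a plain dict."""
    return {
        "content": text,  # assumed field name for the document text
        "options": {
            "singleLabel": single_label,
            "scoreThreshold": score_threshold,
        },
    }

body = build_request_body(
    "Sample document text.", single_label=True, score_threshold=0.20
)
payload = json.dumps(body)
print(payload)
```

Serializing with json.dumps also guarantees that the threshold is written as a valid JSON number with a leading digit.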
Example
"options": {"singleLabel": true, "scoreThreshold": 0.20}
Recommendation
To establish expectations and set an appropriate baseline threshold, run a series of tests with your own data and see what values the model returns. Don’t assume there is an absolute confidence level that indicates a correct result.
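One way to run such a test is to sweep several candidate thresholds over documents you have labeled by hand and compare precision and recall at each. The sketch below assumes you have already collected (raw score, is_correct) pairs; the sample values and the helper name are hypothetical.

```python
def precision_recall(samples, threshold):
    """Compute precision and recall for one threshold.

    samples: list of (raw_score, is_correct) pairs, where is_correct
    says whether a human reviewer agreed that the label applies.
    Returns None for a metric whose denominator is zero.
    """
    predicted = [(score, ok) for score, ok in samples if score > threshold]
    true_positives = sum(1 for _, ok in predicted if ok)
    actual_positives = sum(1 for _, ok in samples if ok)
    precision = true_positives / len(predicted) if predicted else None
    recall = true_positives / actual_positives if actual_positives else None
    return precision, recall

# Hypothetical hand-labeled results from a test run.
samples = [
    (0.71, True), (0.44, True), (0.30, False),
    (0.27, True), (0.12, False), (-0.35, False),
]

for threshold in (0.10, 0.25, 0.40):
    precision, recall = precision_recall(samples, threshold)
    print(f"threshold={threshold:.2f} precision={precision} recall={recall}")
```

A higher threshold typically trades recall for precision, so the appropriate value depends on which kind of error matters more for your data.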