What does the Rosette Entity Extraction endpoint do?

An entity is an object of potential interest, such as a person, organization, location, or date. When you process a document, locating its entities can help you classify the document and determine what kinds of data of interest it is likely to contain.

The Rosette Entity Extraction endpoint uses statistical models, pattern matching (regular expressions), and exact matching (gazetteers) to identify entities in input text.

STATISTICAL PROCESSOR

Basis Technology has developed an Averaged Perceptron (AP) processor and statistical models for extracting a variety of entities in a number of languages. The models are built from contextual features specified by a computational linguist and from a substantial body of news stories in which native speakers have tagged the entities.
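To give a feel for how an averaged perceptron learns from tagged text, here is a minimal textbook-style sketch in Python. It is not Basis Technology's implementation: the class name, the feature function, the labels, and the tiny training set are all assumptions made up for illustration, and a production model would use far richer features and far more data.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Generic multiclass averaged perceptron over sparse string features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (feature, label) -> current weight
        self._totals = defaultdict(float)  # weight accumulated over steps
        self._stamps = defaultdict(int)    # step at which a weight last changed
        self._step = 0

    def predict(self, features):
        scores = {label: 0.0 for label in self.labels}
        for feat in features:
            for label in self.labels:
                scores[label] += self.weights.get((feat, label), 0.0)
        # On a tie, the label listed first wins, so results are deterministic.
        return max(self.labels, key=lambda label: scores[label])

    def _bump(self, key, delta):
        # Credit the old weight for the steps it was in effect, then change it.
        self._totals[key] += (self._step - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self._step
        self.weights[key] += delta

    def update(self, features, truth):
        self._step += 1
        guess = self.predict(features)
        if guess != truth:
            for feat in features:
                self._bump((feat, truth), 1.0)
                self._bump((feat, guess), -1.0)

    def average(self):
        # Replace each weight with its mean over all update steps so far.
        for key, weight in list(self.weights.items()):
            total = self._totals[key] + (self._step - self._stamps[key]) * weight
            self.weights[key] = total / self._step if self._step else weight


def token_features(tokens, i):
    """Hypothetical contextual features for the token at position i."""
    word = tokens[i]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return [
        "bias",
        "word=" + word.lower(),
        "is_title=" + str(word.istitle()),
        "prev=" + prev_word,
        "next=" + next_word,
    ]


# Invented toy data standing in for a tagged news corpus.
TRAIN = [
    (["Alice", "works", "at", "Acme"], ["PERSON", "O", "O", "ORGANIZATION"]),
    (["Bob", "visited", "Paris"], ["PERSON", "O", "LOCATION"]),
    (["Acme", "hired", "Bob"], ["ORGANIZATION", "O", "PERSON"]),
]


def train(data, epochs=10):
    labels = sorted({tag for _, tags in data for tag in tags})
    model = AveragedPerceptron(labels)
    for _ in range(epochs):
        for tokens, tags in data:
            for i, tag in enumerate(tags):
                model.update(token_features(tokens, i), tag)
    model.average()
    return model
```

Averaging the weights over all training steps, rather than keeping only the final values, tends to make the model less sensitive to the last few examples it saw.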

PATTERN MATCHING PROCESSOR: REGULAR EXPRESSIONS

The entity extractor includes regular expressions for finding language-specific entities and generic entities that may appear in a variety of languages.
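As a rough illustration of pattern-based extraction, the Python sketch below finds two generic entity kinds with regular expressions. The patterns, the type labels "EMAIL" and "URL", and the function name are assumptions chosen for the example; they are deliberately simplified and are not the expressions shipped with the entity extractor.

```python
import re

# Simplified, hypothetical patterns; real-world expressions are stricter.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "URL": re.compile(r"\bhttps?://[^\s<>\"']+"),
}


def find_pattern_entities(text):
    """Return every pattern match with its type and character offsets."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append({
                "type": label,
                "mention": match.group(0),
                "start": match.start(),
                "end": match.end(),
            })
    # Report entities in the order they appear in the text.
    return sorted(found, key=lambda entity: entity["start"])
```

Because email addresses and URLs look the same regardless of the surrounding language, patterns like these can be applied to text in many languages, whereas a language-specific pattern would be applied only to text in that language.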

EXACT MATCHING PROCESSOR: GAZETTEERS

The entity extractor uses gazetteers to return exact matches. The distribution includes binary gazetteers for each language and a number of entity types, and a cross-language gazetteer for corporations. To match entities despite differences in whitespace, gazetteer entries and potential matches are whitespace-normalized: any amount of whitespace between words is treated as a single space.
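The whitespace normalization described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the extractor's code: the real gazetteers are binary files, whereas this sketch uses a plain dictionary, and the function names and sample entries are invented.

```python
import re


def normalize_spaces(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


def build_gazetteer(entries):
    """Map each whitespace-normalized entry to its entity type."""
    return {normalize_spaces(name): label for name, label in entries}


def find_gazetteer_entities(text, gazetteer):
    """Find exact matches, ignoring differences in whitespace between words."""
    found = []
    for name, label in gazetteer.items():
        # Accept any run of whitespace wherever the entry has a single space,
        # and require that the match is not part of a longer word.
        words = (re.escape(word) for word in name.split(" "))
        pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            found.append({
                "type": label,
                "mention": match.group(0),
                "normalized": name,
                "start": match.start(),
            })
    return sorted(found, key=lambda entity: entity["start"])
```

With this approach, an entry stored as "New York" still matches text in which the two words are separated by several spaces or a line break.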


More info:

https://developer.rosette.com/features-and-functions#entity-extraction
