Rosette® Language Identifier (RLI) analyzes text, from a few words to entire documents, and detects its languages and character encoding quickly and with very high accuracy. Automatic language identification is the necessary first step for applications that categorize, search, process, and store text in many languages. Individual documents can be routed to language specialists or sent into language-specific analysis pipelines (such as Rosette Base Linguistics) to improve the quality of search results.
For applications that analyze tweets, search keywords, and other short text, Rosette Language Identifier offers market-leading accuracy in detecting the language of inputs ranging from 1 to 3 words (under 20 bytes) up to a full sentence.
RLI achieves its high accuracy by combining proprietary algorithms with information-rich language profiles derived from statistical analysis. As linguistics experts working at the intersection of language and technology, Basis Technology continually improves the Rosette product family with new languages, feature updates, and the latest innovations from the academic world.