Package com.basistech.rosette.languageidentifier

Detect language, encoding, and writing script.

Package com.basistech.rosette.languageidentifier Description

Use LanguageIdentifierBuilder to create a LanguageIdentificationAnnotator, which implements the Annotator interface.

Here is an example of creating an annotator for language detection and using it to get the best language detection result:

// Build an annotator that detects one language for the whole input.
final LanguageIdentificationAnnotator langIdentifier =
        new LanguageIdentifierBuilder(new File(rootDirectory))
                .buildSingleLanguageAnnotator();
final AnnotatedText results = langIdentifier.annotate("This is the cereal shot from guns.");
// The first detection result is the best one.
final LanguageCode topLanguage = results.getWholeTextLanguageDetection()
        .getDetectionResults()
        .get(0).getLanguage();
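
The example above reads the best result with get(0). This page does not say whether the result list can ever be empty; if it can, a guarded lookup avoids an IndexOutOfBoundsException. The following is a minimal sketch of that guard. It uses a hypothetical stand-in class, Detection, so that it compiles without the SDK; it is not the SDK's detection result type, and the helper name topLanguage is likewise an assumption for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class TopLanguage {
    // Hypothetical stand-in for one detection result; NOT the SDK type.
    static final class Detection {
        final String language;

        Detection(String language) {
            this.language = language;
        }
    }

    // Returns the language of the first (best) result, or empty if there are none.
    static Optional<String> topLanguage(List<Detection> results) {
        if (results.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(results.get(0).language);
    }

    public static void main(String[] args) {
        List<Detection> results =
                Arrays.asList(new Detection("eng"), new Detection("fra"));
        System.out.println(topLanguage(results).orElse("unknown"));
        System.out.println(
                topLanguage(Collections.<Detection>emptyList()).orElse("unknown"));
    }
}
```

With the real API, the same guard would wrap the list returned by getDetectionResults() before calling get(0).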

To get language regions from input that contains blocks of text in different languages, use the LanguageIdentifierBuilder to build an Annotator for detecting language regions.

Here is an example of creating such an annotator and using it to get the offsets and best language detection result for each language region:

final Annotator langRegionAnnotator =
        new LanguageIdentifierBuilder(new File(rootDirectory))
                .buildLanguageRegionAnnotator();
final AnnotatedText annotatedText =
        langRegionAnnotator.annotate(input_with_regions_of_text_in_different_languages);
final ListAttribute<LanguageDetection> langRegions =
        annotatedText.getLanguageDetectionRegions();
for (final LanguageDetection langRegion : langRegions) {
        final int regionStartOffset = langRegion.getStartOffset();
        final int regionEndOffset = langRegion.getEndOffset();
        final LanguageCode topLang = langRegion.getDetectionResults()
                .get(0).getLanguage();
}
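
The loop above collects the offsets and top language of each region but does not use them. One possible use is to slice the original input into per-language segments, sketched below. The sketch assumes that the offsets are character offsets into the input string and that the end offset is exclusive; neither is stated on this page. Region is a hypothetical stand-in class, not the SDK's LanguageDetection, and describe is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionSlices {
    // Hypothetical stand-in for a detected region; NOT the SDK's LanguageDetection.
    static final class Region {
        final int start;        // assumed: inclusive character offset
        final int end;          // assumed: exclusive character offset
        final String language;  // language code of the top detection result

        Region(int start, int end, String language) {
            this.start = start;
            this.end = end;
            this.language = language;
        }
    }

    // Formats each region as "<language>: <text>", trimming surrounding whitespace.
    static List<String> describe(String input, List<Region> regions) {
        List<String> out = new ArrayList<>();
        for (Region r : regions) {
            out.add(r.language + ": " + input.substring(r.start, r.end).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Good morning. Bonjour le monde.";
        List<Region> regions = Arrays.asList(
                new Region(0, 14, "eng"),
                new Region(14, 31, "fra"));
        for (String line : describe(input, regions)) {
            System.out.println(line);
        }
    }
}
```

With the real API, start, end, and language would come from getStartOffset(), getEndOffset(), and the top detection result of each region.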

Copyright © 2016 Basis Technology Corporation. All Rights Reserved.