RBL canonicalizes some language codes to other language codes. These canonicalization rules apply to all APIs.
LanguageCode.NORWEGIAN is canonicalized to LanguageCode.NORWEGIAN_BOKMAL immediately upon any input to the API, so the two codes cannot be distinguished afterward. In particular, an Analyzer built from a factory configured to use Norwegian reports its language as Norwegian Bokmål.
Similarly, LanguageCode.SIMPLIFIED_CHINESE and LanguageCode.TRADITIONAL_CHINESE are canonicalized to LanguageCode.CHINESE immediately upon input. The one exception is the Chinese Script Converter, where they are not canonicalized as either inputs or outputs.
These are the only language code canonicalizations. Although RBL internally processes Afghan Persian and Iranian Persian as Persian, their codes are not canonicalized and they are not considered the same language. This makes it possible to configure a different user dictionary for each variety of Persian, even though the varieties are otherwise processed identically.
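The rules above can be modeled in a few lines of Java. The sketch below is not RBL code: the Lang enum, the canonicalize method, and the forScriptConverter flag are hypothetical stand-ins that only illustrate the behavior described here. Consult the Javadoc for the real LanguageCode class.

```java
import java.util.EnumMap;
import java.util.Map;

public class CanonicalizationSketch {
    // Hypothetical stand-in for the language codes named above; not RBL's LanguageCode.
    public enum Lang {
        NORWEGIAN, NORWEGIAN_BOKMAL,
        SIMPLIFIED_CHINESE, TRADITIONAL_CHINESE, CHINESE,
        PERSIAN, AFGHAN_PERSIAN, IRANIAN_PERSIAN
    }

    // The only three canonicalizations described in the text.
    private static final Map<Lang, Lang> CANONICAL = new EnumMap<>(Lang.class);
    static {
        CANONICAL.put(Lang.NORWEGIAN, Lang.NORWEGIAN_BOKMAL);
        CANONICAL.put(Lang.SIMPLIFIED_CHINESE, Lang.CHINESE);
        CANONICAL.put(Lang.TRADITIONAL_CHINESE, Lang.CHINESE);
    }

    // forScriptConverter is an assumed flag modeling the Chinese Script Converter exception.
    public static Lang canonicalize(Lang input, boolean forScriptConverter) {
        boolean chineseVariant =
                input == Lang.SIMPLIFIED_CHINESE || input == Lang.TRADITIONAL_CHINESE;
        if (forScriptConverter && chineseVariant) {
            return input; // the converter needs to know the script, so nothing is folded
        }
        // Codes without a rule, including both Persian varieties, map to themselves.
        return CANONICAL.getOrDefault(input, input);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize(Lang.NORWEGIAN, false));           // NORWEGIAN_BOKMAL
        System.out.println(canonicalize(Lang.SIMPLIFIED_CHINESE, false));  // CHINESE
        System.out.println(canonicalize(Lang.TRADITIONAL_CHINESE, true));  // TRADITIONAL_CHINESE
        System.out.println(canonicalize(Lang.AFGHAN_PERSIAN, false));      // AFGHAN_PERSIAN
    }
}
```

Because the Persian varieties have no entry in the table, they keep distinct identities, which is what would let a per-variety setting such as a user dictionary be keyed on them.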
RBL uses ISO 639-3 language codes to specify languages as strings. There are a few nonstandard language codes, as indicated. RBL also accepts the 2-letter codes specified by the ISO 639-1 standard. See the Javadoc for more details on language codes.
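To illustrate what accepting both code forms means, the following sketch resolves a few ISO 639-3 and ISO 639-1 strings to the same language name. It is a hypothetical lookup table, not RBL's parser: the class and method names are invented, the case-insensitive matching is an assumption, and RBL's nonstandard codes are left out because they are not listed in this section. The code pairs shown (nor/no, nob/nb, zho/zh, fas/fa) are taken from the ISO 639 standards.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class LanguageCodeLookupSketch {
    // Maps each accepted code string to a language name (hypothetical table).
    private static final Map<String, String> BY_CODE = new HashMap<>();

    // Registers the 3-letter and 2-letter code for one language.
    private static void register(String name, String iso3, String iso1) {
        BY_CODE.put(iso3, name);
        BY_CODE.put(iso1, name);
    }

    static {
        register("NORWEGIAN", "nor", "no");
        register("NORWEGIAN_BOKMAL", "nob", "nb");
        register("CHINESE", "zho", "zh");
        register("PERSIAN", "fas", "fa");
    }

    // Returns the language name for a code, or empty if the code is unknown.
    // Lowercasing the input is an assumption of this sketch.
    public static Optional<String> lookup(String code) {
        if (code == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(BY_CODE.get(code.trim().toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(lookup("nob").orElse("unknown")); // NORWEGIAN_BOKMAL
        System.out.println(lookup("nb").orElse("unknown"));  // NORWEGIAN_BOKMAL
        System.out.println(lookup("xx").orElse("unknown"));  // unknown
    }
}
```

A string lookup like this would run before canonicalization, so a caller passing "nor" or "no" would still end up with Norwegian Bokmål once the canonicalization rules apply.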