What pairs of languages are particularly hard to tell apart?

Serbian and Croatian are extremely similar when both are written in Latin script (along the lines of British and American English, but more politically fraught). We have only one dictionary for both, which handles them well (assuming the text is in Latin script), but Serbian is generally written in Cyrillic script, for which we do not have a dictionary. Currently, we claim support for both languages, with the caveat that Serbian text in Cyrillic script is not covered by the dictionary. For more information on the differences, see this article.
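
Because the dictionary assumes Latin script, it can help to check the dominant script of a string before relying on the result. The following is a minimal sketch in Python, not part of any product API: `dominant_script` is a hypothetical helper that uses only the standard `unicodedata` module and classifies letters by their Unicode character names.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return 'CYRILLIC', 'LATIN', or 'OTHER' based on a count of letters.

    Hypothetical helper for illustration only. It looks at the Unicode
    character name of each letter and ignores digits, spaces, and punctuation.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["CYRILLIC"] += 1
        elif name.startswith("LATIN"):
            counts["LATIN"] += 1
        else:
            counts["OTHER"] += 1
    if not counts:
        return "OTHER"
    # most_common(1) returns the script with the highest letter count
    return counts.most_common(1)[0][0]
```

With a check like this, an application could route Latin-script Serbian or Croatian text to the shared dictionary and flag Cyrillic-script text as outside its coverage.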

Standard Malay (zsm) and Indonesian (ind) are another pair that is hard to tell apart; on short strings, this is difficult even for humans. According to the Ethnologue, Indonesian and Standard Malay have over 80% lexical similarity. Both are derived from court Malay, making them more closely related to one another than to any local Malay dialect (zlm).

RLI cannot distinguish between the two individual languages that make up the Persian macrolanguage, Western Farsi and Dari. RBL/REX have single Persian implementations, but RNI/RNT and Highlight distinguish between the two.

RLI also cannot distinguish between the two Norwegian individual languages, Nynorsk and Bokmål.
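
When consuming or evaluating language identification results, one option is to treat these hard-to-separate languages as equivalent by collapsing them to a shared group code. The sketch below is an illustration only and does not describe how any of the products behave internally. It assumes ISO 639-3 codes, and the names `MACROLANGUAGE` and `same_language_group` are hypothetical.

```python
# Individual ISO 639-3 codes mapped to their ISO 639-3 macrolanguage codes.
# The table is deliberately small and covers only the pairs discussed here.
MACROLANGUAGE = {
    "pes": "fas",  # Western (Iranian) Farsi -> Persian
    "prs": "fas",  # Dari -> Persian
    "nob": "nor",  # Norwegian Bokmål -> Norwegian
    "nno": "nor",  # Norwegian Nynorsk -> Norwegian
    "zsm": "msa",  # Standard Malay -> Malay
    "ind": "msa",  # Indonesian -> Malay
    "zlm": "msa",  # Malay (individual language) -> Malay
    "srp": "hbs",  # Serbian -> Serbo-Croatian
    "hrv": "hbs",  # Croatian -> Serbo-Croatian
}


def same_language_group(code_a: str, code_b: str) -> bool:
    """Return True if two codes are identical or share a macrolanguage."""
    group_a = MACROLANGUAGE.get(code_a, code_a)
    group_b = MACROLANGUAGE.get(code_b, code_b)
    return group_a == group_b
```

For example, `same_language_group("nob", "nno")` is true, so a detector that reports Bokmål for Nynorsk text would not be counted as wrong under this looser comparison.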
