Rosette Base Linguistics

Jason Stevens
April 28, 2023 19:28

Language (code): Tokenization, Parts of Speech, Lemmas, Compound Components, Han Readings, Sentence Boundary

Arabic (ara): ✓ ✓ ✓ ✓
Catalan (cat): ✓ ✓ ✓
Chinese (zho): ✓ ✓ ✓ ✓ ✓
Czech (ces): ✓ ✓ ✓ ✓
Danish (dan): ✓ ✓ ✓ ✓
Dutch (nld): ✓ ✓ ✓ ✓ ✓
English (eng): ✓ ✓ ✓ ✓
Estonian (est): ✓ ✓ ✓
Finnish (fin): ✓
French (fra): ✓ ✓ ✓ ✓
German (deu): ✓ ✓ ✓ ✓ ✓
Greek (ell): ✓ ✓ ✓ ✓
Hebrew (heb): ✓ ✓ ✓ ✓
Hungarian (hun): ✓ ✓ ✓ ✓ ✓
Indonesian (ind): ✓ ✓ ✓
Italian (ita): ✓ ✓ ✓ ✓
Japanese (jpn): ✓ ✓ ✓ ✓ ✓
Korean (kor): ✓ ✓ ✓ ✓
Korean-North (qkp): ✓ ✓ ✓ ✓
Korean-South (qkr): ✓ ✓ ✓ ✓
Latvian (lav): ✓ ✓ ✓
Malay, Standard (zsm): ✓ ✓ ✓
Norwegian (nor): ✓ ✓ ✓ ✓
Norwegian-Bokmål (nob): ✓ ✓ ✓ ✓
Norwegian-Nynorsk (nno): ✓ ✓ ✓ ✓
Pashto (pus): ✓
Persian (fas): ✓ ✓ ✓ ✓
Persian-Afghan (prs): ✓ ✓ ✓ ✓
Persian-Iranian (pes): ✓ ✓ ✓ ✓
Polish (pol): ✓ ✓ ✓ ✓
Portuguese (por): ✓ ✓ ✓ ✓
Romanian (ron): ✓ ✓ ✓
Russian (rus): ✓ ✓ ✓ ✓
Serbian (srp)[a]: ✓ ✓ ✓
Slovak (slk): ✓ ✓ ✓
Spanish (spa): ✓ ✓ ✓ ✓
Swedish (swe): ✓ ✓ ✓ ✓
Tagalog (tgl): ✓ ✓ ✓
Thai (tha): ✓ ✓ ✓
Turkish (tur): ✓ ✓ ✓
Ukrainian (ukr): ✓
Urdu (urd): ✓ ✓ ✓

[a] Rosette's /morphology endpoint supports Serbian text only when it is written in Latin script. By default, however, Rosette's language identification recognizes Serbian only when it is written in Cyrillic script, so Latin-script Serbian is not routed to the Serbian analyzer automatically. To use morphological analysis on Serbian text, you must explicitly include the language code srp in your request.
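The footnote above says that Latin-script Serbian needs the language code srp passed explicitly. The Python sketch below shows what such a /morphology request might look like, using only the standard library. Several details are assumptions not stated in this article: the base URL `https://api.rosette.com/rest/v1`, the `X-RosetteAPI-Key` header name, and the `complete` feature segment of the path. The helper names `build_morphology_request` and `analyze` are hypothetical. Check your Rosette deployment's API reference before relying on any of them.

```python
import json
import urllib.request

# ASSUMPTIONS: base URL, API-key header name, and "complete" feature path are
# illustrative; confirm them against your Rosette deployment's documentation.
ROSETTE_BASE_URL = "https://api.rosette.com/rest/v1"
API_KEY_HEADER = "X-RosetteAPI-Key"


def build_morphology_request(text, api_key, language="srp", feature="complete",
                             base_url=ROSETTE_BASE_URL):
    """Build (but do not send) a POST request to the /morphology endpoint.

    Setting "language" explicitly bypasses automatic language identification,
    which by default recognizes Serbian only in Cyrillic script.
    """
    payload = {"content": text, "language": language}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/morphology/{feature}",
        data=body,
        headers={
            API_KEY_HEADER: api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def analyze(text, api_key):
    """Hypothetical helper: send the request and return the decoded JSON reply."""
    req = build_morphology_request(text, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `build_morphology_request("Dobar dan, svete.", "YOUR_KEY")` produces a request whose JSON body carries `"language": "srp"` alongside the Latin-script text. Building the request separately from sending it makes the payload easy to inspect before any network call is made.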