See also: How does tokenization of European languages work (RBL-JE)?
In RLP (native), the European Language Analyzer (ELA) processor tokenizes text in a language-specific manner designed to facilitate linguistic analysis. For instance, English "in front of" is parsed as a single token and identified as a preposition. French "sur-le-champs" is parsed into two tokens, ["sur-", "-le-champs"], with the hyphen at the split point kept in both tokens.
Some applications, such as named entity extraction and search, may work better with language-independent tokenization, where "in front of" is parsed as three tokens and "sur-le-champs" is parsed as five: ["sur", "-", "le", "-", "champs"]. To get this behavior, reorder the processors in your context file so that Word Breaker comes before European Language Analyzer.
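To make the language-independent behavior concrete, here is a minimal Python sketch that reproduces the token counts described above. It is only an illustration: it does not call RLP, it is not the Word Breaker algorithm, and the name simple_tokenize and the regular expression are assumptions made for this example.

```python
import re

# Toy language-independent tokenizer (NOT the RLP Word Breaker):
# each run of word characters is one token, and every other
# non-whitespace character (such as "-") is a token of its own.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Split text on whitespace and punctuation, keeping punctuation."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("in front of"))    # ['in', 'front', 'of']
print(simple_tokenize("sur-le-champs"))  # ['sur', '-', 'le', '-', 'champs']
```

A tokenizer of this kind knows nothing about multiword prepositions or hyphenated compounds, which is why it yields three and five tokens where ELA yields one and two.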
You can read more about this in the Rosette Linguistics Platform Application Developer’s Guide, in the section titled "European Language Tokenizers".