The following options are described in more detail in Analyzers.
Table 5. General Analyzer Options
Option |
Description |
Type
(Default)
|
Supported Languages |
analysisCacheSize
cacheSize
|
Maximum number of entries in the analysis cache. Larger values increase throughput, but use extra memory. If zero, caching is off. |
Integer
(100,000)
|
All |
caseSensitive
|
Indicates whether analyzers produced by the factory are case sensitive. If false, they ignore case distinctions. |
Boolean
(true)
|
Czech, Danish, Dutch, English, French, German, Greek, Hebrew, Hungarian, Italian, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish |
deliverExtendedTags
|
Indicates whether the analyzers should return extended tags with the raw analysis. If true, the extended tags are returned. |
Boolean
(false)
|
All |
normalizationDictionaryPaths
|
A list of paths to user many-to-one normalization dictionaries, separated by semicolons or the OS-specific path separator. |
List of paths |
All |
query
|
Indicates that the input will be queries, which are likely to be incomplete sentences. If true, analyzers may change their behavior (e.g., disable disambiguation). |
Boolean
(false)
|
All |
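As an illustration of how the general options in Table 5 might be collected before they are handed to an analyzer factory, the sketch below keeps option names and values in a plain map and splits a normalizationDictionaryPaths value on semicolons or the OS-specific path separator. The class AnalyzerOptionsSketch, its methods, and the example dictionary paths are hypothetical and are not part of the product API; only the option names and defaults come from the table, and the analysisCacheSize default is assumed to mean one hundred thousand.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the product API.
public class AnalyzerOptionsSketch {

    // Defaults copied from Table 5, kept as strings keyed by option name.
    public static Map<String, String> defaults() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("analysisCacheSize", "100000"); // assumed: one hundred thousand
        options.put("caseSensitive", "true");
        options.put("deliverExtendedTags", "false");
        options.put("query", "false");
        return options;
    }

    // Splits a path list on ';' or on the OS-specific path separator,
    // trimming whitespace and dropping empty entries.
    public static List<String> splitPaths(String value) {
        List<String> paths = new ArrayList<>();
        String normalized = value.replace(File.pathSeparator, ";");
        for (String part : normalized.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        Map<String, String> options = defaults();
        // Query input: analyzers may change their behavior for incomplete sentences.
        options.put("query", "true");
        // Example paths are placeholders.
        options.put("normalizationDictionaryPaths", "dict/one.txt;dict/two.txt");

        System.out.println(options);
        System.out.println(splitPaths(options.get("normalizationDictionaryPaths")));
    }
}
```

Keeping the values as strings mirrors how the table presents them; a real integration would pass them through whatever option-setting mechanism the factory provides.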
The following options are described in more detail in Compounds.
Table 6. Compound Options
Option |
Description |
Type
(Default)
|
Supported Languages |
decomposeCompounds
|
Indicates whether to decompose compounds.
For Chinese and Japanese, alternativeTokenization must be enabled.
If koreanDecompounding is enabled, Korean compounds are decomposed even when decomposeCompounds is disabled.
|
Boolean
(true)
|
Chinese, Danish, Dutch, German, Hungarian, Japanese, Korean, Norwegian (Bokmål, Nynorsk), Swedish |
compoundComponentSurfaceForms
|
Indicates whether to return the surface forms of compound components. When this option is enabled and ADM results are returned, getText returns the surface form of a component Token, and its lemma can be retrieved using Token#getAnalyses() and MorphoAnalysis#getLemma(). When this option is enabled and the results are not in ADM format, getCompoundComponentSurfaceForms returns the surface forms of a compound word’s Analysis, and its surface form is not available.
This option has no effect when decomposeCompounds is set to false.
|
Boolean
(false)
|
Dutch, German, Hungarian |
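The interactions described in Table 6 can be easier to follow as a small decision function. The sketch below is one reading of the table, not the product's implementation: the class CompoundOptionsSketch, its method names, and the use of plain language names as strings are hypothetical. It also assumes that Korean compounds are decomposed when either decomposeCompounds or koreanDecompounding is enabled, which the table does not state explicitly for every combination.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the rules in Table 6; not part of the product API.
public class CompoundOptionsSketch {

    // Languages listed as supporting decomposeCompounds.
    private static final Set<String> DECOMPOSING = new HashSet<>(Arrays.asList(
            "Chinese", "Danish", "Dutch", "German", "Hungarian",
            "Japanese", "Korean", "Norwegian", "Swedish"));

    // Languages listed as supporting compoundComponentSurfaceForms.
    private static final Set<String> SURFACE_FORMS = new HashSet<>(Arrays.asList(
            "Dutch", "German", "Hungarian"));

    public static boolean willDecompose(String language,
                                        boolean decomposeCompounds,
                                        boolean alternativeTokenization,
                                        boolean koreanDecompounding) {
        if (!DECOMPOSING.contains(language)) {
            return false;
        }
        switch (language) {
            case "Chinese":
            case "Japanese":
                // alternativeTokenization must also be enabled.
                return decomposeCompounds && alternativeTokenization;
            case "Korean":
                // Assumption: koreanDecompounding alone is enough.
                return decomposeCompounds || koreanDecompounding;
            default:
                return decomposeCompounds;
        }
    }

    // compoundComponentSurfaceForms has no effect when decomposeCompounds is false.
    public static boolean surfaceFormsInEffect(String language,
                                               boolean decomposeCompounds,
                                               boolean compoundComponentSurfaceForms) {
        return SURFACE_FORMS.contains(language)
                && decomposeCompounds
                && compoundComponentSurfaceForms;
    }

    public static void main(String[] args) {
        System.out.println(willDecompose("Japanese", true, false, false));
        System.out.println(willDecompose("Korean", false, false, true));
        System.out.println(surfaceFormsInEffect("German", false, true));
    }
}
```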
The following options are described in more detail in Disambiguation.
Table 7. Disambiguation Options
Option |
Description |
Type (Default) |
Supported Languages |
disambiguate
|
Indicates whether the analyzers should disambiguate the results. |
Boolean
(true)
|
Arabic, Chinese, Czech, Dutch, English, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish |
alternativeEnglishDisambiguation
|
Enables faster part of speech disambiguation for English. |
Boolean
(false)
|
English |
alternativeGreekDisambiguation
|
Enables faster part of speech disambiguation for Greek. |
Boolean
(false)
|
Greek |
alternativeSpanishDisambiguation
|
Enables faster part of speech disambiguation for Spanish. |
Boolean
(false)
|
Spanish |
The following options are described in more detail in Returning Universal Part-of-Speech (POS) Tags.
Table 8. Universal POS Tag Options
Option |
Description |
Type
(Default)
|
Supported Languages |
universalPosTags
|
Indicates whether POS tags should be converted to their universal versions. |
Boolean
(false)
|
POS tags are defined for Arabic, Chinese, Czech, Dutch, English, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Russian, Spanish, and Urdu. |
customPosTagsUri
|
URI of a POS tag map |
URI |
POS tags are defined for Arabic, Chinese, Czech, Dutch, English, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Russian, Spanish, and Urdu. |
The following options are described in more detail in Contraction Splitting Rule File Format.
Table 9. Contraction Splitting Options
Option |
Description |
Type
(Default)
|
Supported Languages |
tokenizeContractions
|
Indicates whether to deliver contractions as multiple tokens. If false, they are delivered as a single token. |
Boolean
(false)
|
All |
customTokenizeContractionRulesUri
|
URI of contraction rule file. |
URI |
All |
The following options are only available when using the ADM API.
Table 10. Annotator Object Options
Option |
Description |
Type (Default) |
Supported Languages |
analyze
|
Enables analysis. If false, the annotator will only perform tokenization. |
Boolean
(true)
|
All |
customPosTagsUri
|
URI of a POS tag map file for use by the universalPosTags option. |
URI
|
Czech, Danish, Dutch, English, French, German, Greek, Hebrew, Hungarian, Italian, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish |