rootDirectory
|
A REX root directory contains language models and necessary configuration files.
|
${rex-root}
|
rblRootDirectory
|
The directory containing the RBL root for REX to use.
|
${rex-root}/rbl-je
|
rejectGazetteers
|
Additional gazetteer files used to reject entities for the given language.
|
null
|
snapToTokenBoundaries
|
Regular expressions and gazetteers may be configured to match partial tokens, independent of token boundaries. If true, reported offsets correspond to token boundaries.
|
true
|
rejectRegularExpressionSets
|
Additional regex files used to reject entities.
|
null
|
allowPartialGazetteerMatches
|
The option to allow partial gazetteer matches. For the purposes of this setting, a partial match is one that does not line up with token boundaries as determined by the internal tokenizer. This only applies to accept gazetteers.
|
false
|
kbs
|
Custom list of knowledge bases for the linker, in order of priority.
|
null
|
maxEntityTokens
|
The maximum number of tokens allowed in an entity returned by the Statistical Entity Extractor. The Entity Redactor discards entities from the Statistical Entity Extractor with more than this number of tokens.
|
8
|
customProcessors
|
Custom processors to add to annotators.
|
null
|
acceptRegularExpressionSets
|
Additional files used to produce regex entities.
|
null
|
calculateConfidence
|
If true, entity confidence values are calculated. Can be overridden by specifying calculateConfidence in the API call.
|
false
|
joinerRuleFiles
|
File containing additional joiner rules.
|
null
|
excludedEntityTypes
|
Entity types to be excluded from extraction.
|
null
|
dataOverlayDirectory
|
An overlay directory is a directory shaped like the data directory. REX will look for files in both the overlay directory and the root directory, using files from both locations. However, if a file exists in both places (as identified by its path relative to the overlay or root data directory), REX prefers the version in the overlay directory. If REX finds a zero-length file in the overlay directory, it ignores both that file and any corresponding file in the root data directory.
|
null
|
confidenceThreshold
|
The confidence value threshold below which entities extracted by the statistical processor are ignored.
|
-1.0
|
resolvePronouns
|
When true, resolve pronouns to person entities.
|
false
|
statisticalModels
|
Additional files used to produce statistical entities for the given language.
You may pass multiple statistical models. The parameter should be formatted in trios of values specifying language, case-sensitivity, and the model file, separated by commas. Case-sensitivity can be automatic, caseInsensitive, or caseSensitive. For example, setting two models, one for case-sensitive English and one for Japanese, might look like: eng,caseSensitive,english-model.bin,jpn,automatic,japanese-model.bin
|
null
|
acceptGazetteers
|
Additional gazetteer files used to produce entities for the given language.
|
null
|
linkEntities
|
The option to link mentions to knowledge base entities with a disambiguation model. Enabling this option also enables calculateConfidence.
|
false
|
caseSensitivity
|
The capitalization (aka 'case') used in the input texts. Processing standard documents requires caseSensitive, which is the default. Documents with all-caps, no-caps or headline capitalization may yield higher accuracy if processed with the caseInsensitive value.
Can be automatic, caseSensitive, or caseInsensitive.
|
caseSensitive
|
maxResolvedEntities
|
The maximum number of entities for in-document coreference resolution (a.k.a. chaining).
|
2000
|
calculateSalience
|
If true, entity chain salience values are calculated. Can be overridden by specifying calculateSalience in the API call.
|
false
|
retainSocialMediaSymbols
|
The option to retain social media symbols ('@' and '#') in normalized output.
|
false
|
statSalienceMode
|
If true, entity-chain salience is calculated with the statistical method (returns 0 or 1). If false, it is calculated with the simple method (returns a score between 0 and 1).
|
true
|
customProcessorClasses
|
Register a custom processor class.
|
null
|
keepEntitiesInInput
|
The option to keep existing annotated text entities.
|
false
|
redactorPreferLength
|
The option to prefer length over weights during redaction. If true, the redactor will always choose a longer entity over a shorter one if the two overlap, regardless of their user-defined weights. In this case, if the lengths are the same, then weight is used to disambiguate the entities. If false, the redactor will choose the higher weighted entity when two overlap, regardless of the length of the entity string. In this case, if the weights are the same, then the redactor will choose the longer of the two entities.
|
true
|
useDefaultConfidence
|
The option to assign default confidence value 1.0 to non-statistical entities instead of null.
|
false
|
linkingConfidenceThreshold
|
The confidence value threshold below which linking results by the kbLinker processor are ignored.
|
-1.0
|
indocType
|
An option for document entity resolution (also known as entity chaining). Valid values are: HIGH, STANDARD, STANDARD_MINUS, or NULL.
|
STANDARD
|
Default processors: acceptGazetteer, acceptRegex, rejectGazetteer, rejectRegex, statistical, indocCoref, redactor, joiner
|
List the set of active processors for an entity extraction run. All processors are active by default. This method provides a way to turn off selected processors. The order of the processors cannot be changed. Note that turning off redactor can cause overlapping and unsorted entities to be returned.
|
null
|
supplementalRegularExpressionPaths
|
The option to add supplemental regex files, usually for entity types that are excluded by default. The supplemental regex files are located at data/regex/<lang>/accept/supplemental and are not used unless specified.
|
null
|
structuredRegionsProcessingType
|
Configures how structured regions are processed. Valid values are: none, nerModel, and nameClassifier.
|
none
|
regexCurrencySplit
|
Determines whether money values are extracted as MONEY or as CURRENCY_AMT and CURRENCY_TYPE. If true, REX tries to extract CURRENCY_AMT and CURRENCY_TYPE instead of MONEY.
|
false
|
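The statisticalModels value described in the table is a flat comma-separated string read in groups of three. As a minimal sketch of how such a value could be validated before it is passed to REX, the following Python helper splits it into trios and checks the case-sensitivity field. The names `StatisticalModelSpec` and `parse_statistical_models` are hypothetical and are not part of REX.

```python
from typing import List, NamedTuple

# Case-sensitivity values listed in the statisticalModels description.
VALID_CASE_MODES = {"automatic", "caseInsensitive", "caseSensitive"}


class StatisticalModelSpec(NamedTuple):
    """One (language, case-sensitivity, model file) trio. Hypothetical helper type."""
    language: str
    case_sensitivity: str
    model_file: str


def parse_statistical_models(value: str) -> List[StatisticalModelSpec]:
    """Split a statisticalModels string into trios and validate each one."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) % 3 != 0:
        raise ValueError(
            f"expected a multiple of 3 comma-separated values, got {len(parts)}"
        )
    specs = []
    for i in range(0, len(parts), 3):
        language, case_mode, model_file = parts[i:i + 3]
        if case_mode not in VALID_CASE_MODES:
            raise ValueError(f"unknown case-sensitivity value: {case_mode!r}")
        specs.append(StatisticalModelSpec(language, case_mode, model_file))
    return specs


if __name__ == "__main__":
    example = "eng,caseSensitive,english-model.bin,jpn,automatic,japanese-model.bin"
    for spec in parse_statistical_models(example):
        print(spec)
```

Checking the string up front catches a dropped field early, since a missing value would otherwise shift every following trio.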
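The lookup rule described for dataOverlayDirectory can be illustrated with a short sketch. This is not REX's implementation; it only mirrors the three cases in the description (overlay file preferred, zero-length overlay file hides both copies, otherwise fall back to the root). The function name `resolve_data_file` and its parameters are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_data_file(
    relative_path: str,
    root_dir: Path,
    overlay_dir: Optional[Path] = None,
) -> Optional[Path]:
    """Return the file that would be used for relative_path, or None if no file is used.

    Hypothetical helper that mirrors the overlay rules in the table:
    1. A non-empty file in the overlay directory wins.
    2. A zero-length file in the overlay directory hides both copies.
    3. Otherwise the file from the root data directory is used, if present.
    """
    if overlay_dir is not None:
        overlay_file = overlay_dir / relative_path
        if overlay_file.is_file():
            if overlay_file.stat().st_size == 0:
                return None  # zero-length marker: ignore overlay and root copies
            return overlay_file
    root_file = root_dir / relative_path
    return root_file if root_file.is_file() else None
```

Under this reading, placing an empty file in the overlay directory is a way to disable a file that ships in the root data directory without modifying the root itself.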
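The tie-breaking order described for redactorPreferLength can be summarized as a two-level comparison. The sketch below is an illustration of that rule only, not the redactor's actual code. The `Candidate` type and `choose_overlapping` function are hypothetical, and length is assumed here to mean the character length of the entity string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """An overlapping entity candidate. Hypothetical type for illustration."""
    text: str
    weight: float


def choose_overlapping(a: Candidate, b: Candidate, prefer_length: bool = True) -> Candidate:
    """Pick one of two overlapping candidates.

    prefer_length=True:  longer entity wins; weight breaks ties.
    prefer_length=False: higher weight wins; length breaks ties.
    Length is assumed to be the character length of the entity string.
    """
    if prefer_length:
        def key(c: Candidate):
            return (len(c.text), c.weight)
    else:
        def key(c: Candidate):
            return (c.weight, len(c.text))
    return max(a, b, key=key)


if __name__ == "__main__":
    short_heavy = Candidate("New York", weight=1.0)
    long_light = Candidate("New York City", weight=0.5)
    print(choose_overlapping(short_heavy, long_light, prefer_length=True).text)
    print(choose_overlapping(short_heavy, long_light, prefer_length=False).text)
```

With the default of true, user-defined weights only matter when two overlapping entities have the same length.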