Minimum System Requirements
The amount of disk space required depends on your use case and the languages installed.
x86_64 CPU with 4 or more physical cores
Minimum 16 GB RAM
64-bit JDK 8 or 11 (tested with OpenJDK)
Ant 1.8.2 or later (optional; required only to run the included samples)
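As a rough pre-installation check, the requirements above can be compared against what the JVM reports. The following is a minimal sketch that uses only standard Java system properties; the class name RequirementsCheck and its helper methods are hypothetical and are not part of the REX SDK. Note that the JVM reports logical processors rather than physical cores, and maximum heap rather than installed RAM, so treat the printed values as hints only.

```java
public class RequirementsCheck {

    /** Extracts the major version from java.specification.version ("1.8" gives 8, "11" gives 11). */
    public static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Java 8 and earlier report "1.x"; Java 9 and later report the major version directly.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** Returns true for the os.arch values that JVMs commonly report on x86_64 hardware. */
    public static boolean is64BitX86(String osArch) {
        return "amd64".equals(osArch) || "x86_64".equals(osArch);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        String arch = System.getProperty("os.arch");
        // Logical processors (may include hyperthreads), not physical cores.
        int logicalCpus = Runtime.getRuntime().availableProcessors();
        // Maximum heap available to this JVM, not the physical RAM of the machine.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);

        System.out.println("JDK major version: " + major
                + ((major == 8 || major == 11) ? " (listed)" : " (not listed in the requirements)"));
        System.out.println("Architecture: " + arch
                + (is64BitX86(arch) ? " (x86_64)" : " (not x86_64)"));
        System.out.println("Logical processors: " + logicalCpus);
        System.out.println("Max JVM heap (MB): " + maxHeapMb);
    }
}
```

Compile and run it with the same JDK you intend to use for REX, so that the reported version and architecture match your deployment.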
Your installation of REX will include the following files:
The SDK package:
<VERSION> is the version of REX you are installing, e.g.,
rex-je-7.28.1.c59.0.zip. When you unzip the SDK package, the root directory is
rex-je-<VERSION>. It contains the following subdirectories:
Contains the statistical models, gazetteers, regexes, and configuration files for the redactor and joiner. You can add regex files and text gazetteers to the data tree and edit the configuration files.
Contains the .jar files that the SDK uses. Add these .jar files to your classpath.
Default location for your Rosette license. The default configuration and samples require
rlp-license.xml to be in this directory.
Contains sample data and applications to examine, modify, and run.
samples/lib contains the third-party jar files used by the samples.
- language files
Language files for each language. The files are named
rex-je-7.41.0.c60.0-<LANGUAGE>.zip, where LANGUAGE is the three-letter ISO 639-3 code indicating the language of the file contents. For example,
rex-je-7.41.0.c60.0-eng.zip is the file for the English language. The files are unzipped into the SDK's root directory,
rex-je-<VERSION>-installer-zip to unpackage the SDK package and language packs. Unzip the installer package and run
install-rex.sh to begin. The installer will prompt you for:
The location of the other packages if they're not detected in the current directory.
The language packs you want to install. The installer defaults to all detected packs.
The installation directory (defaults to the current directory).
It will then unzip all necessary files to their correct locations.
The installer does not copy in the license file.
The Rosette License:
rlp-license.xml. Copy this file to the
The documentation package:
rex-je-<VERSION>-doc.zip When unzipped, the root directory contains a
doc subdirectory with the following components:
REX Application Developer’s Guide (this document,
Release Notes (
Java API documentation (
Entity linking (the linker processor) is provided within the standard REX SDK package. No other files are required for linking.
You may also receive packages for the field training kit, which is used to train the statistical processor.
A Quick Look at REX: Running the Sample Program
After you install REX and the license file, try running the sample application. The sample processes an input file and reports information about each entity that it extracts from the input.
Ant provides arguments for the root directory, a language code (eng), an input file, and an output file. The root directory provides the path to the Rosette license and to the data tree. The data tree includes the statistical model, default gazetteers, default regex files, and a redaction configuration file.
In a Bash shell (Unix) or Command Prompt (Windows), navigate to rex-je-<VERSION>/samples.
Use the Ant build script to compile and run
EntityAnnotatorSample instantiates an Annotator to process a UTF-8 input file and report the entities it finds in an output file.
The sample reads an input file in rex-je-<VERSION>/samples/data:
General George Washington (February 22, 1732 – December 14, 1799) was the
dominant military and political leader of the new United States of America
from 1775 to 1799. He led the American victory over Britain in the American
Revolutionary War as commander in chief of the Continental Army in 1775–1783,
and he presided over the writing of the Constitution in 1787.
The sample writes output to
rex-je-<VERSION>/samples/EntityAnnotatorSampleOut-eng.txt. For each entity it finds, the output includes the entity type, the offsets of the entity in the input document, the normalized entity (with a single space between words), and the source (statistical, regex, gazetteer, or joiner).
TITLE, [0, 7), General (statistical)
PERSON, [8, 25), George Washington (statistical)
LOCATION, [124, 148), United States of America (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
NATIONALITY, [179, 187), American (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
LOCATION, [201, 208), Britain (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
NATIONALITY, [216, 224), American (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
TITLE, [246, 264), commander in chief (statistical)
ORGANIZATION, [272, 288), Continental Army (statistical)
IDENTIFIER:URL, [374, 420), http://en.wikipedia.org/wiki/George_washington (regex:xxx_9)
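If you want to post-process this output file, each line can be split back into its four fields. The following is a minimal sketch that uses only java.util.regex; the class EntityLineParser and its nested EntityLine type are hypothetical helpers, not part of the REX API. The line pattern is inferred from the sample output above rather than from a format specification. The sketch treats the offsets as a half-open range [start, end) of Java char positions, which matches the sample (for example, [8, 25) covers "George Washington").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityLineParser {

    // Assumed line layout: TYPE, [start, end), normalized text (source)
    private static final Pattern LINE = Pattern.compile(
            "^([^,]+), \\[(\\d+), (\\d+)\\), (.*) \\(([^()]*)\\)$");

    /** Simple value holder for one parsed output line. */
    public static final class EntityLine {
        public final String type;
        public final int start;   // inclusive
        public final int end;     // exclusive
        public final String normalized;
        public final String source;

        EntityLine(String type, int start, int end, String normalized, String source) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.normalized = normalized;
            this.source = source;
        }
    }

    /** Parses one output line, or throws IllegalArgumentException if it does not match. */
    public static EntityLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized output line: " + line);
        }
        return new EntityLine(
                m.group(1),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4),
                m.group(5));
    }

    public static void main(String[] args) {
        // First 25 characters of the sample input, enough to cover the PERSON entity.
        String text = "General George Washington";
        EntityLine e = parse("PERSON, [8, 25), George Washington (statistical)");
        // Use the offsets to recover the original span from the input text.
        System.out.println(e.type + " -> " + text.substring(e.start, e.end)
                + " (source: " + e.source + ")");
    }
}
```

Because the normalized entity collapses whitespace, the substring taken from the input by offset may differ from the normalized field when an entity spans a line break or multiple spaces.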
To process a sample document in a different language, include the ISO 639-3 Language Code when you call Ant. For example, to process a German document:
ant -Dlang=deu run.EntityAnnotatorSample
The source for this sample is in