Basis Technology software
Your installation of REX will include the following files:
The SDK package:
rex-je-<VERSION>.zip
<VERSION> is the version of REX you are installing; for example, the package for version 7.28.1.c59.0 is rex-je-7.28.1.c59.0.zip. When you unzip the SDK package, the root directory is rex-je-<VERSION>. It contains the following subdirectories:
data: Contains the statistical models, gazetteers, regexes, and configuration files for the redactor and joiner. You can add regex files and text gazetteers to the data tree, and edit the configuration files.
lib: Contains the .jar files that the SDK uses. Add these .jar files to your classpath.
licenses: Default location for your Rosette license. The default configuration and samples require rlp-license.xml to be in this directory.
samples: Contains sample data and applications that you can examine, modify, and run. samples/lib contains third-party .jar files that the samples use.
As of version rex-je-7.41.0.c60.0, the SDK package contains a separate zip file for each language: rex-je-7.41.0.c60.0-eng.zip for English, rex-je-7.41.0.c60.0-deu.zip for German, and so on. To obtain REX functionality in your languages of interest, unzip each of the corresponding language zip files into the SDK's root directory.
The documentation package:
rex-je-<VERSION>-doc.zip When you unzip the documentation package, the root directory contains a
doc subdirectory with the REX Application Developer’s Guide (this document,
rex-je-appdev-guide.pdf), the Release Notes (
rex-je-<VERSION>-release-notes.html) and the Java API documentation (see
The Rosette License:
rlp-license.xml. Before you use REX, copy this file to the licenses subdirectory.
As of version rex-je-7.41.0.c60.0, entity linking (the linker processor) is provided within the standard REX SDK package, rex-je-<VERSION>.zip. No other files are required. See Entity Linking.
You may also receive packages for customization tools, which are used to train the statistical processor.
Third-party software:
Java SDK 1.8: Required to use REX.
Ant 1.8.2 or later (optional): Required to build and run the samples we provide. See Installing Ant for installation help.
A Quick Look at REX: Running the Sample Program
After you unzip the SDK and documentation packages and copy rlp-license.xml to the licenses subdirectory, try running the sample application.
In a Bash shell (Unix) or Command Prompt (Windows), navigate to rex-je-<VERSION>/samples.
Use the Ant build script to compile and run the sample.
EntityAnnotatorSample instantiates an Annotator to process a UTF-8 input file and report the entities it finds in an output file.
The sample reads an input file in rex-je-<VERSION>/samples/data:
General George Washington (February 22, 1732 – December 14, 1799) was the
dominant military and political leader of the new United States of America
from 1775 to 1799. He led the American victory over Britain in the American
Revolutionary War as commander in chief of the Continental Army in 1775–1783,
and he presided over the writing of the Constitution in 1787.
The sample writes output to rex-je-<VERSION>/samples/EntityAnnotatorSampleOut-eng.txt. For each entity it finds, the output includes the entity type, the offsets of the entity in the input document (a half-open range: the start offset is inclusive and the end offset is exclusive), the normalized entity (words separated by a single space), and the source (statistical, regex, gazetteer, or joiner).
TITLE, [0, 7), General (statistical)
PERSON, [8, 25), George Washington (statistical)
LOCATION, [124, 148), United States of America (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
NATIONALITY, [179, 187), American (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
LOCATION, [201, 208), Britain (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
NATIONALITY, [216, 224), American (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
TITLE, [246, 264), commander in chief (statistical)
ORGANIZATION, [272, 288), Continental Army (statistical)
IDENTIFIER:URL, [374, 420), http://en.wikipedia.org/wiki/George_washington (regex:xxx_9)
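If you want to post-process the sample's output file, the line format shown above can be split into its fields with a regular expression. The following is a minimal sketch, not part of the REX SDK: the class name EntityLineParser, the nested EntityLine type, and the pattern are all illustrative assumptions based only on the sample lines listed here. It also assumes that the offsets index Java String characters, which holds for the opening words of the sample text used below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityLineParser {
    // Hypothetical pattern for lines such as:
    //   PERSON, [8, 25), George Washington (statistical)
    private static final Pattern LINE = Pattern.compile(
            "^([^,]+), \\[(\\d+), (\\d+)\\), (.*) \\(([^()]*)\\)$");

    /** One parsed output line (illustrative type, not part of the REX API). */
    static final class EntityLine {
        final String type;
        final int start;        // inclusive start offset
        final int end;          // exclusive end offset
        final String normalized;
        final String source;

        EntityLine(String type, int start, int end, String normalized, String source) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.normalized = normalized;
            this.source = source;
        }
    }

    /** Splits one output line into its fields, or throws if the line does not match. */
    static EntityLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return new EntityLine(
                m.group(1),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4),
                m.group(5));
    }

    public static void main(String[] args) {
        // Opening words of the sample input document.
        String text = "General George Washington";
        EntityLine e = parse("PERSON, [8, 25), George Washington (statistical)");
        // The half-open range [start, end) maps directly onto String.substring.
        System.out.println(e.type + " -> " + text.substring(e.start, e.end));
    }
}
```

Because the offsets are half-open, they can be passed to String.substring unchanged to recover the original span from the input text.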
To process a sample document in a different language, pass its ISO 639-3 language code when you call Ant. For example, to process a German document:
ant -Dlang=deu run.EntityAnnotatorSample
The source for this sample is in
Ant provides arguments for the root directory, a language code (eng), an input file, and an output file. The root directory provides the path to the Rosette license and to the data tree. The data tree includes the statistical model, default gazetteers, default regex files, and a redaction configuration file.
The sample processes an input file and reports information about each entity that it extracts from the input.
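Because the root directory determines where the license and the data tree are found, it can be useful to verify that layout before running anything. The sketch below is an assumption-laden illustration, not code from the sample: the class name RexInstallCheck and its helper methods are hypothetical, and the only facts taken from this guide are the licenses/rlp-license.xml location and the data subdirectory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RexInstallCheck {
    /** Expected license location under the SDK root (per the directory list in this guide). */
    static Path licenseFile(Path root) {
        return root.resolve("licenses").resolve("rlp-license.xml");
    }

    /** Expected data tree location under the SDK root. */
    static Path dataDir(Path root) {
        return root.resolve("data");
    }

    public static void main(String[] args) {
        // Pass the SDK root (for example, the unzipped rex-je-<VERSION> directory);
        // defaults to the current directory when no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("License present: " + Files.isRegularFile(licenseFile(root)));
        System.out.println("Data tree present: " + Files.isDirectory(dataDir(root)));
    }
}
```

Running this against the SDK root directory reports whether the two locations the sample depends on are in place.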