Minimum System Requirements
Important
The amount of disk space required depends on your use case and the languages installed.
-
x86_64 CPU with 4 or more physical cores
-
Minimum 16 GB RAM
-
Disk Space
-
64-bit JDK 11 or 17 (tested with OpenJDK)
-
Ant 1.8.2 or later (optional - required to run included samples)
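The JDK requirement above can be checked before installing. The following is a minimal sketch, not part of REX: the function names are hypothetical, and the sample version string is only an illustration of the format OpenJDK typically prints for `java -version`.

```python
import re

# Major versions named in the requirements list (JDK 11 or 17).
SUPPORTED_MAJORS = {11, 17}


def java_major_version(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (for example 1.8.0_371).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def is_supported_jdk(version_output):
    """True when the reported major version is one the requirements list names."""
    return java_major_version(version_output) in SUPPORTED_MAJORS


# Illustrative version string, not captured from a real installation.
sample = 'openjdk version "17.0.7" 2023-04-18'
print(java_major_version(sample), is_supported_jdk(sample))
```

Note that `java -version` writes to standard error, so a real check would read that stream, for example with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.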
Your installation of REX will include the following files:
-
The SDK package: rex-je-<VERSION>.zip, where <VERSION> is the version of REX you are installing, e.g., rex-je-7.28.1.c59.0.zip. When you unzip the SDK package, the root directory is rex-je-<VERSION>. It contains the following subdirectories:
- data
-
Contains the statistical models, gazetteers, regexes, and configuration files for the redactor and joiner. You can add regex files and text gazetteers to the data tree and edit the configuration files.
- lib
-
Contains the .jar files that the SDK uses. Add these .jar files to your classpath.
- licenses
-
Default location for your Rosette license. The default configuration and samples require rlp-license.xml to be in this directory.
- samples
-
Contains sample data and applications to examine, modify, and run. samples/lib contains the third-party jar files used by the samples.
- language files
-
A language file for each language. The files are named rex-je-<VERSION>-<LANGUAGE>.zip, where <LANGUAGE> is the three-letter ISO 639-3 code indicating the language of the file contents. For example, rex-je-7.41.0.c60.0-eng.zip is the file for the English language. The files are unzipped into the SDK's root directory, rex-je-<VERSION>.
-
An installer: rex-je-<VERSION>-installer.zip, which unpacks the SDK package and language packs. Unzip the installer package and run install-rex.sh to begin. The installer will prompt you for:
-
The location of the other packages if they're not detected in the current directory.
-
The language packs you want to install. The installer defaults to all detected packs.
-
The installation directory (defaults to current).
It will then unzip all necessary files to their correct locations.
Note
The installer does not copy in the license file.
-
The Rosette License: rlp-license.xml. Copy this file to the licenses subdirectory.
-
The documentation package: rex-je-<VERSION>-doc.zip. When unzipped, the root directory contains a doc subdirectory with the following components:
-
REX Application Developer’s Guide (this document, rex-je-appdev-guide.pdf)
-
Release Notes (rex-je-<VERSION>-release-notes.pdf)
-
Java API documentation (apidocs/index.html).
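The language file naming convention described in the list above lends itself to a small helper that picks the language packs out of a download directory. This is a sketch under assumptions: the function name is hypothetical, and the pattern is inferred only from the file names shown in this guide. Because the documentation package name also ends in a hyphen and three letters, it is excluded explicitly.

```python
import re

# rex-je-<VERSION>-<LANGUAGE>.zip, with a three-letter ISO 639-3 code.
LANGUAGE_PACK = re.compile(r"^rex-je-(?P<version>.+)-(?P<lang>[a-z]{3})\.zip$")

# Three-letter suffixes that name other packages, not languages.
NOT_LANGUAGES = {"doc"}


def parse_language_pack(filename):
    """Return (version, language code) for a language pack name, else None."""
    match = LANGUAGE_PACK.match(filename)
    if match is None or match["lang"] in NOT_LANGUAGES:
        return None
    return match["version"], match["lang"]


names = [
    "rex-je-7.41.0.c60.0-eng.zip",   # language pack
    "rex-je-7.41.0.c60.0-doc.zip",   # documentation package
    "rex-je-7.28.1.c59.0.zip",       # SDK package
]
for name in names:
    print(name, "->", parse_language_pack(name))
```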
Entity linking (the linker processor) is provided within the standard REX SDK package. No other files are required for linking.
You may also receive additional packages, such as the field training kit, which is used to train the statistical processor, or a package for adding a custom knowledge base.
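Because the installer does not copy the license file, it can be worth confirming the layout after installation. The sketch below checks only the subdirectories and license location named in this section. The function name is hypothetical, and the demonstration runs against a throwaway directory rather than a real installation.

```python
import tempfile
from pathlib import Path

# Subdirectories this guide lists under rex-je-<VERSION>.
EXPECTED_DIRS = ("data", "lib", "licenses", "samples")
# Default license location required by the configuration and samples.
LICENSE_FILE = Path("licenses") / "rlp-license.xml"


def missing_items(root):
    """Return the expected directories and license file that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    if not (root / LICENSE_FILE).is_file():
        missing.append(LICENSE_FILE.as_posix())
    return missing


# Demonstration on a temporary tree that lacks the licenses directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "rex-je-VERSION"
    for name in ("data", "lib", "samples"):
        (demo_root / name).mkdir(parents=True)
    print(missing_items(demo_root))
```

An empty list means every item named here was found. It does not verify that the license itself is valid.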
A Quick Look at REX: Running the Sample Program
After you install REX and the license file, try running the sample application. The sample processes an input file and reports information about each entity that it extracts from the input.
The Ant build script passes the sample four arguments: the root directory, a language code (eng), an input file, and an output file. The root directory provides the path to the Rosette license and to the data tree. The data tree includes the statistical model, default gazetteers, default regex files, and a redaction configuration file.
-
In a Bash shell (Unix) or Command Prompt (Windows), navigate to rex-je-<VERSION>/samples.
-
Use the Ant build script to compile and run EntityAnnotatorSample:
ant run.EntityAnnotatorSample
EntityAnnotatorSample instantiates an Annotator to process a UTF-8 input file and report the entities it finds in an output file.
-
The sample reads an input file in rex-je-<VERSION>/samples/data:
General George Washington (February 22, 1732 – December 14, 1799) was the
dominant military and political leader of the new United States of America
from 1775 to 1799. He led the American victory over Britain in the American
Revolutionary War as commander in chief of the Continental Army in 1775–1783,
and he presided over the writing of the Constitution in 1787.
Source: http://en.wikipedia.org/wiki/George_washington
-
The sample writes output to rex-je-<VERSION>/samples/EntityAnnotatorSampleOut-eng.txt.
For each entity it finds, the output includes the entity type, the offsets that locate the entity in the input document, the normalized entity (with a single space between words), and the source (statistical, regex, gazetteer, or joiner).
TITLE, [0, 7), General (statistical)
PERSON, [8, 25), George Washington (statistical)
LOCATION, [124, 148), United States of America (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
NATIONALITY, [179, 187), American (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
LOCATION, [201, 208), Britain (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
NATIONALITY, [216, 224), American (gazetteer:/pathto/data/gazetteer/eng/accept/gaz-LE.bin)
TITLE, [246, 264), commander in chief (statistical)
ORGANIZATION, [272, 288), Continental Army (statistical)
IDENTIFIER:URL, [374, 420), http://en.wikipedia.org/wiki/George_washington (regex:xxx_9)
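The output lines above follow a regular shape, so they can be read back into structured records for further processing. The following sketch is not part of the sample application. The record type, pattern, and function name are hypothetical and are inferred only from the lines shown here. The offsets are treated as half-open ranges, which matches the entity lengths in this output.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class EntityLine(NamedTuple):
    entity_type: str
    start: int   # first offset, read as inclusive
    end: int     # second offset, read as exclusive
    text: str    # normalized entity text
    source: str  # e.g. "statistical" or "gazetteer:<path>"


# TYPE, [start, end), text (source)
LINE_PATTERN = re.compile(
    r"^(?P<type>[^,]+), \[(?P<start>\d+), (?P<end>\d+)\), "
    r"(?P<text>.*) \((?P<source>[^()]*)\)$"
)


def parse_line(line: str) -> Optional[EntityLine]:
    """Parse one output line, or return None if it does not match the shape."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return EntityLine(match["type"], int(match["start"]), int(match["end"]),
                      match["text"], match["source"])


SAMPLE_LINES = [
    "TITLE, [0, 7), General (statistical)",
    "PERSON, [8, 25), George Washington (statistical)",
    "IDENTIFIER:URL, [374, 420), http://en.wikipedia.org/wiki/George_washington (regex:xxx_9)",
]

entities = [parse_line(line) for line in SAMPLE_LINES]
# Count entities by the kind of processor that produced them.
by_source = Counter(e.source.split(":", 1)[0] for e in entities if e)
print(by_source)
```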
To process a sample document in a different language, include the ISO 639-3 Language Code when you call Ant. For example, to process a German document:
ant -Dlang=deu run.EntityAnnotatorSample
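To run the sample for several languages in turn, the Ant invocation can be built programmatically. This is a minimal sketch with a hypothetical helper name. It assumes only the `-Dlang` property and the `run.EntityAnnotatorSample` target shown above, and it assumes that passing `-Dlang=eng` explicitly behaves the same as omitting the property.

```python
import re

SAMPLE_TARGET = "run.EntityAnnotatorSample"


def ant_command(lang=None):
    """Build the Ant argument list for the sample; lang is an ISO 639-3 code."""
    if lang is None:
        # No property given: the build script's own default language applies.
        return ["ant", SAMPLE_TARGET]
    if not re.fullmatch(r"[a-z]{3}", lang):
        raise ValueError("expected a three-letter ISO 639-3 code, got %r" % lang)
    return ["ant", "-Dlang=" + lang, SAMPLE_TARGET]


for code in ("eng", "deu"):
    print(" ".join(ant_command(code)))
```

Each list can be passed to `subprocess.run(command, cwd=samples_dir, check=True)`, where `samples_dir` is the path to rex-je-<VERSION>/samples.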
The source for this sample is in rex-je-<VERSION>/samples/src/EntityAnnotatorSample.java.
Removing Extra Language Files
The REX release package includes your licensed language models. If you won't be using all of your licensed languages, you can repackage REX so that it includes only the language models you need.
Files in the following directories are common to all languages and required:
bin/*
lib/*
data/etc/*
data/etc/regex/xxx/* (Language-neutral regex)
data/flinx/data/kb/basis/ (Default linker knowledge base, files that aren't listed below are common)
licenses/*
rbl-je/*
Each language requires files in the following directories:
data/gazetteer/{lang}/accept/gaz-LE.bin
data/regex/{lang}/*
data/statistical/{lang}/*
data/flinx/data/lang-vectors (Linker vectors for specific languages go here)
data/flinx/data/kb/basis/{lang} (Default linker knowledge base models for specific languages)
data/flinx/data/kb/basis/etc (Default linker knowledge base vectors for specific languages go here)
data/flinx/data/kb/basis/{xxxx}-aliases.bin (Default linker knowledge base data for specific scripts)
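As an illustration of how the per-language entries expand, the sketch below substitutes language codes into the templates from the list above that contain a `{lang}` placeholder. The function name is hypothetical. The entries that are not keyed by `{lang}` (lang-vectors, etc, and the `{xxxx}-aliases.bin` files) are left out, because this section does not say how they map to individual languages.

```python
# Templates copied from the per-language list; only those with {lang}.
PER_LANGUAGE_TEMPLATES = (
    "data/gazetteer/{lang}/accept/gaz-LE.bin",
    "data/regex/{lang}/*",
    "data/statistical/{lang}/*",
    "data/flinx/data/kb/basis/{lang}",
)


def language_paths(languages):
    """Expand every {lang} template for each requested language code."""
    return [
        template.format(lang=lang)
        for lang in languages
        for template in PER_LANGUAGE_TEMPLATES
    ]


for path in language_paths(["eng", "deu"]):
    print(path)
```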
To create a minimal package for the relevant languages, run the Python script bin/repack-rex.py with Python 2.7 or later.
% python repack-rex.py
Copyright (c) 2017 Basis Technology Corporation All Rights Reserved.
Support@basistech.com
http://www.basistech.com
This script repacks REX for languages specified in the script arguments.
usage: rex-distro-zip output-zip rlp-license-file lang [lang ...]
To create English-only REX distribution package zip:
python repack-rex.py downloaded-rex-distro.zip eng-only.zip rlp-license.xml eng
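When repackaging is part of a build, the command line can be assembled from the usage line printed by the script. The following sketch, written for Python 3, uses a hypothetical helper name and only builds the argument list in the order shown in that usage line. It does not run the script.

```python
def repack_command(distro_zip, output_zip, license_file, languages,
                   interpreter="python", script="repack-rex.py"):
    """Build the argument list: rex-distro-zip output-zip rlp-license-file lang [lang ...]."""
    languages = list(languages)
    if not languages:
        raise ValueError("at least one language code is required")
    return [interpreter, script, distro_zip, output_zip, license_file, *languages]


command = repack_command(
    "downloaded-rex-distro.zip", "eng-only.zip", "rlp-license.xml", ["eng"]
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains repack-rex.py.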