Minimum System Requirements
RBL is a Java SDK and works on any system with Java, including OpenJDK, installed.
The minimum Java heap required to run RBL smoothly is 1.5 GB. An -Xmx setting of 8 GB has been observed to give optimal performance in heavily multithreaded environments. This memory setting accounts for the simultaneous use of multiple languages.
We also recommend that you set -Xms equal to your -Xmx setting. This prevents the JVM from having to grow the heap, which is time-consuming.
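As a quick sanity check of the heap settings described above, the following minimal sketch reports the maximum heap the running JVM will use and compares it to the 1.5 GB minimum. The class name HeapCheck and its method are illustrative only and are not part of the RBL API.

```java
class HeapCheck {
    // 1.5 GB minimum heap, taken from the requirement stated above.
    static final long MIN_HEAP_BYTES = 1536L * 1024 * 1024;

    /** Returns true when the given maximum heap size meets the documented minimum. */
    static boolean meetsMinimum(long maxHeapBytes) {
        return maxHeapBytes >= MIN_HEAP_BYTES;
    }

    public static void main(String[] args) {
        // maxMemory() reflects the effective -Xmx value for this JVM.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", maxHeap / (1024 * 1024));
        if (!meetsMinimum(maxHeap)) {
            System.err.println("Warning: max heap is below the 1.5 GB minimum; raise -Xmx.");
        }
    }
}
```

You could run it with the recommended settings, for example `java -Xms8g -Xmx8g HeapCheck`. Note that with some garbage collectors the value reported by maxMemory() can be slightly lower than the -Xmx value, so treat a result just under the threshold as a prompt to check the flags rather than a definite failure.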
System Requirements for TensorFlow
RBL uses TensorFlow, a native library, when using a neural network. Ubuntu 14.04+, Windows 7+, and macOS 10.11+ are supported, but you should be able to run TensorFlow successfully on other modern Linux flavors as well. TensorFlow for Java does not yet support the Apple M1 or M2 chipsets.
The TensorFlow library for Linux requires a system with:
- libc.so.6 (GLIBC) 2.17 or newer
- libstdc++.so.6 (GLIBCXX) 3.4.19 or newer
- libgcc_s.so.1 (GCC) 3.0 or newer
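When checking a Linux system against the minimums listed above, compare versions numerically, component by component: a plain string comparison would wrongly rank 2.9 above 2.17. The following is a minimal sketch of such a comparison. The class VersionCheck is hypothetical, and the version string is assumed to be obtained separately (for example, from the output of `ldd --version`).

```java
class VersionCheck {
    /** Compares dotted numeric versions such as "2.17" and "2.9", one component at a time. */
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "3.4" equals "3.4.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns true when the version found is the required version or newer. */
    static boolean atLeast(String found, String required) {
        return compare(found, required) >= 0;
    }

    public static void main(String[] args) {
        // Pass the GLIBC version found on your system as the first argument.
        String glibc = args.length > 0 ? args[0] : "2.17";
        String verdict = atLeast(glibc, "2.17") ? "meets" : "does not meet";
        System.out.println("GLIBC " + glibc + " " + verdict + " the 2.17 minimum");
    }
}
```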
The version of TensorFlow included with RBL is configured to work on a broad range of systems, which requires it to avoid some features not available everywhere. Compiling TensorFlow for your specific system and using it on the classpath instead of the default libtensorflow_jni-<tensorflowversion>.jar will likely improve performance. The version of TensorFlow included supports CUDA on Windows and Linux. Our neural models may perform better after installing CUDA on machines with supported GPUs.
Your installation of RBL will include the following files:
- The SDK package: rbl-je-<version>.zip, where <version> is the version of RBL you are installing, e.g. rbl-je-7.36.0.c62.2.zip. When you unzip the SDK package, the root directory is rbl-je-<version>. It contains text files with license and copyright information, along with the following subdirectories:
  - dicts: RBL binary dictionaries.
  - lib: The JAR files that the SDK uses. The core SDK JAR files are btrbl-je-<version>.jar and btcommon-api-<btcommonversion>.jar. The SDK also uses the Simple Logging Facade for Java (SLF4J), slf4j-api-<slf4jversion>.jar. You should add these JAR files to your classpath. If you are using Lucene or Solr, you should also add the appropriate JAR file to your classpath. The versions are dependent on the version of Lucene/Solr you are using.
  - licenses: Default location for placing your license file, rlp-license.xml. The samples that accompany this SDK require rlp-license.xml to be in this directory.
  - models: RBL binary models.
  - samples: RBL sample files. Sample text and query files in all supported languages are in samples/data. These files are used with the code samples. See Running the Core Samples and JapaneseAnalyzerSample.
  - tools: Tools for generating user dictionaries and the RBLCmd command-line utility.
- The documentation: rbl-je-<version>-doc.zip. When you unzip the documentation package to the same location where you have unzipped the SDK package, the root directory contains a doc subdirectory containing:
  - Rosette Base Linguistics Application Developer's Guide (this document, rbl-je-<version>-appdev-guide.pdf)
  - Release Notes (rbl-je-<version>-release-notes.pdf)
  - Java API documentation (apidocs/index.html)
- The license file: rlp-license.xml. The samples expect to find it in rbl-je-<version>/licenses.
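The layout described above can be verified after unzipping with a short check such as the following sketch. It only tests for the subdirectories and license file named in this section. The class InstallLayoutCheck is illustrative and not part of the SDK, and the root path is whatever directory you unzipped rbl-je-<version> into.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InstallLayoutCheck {
    // Subdirectories of the SDK root, as listed in this section.
    static final List<String> EXPECTED_DIRS =
            Arrays.asList("dicts", "lib", "licenses", "models", "samples", "tools");

    /** Returns the expected entries that are absent under the given SDK root. */
    static List<String> findMissing(Path root) {
        List<String> missing = new ArrayList<>();
        for (String dir : EXPECTED_DIRS) {
            if (!Files.isDirectory(root.resolve(dir))) {
                missing.add(dir + "/");
            }
        }
        // The samples expect the license file in the licenses directory.
        if (!Files.isRegularFile(root.resolve("licenses").resolve("rlp-license.xml"))) {
            missing.add("licenses/rlp-license.xml");
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the unzipped SDK root directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = findMissing(root);
        if (missing.isEmpty()) {
            System.out.println("All expected entries found under " + root);
        } else {
            System.out.println("Missing under " + root + ": " + missing);
        }
    }
}
```

If you have deliberately removed directories such as tools to reduce the size of your application, the corresponding entries will be reported as missing, and that is expected.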
RBL contains version-specific files for the Lucene and Solr integrations. These files support multiple versions of Lucene/Solr, as indicated below.
RBL supports Lucene versions 7.0 - 9.7 and Solr versions 7.0 - 9.3 with the following files, where <version> is the RBL version:
RBL uses the Simple Logging Facade for Java (SLF4J) to log activities. The SLF4J API JAR is in lib/.
SLF4J is a facade for various logging APIs. Using SLF4J, the developer or an administrator can determine which one of many popular logging systems to use at runtime. In the tools/lib directory, we include the SLF4J binding JARs:
These JARs are used by our samples and RBLCmd. When you place these JARs on your classpath, the logging facade is bound to the implementation, and RBL logging is turned on. As defined in the file etc/log4j2.properties, by default, WARN messages are output to the console (System.err). As the Javadoc for org.slf4j.impl.SimpleLogger explains, you can use system properties or this properties file to output to a file and control other logging parameters.
If you want to use SLF4J with a different implementation, put the appropriate binding JAR files and properties file on your classpath.
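If the binding on your classpath is SLF4J's SimpleLogger, its documented system properties can be set programmatically before any logger is created. The following sketch is one way to do that. The property names come from the SimpleLogger Javadoc, while the class SimpleLoggerConfig, the log level, and the file name are illustrative assumptions. With a different binding (for example, one configured through etc/log4j2.properties), these properties have no effect.

```java
class SimpleLoggerConfig {
    /**
     * Sets SimpleLogger system properties. This only takes effect when the
     * SLF4J SimpleLogger binding is in use, and it must run before the
     * first logger is created.
     */
    static void configure(String level, String logFile) {
        // Accepted levels include trace, debug, info, warn, and error.
        System.setProperty("org.slf4j.simpleLogger.defaultLogLevel", level);
        // A file path, or the special values System.out / System.err.
        System.setProperty("org.slf4j.simpleLogger.logFile", logFile);
    }

    public static void main(String[] args) {
        // Hypothetical values: log DEBUG and above to rbl.log.
        configure("debug", "rbl.log");
        // Initialize RBL only after this point so the settings are picked up.
        System.out.println("Log level: "
                + System.getProperty("org.slf4j.simpleLogger.defaultLogLevel"));
    }
}
```

The same properties can be passed on the command line instead, for example `-Dorg.slf4j.simpleLogger.defaultLogLevel=debug`.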
Removing Unnecessary Files
Depending on the scope of your application, you may wish to remove unnecessary files to reduce the size of your application.
The tools directory contains files for:
You can delete some or all of the directories, as needed. For example, you may decide to delete the user dictionary directories, but keep the RBLCmd utility. If you don't need to build these dictionaries or use the RBL command line utility, you may freely delete the entire tools directory.
Language-Specific Model and Dictionary Files
You can remove files that represent languages your application does not need to support. Some languages require files for other language codes, either because they are canonicalized to the other language, or because there is some internal RBL requirement. Some languages require the files of more than one language.
When removing language files, be sure to check the deletion rules for the language and keep the files for all required language codes.
Table 1. Dependent Languages
| Language | Language Code | Required Language Code(s) |
|---|---|---|
| Chinese (Simplified) | zhs | zho |
| Chinese (Traditional) | zht | zho |
| Korean | kor | kor, eng |
| Norwegian | nor | nob |
| Norwegian Bokmål | nob | nob |
| Persian, Afghan | prs | fas |
| Persian, Western | pes | fas |
| Russian | rus | rus, eng |
| Tagalog | tgl | tgl, eng |
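The dependency rules in Table 1 can be applied mechanically when deciding which language files to keep. The following sketch encodes the table as a map and computes the set of language codes whose files must remain for a given list of supported languages. The class LanguageDependencies is hypothetical. The entries for zht, nob, and pes assume that those rows share the required code of the row above them (a merged cell in the table), and any language not in the table is assumed to require only its own files.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LanguageDependencies {
    // Language code -> required language codes, from Table 1.
    static final Map<String, List<String>> REQUIRED = new HashMap<>();
    static {
        REQUIRED.put("zhs", Arrays.asList("zho"));
        REQUIRED.put("zht", Arrays.asList("zho"));   // assumed: shares the zhs row's value
        REQUIRED.put("kor", Arrays.asList("kor", "eng"));
        REQUIRED.put("nor", Arrays.asList("nob"));
        REQUIRED.put("nob", Arrays.asList("nob"));   // assumed: shares the nor row's value
        REQUIRED.put("prs", Arrays.asList("fas"));
        REQUIRED.put("pes", Arrays.asList("fas"));   // assumed: shares the prs row's value
        REQUIRED.put("rus", Arrays.asList("rus", "eng"));
        REQUIRED.put("tgl", Arrays.asList("tgl", "eng"));
    }

    /** Returns the sorted set of language codes whose files must be kept. */
    static Set<String> requiredCodes(Collection<String> supported) {
        Set<String> keep = new TreeSet<>();
        for (String code : supported) {
            // Languages absent from the table are assumed to need only their own files.
            keep.addAll(REQUIRED.getOrDefault(code, Collections.singletonList(code)));
        }
        return keep;
    }

    public static void main(String[] args) {
        // Example: an application supporting Simplified Chinese, Korean, and German.
        System.out.println(requiredCodes(Arrays.asList("zhs", "kor", "deu")));
        // prints [deu, eng, kor, zho]
    }
}
```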
Additionally, if your distribution platform is of a particular endianness, you can remove the models of the opposite endianness. When applicable, the endianness of a file is given at the end of the file name; for example, the file root/dicts/ara/dictLemmas-LE.bin is a little-endian binary storing the Arabic lemma dictionary, whereas root/dicts/ara/dictLemmas-BE.bin is the same but stored in big-endian format.
- root/dicts: Any directory named after a language code (or “csc” for the Chinese Script Converter) and all its contents are used only for that language, and any file with “BE” or “LE” in its name is only used on big- or little-endian systems, respectively.
- root/models: Any directory named after a language code and all its contents are used only for that language.
- root/contractions: Any .yaml file whose name ends with a language code is used only for that language.
- root/upt-16: Any .yaml file whose name ends with a language code is used only for that language.
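The endianness rule above can be sketched as a simple file-name test. The example assumes the marker appears as "-LE." or "-BE." immediately before the extension, as in dictLemmas-LE.bin. That pattern is inferred from the example file names in this section rather than from a documented naming rule, and the class EndianFilter is hypothetical.

```java
import java.nio.ByteOrder;

class EndianFilter {
    /**
     * Returns true when the file name is marked for the byte order opposite
     * to the given one, which makes it a candidate for removal.
     */
    static boolean isOppositeEndian(String fileName, ByteOrder order) {
        // Assumed naming pattern: "-LE." or "-BE." before the file extension.
        String oppositeMarker = order == ByteOrder.LITTLE_ENDIAN ? "-BE." : "-LE.";
        return fileName.contains(oppositeMarker);
    }

    public static void main(String[] args) {
        // Byte order of the machine running this check.
        ByteOrder nativeOrder = ByteOrder.nativeOrder();
        String[] names = {"dictLemmas-LE.bin", "dictLemmas-BE.bin"};
        for (String name : names) {
            String action = isOppositeEndian(name, nativeOrder) ? "removable" : "keep";
            System.out.println(name + ": " + action);
        }
    }
}
```

ByteOrder.nativeOrder() describes the machine running the check. If you build on one platform and deploy on another, pass the byte order of the deployment platform instead.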
The files required for Japanese and Chinese depend on the value of the tokenizerType option, as shown in the table below.
The JAR jakarta.annotation-api-<version>.jar is included in RBL but cannot be shaded. There may be conflicts if this JAR is used elsewhere in your processing. The JAR is used in disambiguation of Arabic, English, Greek, Hebrew, Japanese, Korean, and Spanish. If you are not using this functionality, the JAR can be removed.