To use RBL-Elasticsearch, you need Elasticsearch as well as the plugin that corresponds to your Elasticsearch version. Elastic requires that plugins be built specifically for each version they release. This plugin integrates Rosette Base Linguistics Java Edition 7.37.0.c62.2 into Elasticsearch 7.6.1.
- If you do not already have it, install Elasticsearch.
Download and unzip the appropriate archive.
- Install the RBL plugin.
Navigate to the elasticsearch-<version> root directory and run:
bin/elasticsearch-plugin install file:/path/to/rbl-je-elasticsearch-<version>.zip
At the end of the installation process, Elasticsearch will display the following warning:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission loadLibrary.*
* java.lang.RuntimePermission setContextClassLoader
* java.lang.reflect.ReflectPermission suppressAccessChecks
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]
Enter y to complete the installation. The plugin uses the accessDeclaredMembers permission to parse its configuration files and the other permissions to run TensorFlow.
The plugin is now in plugins/analysis-rbl-je.
- Copy the RLP License (rlp-license.xml) to plugins/analysis-rbl-je/rbl-je-<version>/licenses.
This license determines the languages in which you are authorized to index and query documents with RBL-ES.
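As a quick check that the license landed in the location described above, the following minimal Python sketch looks for rlp-license.xml under the plugin directory. It is not part of RBL-ES. The function names find_license_dirs and license_installed are hypothetical, and the sketch assumes the installed plugin contains exactly the rbl-je-<version>/licenses layout named in this step.

```python
from pathlib import Path


def find_license_dirs(es_home):
    """Return the licenses directories under plugins/analysis-rbl-je/rbl-je-*/."""
    plugin_dir = Path(es_home) / "plugins" / "analysis-rbl-je"
    # The version suffix is matched with a wildcard because it varies by release.
    return sorted(p for p in plugin_dir.glob("rbl-je-*/licenses") if p.is_dir())


def license_installed(es_home, license_name="rlp-license.xml"):
    """True if the license file exists in a licenses directory of the plugin."""
    return any((d / license_name).is_file() for d in find_license_dirs(es_home))


if __name__ == "__main__":
    # Assumption: the script is run from the Elasticsearch root directory.
    print("license found" if license_installed(".") else "license missing")
```

The check only confirms that the file is present; it does not validate the license contents or the languages it covers.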
To start the Elasticsearch search server, run
The tools directory contains files for:
You can delete some or all of the directories, as needed. For example, you may decide to delete the user dictionary directories, but keep the RBLCmd utility. If you don't need to build these dictionaries or use the RBL command line utility, you may freely delete the entire tools directory.
RBLCmd is a general-purpose command line utility for RBL. It provides a simple way to produce RBL output without writing code. It is also useful for ad hoc speed and thread testing.
A Bash shell script (RBLCmd) and a Windows script (RBLCmd.bat) for running this utility are in rbl-je-<version>/tools/bin. For more information, see RBLCmd's online help: RBLCmd -h.
The command:
echo 'Hola' | ./tools/bin/RBLCmd -outputJson --language spa --rootDirectory . | jq
produces the following output:
{
  "version": "1.1.0",
  "data": "Hola\n",
  "attributes": {
    "sentence": {
      "type": "list",
      "itemType": "sentence",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 5
        }
      ]
    },
    "scriptRegion": {
      "type": "list",
      "itemType": "scriptRegion",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 5,
          "script": "Latn"
        }
      ]
    },
    "layoutRegion": {
      "type": "list",
      "itemType": "layoutRegion",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 5,
          "layout": "STRUCTURED"
        }
      ]
    },
    "token": {
      "type": "list",
      "itemType": "token",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 4,
          "text": "Hola",
          "analyses": [
            {
              "partOfSpeech": "INTERJ",
              "lemma": "hola",
              "raw": "hola[+INTERJ]",
              "tagSet": "BT_SPANISH"
            }
          ]
        }
      ]
    }
  },
  "documentMetadata": {}
}
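The JSON above can also be consumed programmatically. The following minimal Python sketch, which is not part of RBL, lists each token with the lemma and part of speech from its first analysis. The function name summarize_tokens is hypothetical, SAMPLE is an abbreviated copy of the output above, and the sketch assumes that analyses carry the lemma and partOfSpeech keys seen in this example.

```python
import json

# Abbreviated copy of the RBLCmd output shown above (token attribute only).
SAMPLE = r'''
{
  "version": "1.1.0",
  "data": "Hola\n",
  "attributes": {
    "token": {
      "type": "list",
      "itemType": "token",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 4,
          "text": "Hola",
          "analyses": [
            {
              "partOfSpeech": "INTERJ",
              "lemma": "hola",
              "raw": "hola[+INTERJ]",
              "tagSet": "BT_SPANISH"
            }
          ]
        }
      ]
    }
  },
  "documentMetadata": {}
}
'''


def summarize_tokens(rbl_json):
    """Return (text, lemma, part_of_speech) per token, using its first analysis."""
    doc = json.loads(rbl_json)
    data = doc["data"]
    rows = []
    for tok in doc["attributes"]["token"]["items"]:
        # In this ASCII sample, slicing "data" by the offsets yields the token text.
        # Assumption: offsets for non-ASCII input may need different handling.
        surface = data[tok["startOffset"]:tok["endOffset"]]
        analyses = tok.get("analyses") or [{}]
        first = analyses[0]
        rows.append((surface, first.get("lemma"), first.get("partOfSpeech")))
    return rows


if __name__ == "__main__":
    for text, lemma, pos in summarize_tokens(SAMPLE):
        print(text, lemma, pos)
```

To process live output instead of the embedded sample, the same function could be given the text read from standard input when RBLCmd output is piped into the script.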
System Requirements for TensorFlow
RBL uses TensorFlow, a native library, to disambiguate Hebrew when the disambiguator is a neural network. Ubuntu 14.04+, Windows 7+, and macOS 10.11+ are fully supported, but you should be able to run the disambiguator successfully on other modern Linux distributions as well.
The TensorFlow library for Linux requires a system with:
libc.so.6 (GLIBC) 2.17 or newer
libstdc++.so.6 (GLIBCXX) 3.4.19 or newer
libgcc_s.so.1 (GCC) 3.0 or newer
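The GLIBC requirement can be checked before enabling the neural network disambiguator. The following is a minimal Python sketch under stated assumptions: it uses the standard library call platform.libc_ver() to detect the C library, the helper names parse_version and glibc_ok are hypothetical, and it covers only the GLIBC minimum, not the GLIBCXX or GCC requirements.

```python
import platform

MIN_GLIBC = (2, 17)  # GLIBC minimum from the requirements listed above


def parse_version(text):
    """Turn '2.31' into (2, 31); non-numeric parts are ignored."""
    return tuple(int(p) for p in text.split(".") if p.isdigit())


def glibc_ok(version_text, minimum=MIN_GLIBC):
    """True if version_text names a glibc at least as new as minimum."""
    parsed = parse_version(version_text)
    return bool(parsed) and parsed >= minimum


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib != "glibc":
        # Non-glibc systems (for example macOS or Windows) are outside this check.
        print("glibc not detected; this check does not apply")
    else:
        print(f"glibc {version}:", "OK" if glibc_ok(version) else "too old")
```

Versions are compared as integer tuples rather than strings so that, for example, 2.4 is correctly treated as older than 2.17.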
The version of TensorFlow included with RBL is configured to work on a broad range of systems, which requires it to avoid some features not available everywhere. Compiling TensorFlow for your specific system and using it on the classpath instead of the default libtensorflow_jni-<tensorflowversion>.jar will likely improve performance. A JAR with TensorFlow compiled with GPU support for Linux is available for download.
The neural network is used for Hebrew disambiguation when the disambiguatorType is set to DNN.