After you install RBL and the license file, try running a sample application. RBL contains a samples directory, rbl-je-<version>/samples.
- At a command prompt, navigate to rbl-je-<version>/samples/<sampleName>.
- Use the Ant build script to compile and run the samples.
    ant run
If the rootDirectory option is specified, the string ${rootDirectory} takes that value in the dictionaryDirectory, modelDirectory, and licensePath options.
Table 2. Initial and Path Options
| Option | Description | Type (Default) | Supported Languages |
|---|---|---|---|
| dictionaryDirectory | The path of the lemma and compound dictionary, if it exists. | Path (${rootDirectory}/dicts) | All |
| language | The language to process by analyzers or tokenizers created by the factory. | Language code | All |
| licensePath | The path of the RBL license file. | Path (${rootDirectory}/licenses/rlp-license.xml) | All |
| licenseString | The XML license content; overrides licensePath. | String | All |
| modelDirectory | The directory containing the model files. | Path (${rootDirectory}/models) | All |
| rootDirectory | Set the root directory. Also sets default values for other required options (dictionaryDirectory, licensePath, licenseString, and modelDirectory). | Path | All |
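The ${rootDirectory} substitution can be illustrated with a short Python sketch. This is not RBL's implementation; it only mimics the documented behavior with the standard library's string.Template. The default values are taken from Table 2, while the helper name resolve_options and the example path /opt/rbl-je are hypothetical.

```python
from string import Template

# Defaults as listed in Table 2. The helper below is a hypothetical
# illustration, not part of the RBL API.
DEFAULTS = {
    "dictionaryDirectory": "${rootDirectory}/dicts",
    "modelDirectory": "${rootDirectory}/models",
    "licensePath": "${rootDirectory}/licenses/rlp-license.xml",
}


def resolve_options(root_directory, overrides=None):
    """Return option values with ${rootDirectory} replaced by root_directory."""
    options = dict(DEFAULTS)
    options.update(overrides or {})  # explicitly set options win over defaults
    return {
        name: Template(value).safe_substitute(rootDirectory=root_directory)
        for name, value in options.items()
    }


if __name__ == "__main__":
    # "/opt/rbl-je" is an assumed installation path used only for this demo.
    for name, value in resolve_options("/opt/rbl-je").items():
        print(f"{name} = {value}")
```

An option that is set explicitly without the placeholder, such as an absolute licensePath, passes through unchanged.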
Enum Classes:
A sample application that illustrates the use of ADM is in rbl-je-<version>/samples/annotator-tokenize.
- In a Bash shell (Unix) or command prompt (Windows), navigate to rbl-je-<version>/samples/annotator-tokenize.
- Use the Ant build script to compile and run the sample.
    ant run
Your license (rlp-license.xml) must be in the licenses subdirectory of the RBL installation.
AnnotatorTokenize tokenizes the English string and provides one or more analyses with lemma and part-of-speech for each token.
The output appears in annotator-tokenize.txt.
length: 29
------
Some members spoke yesterday.
------
token 0: Some
index lemma part-of-speech
0 some QUANT
token 1: members
index lemma part-of-speech
0 member NOUN
token 2: spoke
index lemma part-of-speech
0 speak VPAST
1 spoke VI
2 spoke VPRES
3 spoke NOUN
token 3: yesterday
index lemma part-of-speech
0 yesterday ADV
1 yesterday NOUN
token 4: .
index lemma part-of-speech
0 . SENT
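To work with this output programmatically, the listing can be read back into a structure that groups analyses by token. The following Python sketch assumes the layout shown above: a "token N: text" line followed by whitespace-separated index, lemma, and part-of-speech columns. It would need adjusting if a lemma ever contained whitespace. The function name parse_annotator_output is hypothetical.

```python
import re

# A fragment of the listing above, used as test input.
SAMPLE = """\
token 2: spoke
index lemma part-of-speech
0 speak VPAST
1 spoke VI
2 spoke VPRES
3 spoke NOUN
token 4: .
index lemma part-of-speech
0 . SENT
"""

TOKEN_HEADER = re.compile(r"token (\d+): (.*)")


def parse_annotator_output(text):
    """Group the analysis rows under the token line that precedes them."""
    tokens = []
    for line in text.splitlines():
        header = TOKEN_HEADER.match(line.strip())
        if header:
            tokens.append({"text": header.group(2), "analyses": []})
            continue
        fields = line.split()
        # Analysis rows have three columns and start with a numeric index;
        # the column-header row and other lines are skipped.
        if tokens and len(fields) == 3 and fields[0].isdigit():
            tokens[-1]["analyses"].append(
                {"lemma": fields[1], "partOfSpeech": fields[2]}
            )
    return tokens


if __name__ == "__main__":
    for token in parse_annotator_output(SAMPLE):
        print(token["text"], len(token["analyses"]))
```

Tokens with more than one analysis, such as "spoke" in the sample, are the ambiguous ones.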
Classic API Sample Application
A sample application illustrating the use of the classic API is in rbl-je-<version>/samples/tokenize-analyze.
- In a Bash shell (Unix) or command prompt (Windows), navigate to rbl-je-<version>/samples/tokenize-analyze.
- Use the Ant build script to compile and run the sample.
    ant run
Your license (rlp-license.xml) must be in the licenses subdirectory of the RBL installation.
TokenizeAnalyze tokenizes the sample German document and provides a disambiguated analysis of each token.
The output appears in two files: deu-tokenized.txt and deu-analyzed.txt. The first file contains one token per line, with a blank line after the end of each sentence.
The second file contains the token, lemma, part of speech, and compound components (where relevant) on each line. For languages that do not support disambiguation, a token may have multiple rows, one for each analysis, with the token repeated in the first column. Here is a fragment with a sentence from deu-analyzed.txt:
TOKEN LEMMA POS COMPOUNDS
----- ----- --- ---------
3.11.06 3.11.06 CARD
- - PUNCT
Not Not NOUN
und und COORD
Elend Elend NOUN
in in PREP
ihren ihr POSDET
Heimatländern Heimatland NOUN [Heimat, Land]
lassen lassen VVFIN
immer immer ADV
mehr mehr INDADJ
Afrikaner Afrikaner NOUN
die der ART
Reise Reise NOUN
nach nach PREP
Europa Europa NOUN
antreten antreten VVINF
. . SENT
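The column layout of deu-analyzed.txt lends itself to simple post-processing. The Python sketch below assumes whitespace-separated TOKEN, LEMMA, and POS columns and an optional bracketed, comma-separated COMPOUNDS column, as in the fragment above. The function name parse_analyzed is hypothetical, and the real file may use different column spacing.

```python
import re

# A few rows from the fragment above, used as test input.
SAMPLE = """\
TOKEN LEMMA POS COMPOUNDS
----- ----- --- ---------
- - PUNCT
ihren ihr POSDET
Heimatländern Heimatland NOUN [Heimat, Land]
"""

# token, lemma, POS, then an optional "[a, b]" compound list.
ROW = re.compile(r"(\S+)\s+(\S+)\s+(\S+)(?:\s+\[(.*)\])?\s*$")


def parse_analyzed(text):
    """Return one dict per analysis row, skipping the two header lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("TOKEN", "-----")):
            continue
        match = ROW.match(line.strip())
        if match is None:
            continue  # ignore lines that do not fit the assumed layout
        token, lemma, pos, compounds = match.groups()
        rows.append({
            "token": token,
            "lemma": lemma,
            "pos": pos,
            "compounds": [c.strip() for c in compounds.split(",")] if compounds else [],
        })
    return rows


if __name__ == "__main__":
    for row in parse_analyzed(SAMPLE):
        if row["compounds"]:
            print(row["token"], "->", " + ".join(row["compounds"]))
```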
To run the samples with sample text in a different language, set the test.language parameter to the language code. For example, to tokenize and analyze the Spanish sample, run:
    ant -Dtest.language=spa run
RBLCmd is a general-purpose command-line utility for RBL. It provides a simple way to produce RBL output without writing code. It is also useful for ad hoc speed and thread testing.
A Bash shell script (RBLCmd) and a Windows script (RBLCmd.bat) for running this utility are in rbl-je-<version>/tools/bin. For more information, see the RBLCmd online help, RBLCmd -h.
The command:
echo 'Hola' | ./tools/bin/RBLCmd -outputJson --language spa --rootDirectory . | jq
produces the following output:
{
  "version": "1.1.0",
  "data": "Hola\n",
  "attributes": {
    "sentence": {
      "type": "list",
      "itemType": "sentence",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 5
        }
      ]
    },
    "scriptRegion": {
      "type": "list",
      "itemType": "scriptRegion",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 5,
          "script": "Latn"
        }
      ]
    },
    "layoutRegion": {
      "type": "list",
      "itemType": "layoutRegion",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 5,
          "layout": "STRUCTURED"
        }
      ]
    },
    "token": {
      "type": "list",
      "itemType": "token",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 4,
          "text": "Hola",
          "analyses": [
            {
              "partOfSpeech": "INTERJ",
              "lemma": "hola",
              "raw": "hola[+INTERJ]",
              "tagSet": "BT_SPANISH"
            }
          ]
        }
      ]
    }
  },
  "documentMetadata": {}
}
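Because the output is plain JSON, it can be consumed from any language. The following Python sketch pulls each token's text, lemma, and part of speech out of the structure shown above and uses startOffset and endOffset to slice the covered text out of the data field. It relies only on the field names visible in this sample. The function name token_rows is hypothetical, and treating the offsets as Python string indices is an assumption that holds for this ASCII input but is not confirmed for text outside the Basic Multilingual Plane.

```python
import json

# A trimmed copy of the output above (raw string so "\n" stays a JSON escape).
SAMPLE = r"""
{
  "data": "Hola\n",
  "attributes": {
    "token": {
      "type": "list",
      "itemType": "token",
      "items": [
        {
          "startOffset": 0,
          "endOffset": 4,
          "text": "Hola",
          "analyses": [
            {"partOfSpeech": "INTERJ", "lemma": "hola"}
          ]
        }
      ]
    }
  }
}
"""


def token_rows(adm):
    """Return (text, covered span, lemma, part of speech) for every analysis."""
    rows = []
    data = adm.get("data", "")
    items = adm.get("attributes", {}).get("token", {}).get("items", [])
    for item in items:
        # Assumption: offsets index directly into the "data" string.
        covered = data[item["startOffset"]:item["endOffset"]]
        for analysis in item.get("analyses", []):
            rows.append((
                item["text"],
                covered,
                analysis.get("lemma"),
                analysis.get("partOfSpeech"),
            ))
    return rows


if __name__ == "__main__":
    for row in token_rows(json.loads(SAMPLE)):
        print("\t".join(str(field) for field in row))
```

To process real output, replace json.loads(SAMPLE) with json.load(sys.stdin) (after importing sys) and pipe the RBLCmd output into the script instead of into jq.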