Use our analyzer or create your own analysis chain.
Our analyzer includes our tokenizer and base linguistics token filter, followed by Lucene's LowerCaseFilter, CJKWidthFilter (if the language is Chinese, Japanese, or Korean), and StopFilter (if the "stopwords" setting is included). Here are example settings for using it.
{
"settings": {
"analysis": {
"analyzer": {
"rbl_default": {
"type": "rbl",
"language": "spa",
"stopwords": "_spanish_"
}
}
}
}
}
The tokenizer divides the text into individual words (tokens), and the token filter generates lemmas (dictionary forms) for each token.
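To see what the tokenizer and token filter actually emit for a given sentence, one option is the Elasticsearch _analyze API. The following is a minimal sketch, not part of the plugin: it assumes an index created with the settings above, and the index name my-spanish-index, the sample sentence, and the helper names analyze_body and analyze are all hypothetical.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for an _analyze request.
# $1 = analyzer name, $2 = text (must not contain double quotes or backslashes,
#      because this sketch does no JSON escaping)
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}\n' "$1" "$2"
}

# Send the request to the _analyze endpoint of an index.
# $1 = index name, $2 = analyzer name, $3 = text
analyze() {
  curl -s -XPOST "$ES_URL/$1/_analyze?pretty=true" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$2" "$3")"
}

# Usage (requires a running Elasticsearch node with the plugin installed):
# analyze my-spanish-index rbl_default 'Los niños comieron manzanas'
```

The response lists each emitted token together with its offsets and position, which makes it easy to check the lemmas and to see which words the stop filter dropped.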
You can create your own analysis chain to control the components used. Here is an example illustrating how to set up a custom analysis chain. These settings will produce the same results as the default analyzer settings above.
{
"settings": {
"analysis": {
"analyzer": {
"rbl_custom_spa": {
"type": "custom",
"tokenizer": "rbl_spa_t",
"filter": [
"rbl_spa_f",
"lowercase",
"spanish_stop"
]
}
},
"tokenizer": {
"rbl_spa_t": {
"type": "rbl",
"language": "spa"
}
},
"filter": {
"rbl_spa_f": {
"type": "rbl",
"language": "spa"
},
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
}
}
}
}
}
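One way to check that the custom chain and the default analyzer really do produce the same results is to run both over the same text and compare the raw _analyze responses. This is only a sketch under stated assumptions: both the rbl_default and rbl_custom_spa analyzers are defined in a single index (called my-spanish-index here, a hypothetical name), and the helper names analyze_raw and report_match are made up for illustration.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch the raw _analyze response for one analyzer.
# $1 = index name, $2 = analyzer name,
# $3 = text (no double quotes or backslashes; no JSON escaping is done)
analyze_raw() {
  curl -s -XPOST "$ES_URL/$1/_analyze" \
    -H 'Content-Type: application/json' \
    -d "{\"analyzer\": \"$2\", \"text\": \"$3\"}"
}

# Report whether two responses are byte-for-byte identical.
# $1 = first response, $2 = second response
report_match() {
  if [ "$1" = "$2" ]; then
    echo "identical token streams"
  else
    echo "token streams differ"
    return 1
  fi
}

# Usage (requires a running Elasticsearch node with the plugin installed):
# text='Los niños comieron manzanas'
# a=$(analyze_raw my-spanish-index rbl_default "$text")
# b=$(analyze_raw my-spanish-index rbl_custom_spa "$text")
# report_match "$a" "$b"
```

Comparing the responses as plain strings is deliberately strict: any difference in tokens, offsets, or positions shows up as a mismatch.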
The following snippets use the cURL command-line tool to illustrate the Elasticsearch API for running the plugin.
Create an Elasticsearch index with an associated base linguistics analyzer.
The index will contain Japanese documents, and the analyzer is the Japanese base linguistics analyzer.
curl -XPUT 'http://localhost:9200/rbl-test-jpn' -H'Content-Type: application/json' -d '{
"settings": {
"analysis": {
"analyzer": {
"my_rbl": {
"type": "rbl",
"language": "jpn"
}
}
}
}
}'
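The following statement is an alternative to the previous one: it creates the same index with an explicit analysis chain made up of a tokenizer and a base linguistics token filter (equivalent to our analyzer). Run one statement or the other, not both, because Elasticsearch rejects a request to create an index that already exists.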
curl -XPUT 'http://localhost:9200/rbl-test-jpn/' -H'Content-Type: application/json' -d '{
"settings":{
"analysis":{
"analyzer" : {
"my_rbl" : {
"type" : "rbl",
"tokenizer" : "rbl",
"language" : "jpn",
"filter" : ["rbl"]
}
},
"tokenizer" : {
"rbl": {
"type" : "rbl",
"language" : "jpn"
}
},
"filter" : {
"rbl" : {
"type" : "rbl",
"language" : "jpn",
"addLemmaTokens" : "true"
}
}
}
}
}'
Specify the document type and index field that the analyzer will parse.
The document type is "typeJapanese", and the Japanese analyzer will parse the "body" field.
curl -XPUT 'http://localhost:9200/rbl-test-jpn/typeJapanese/_mapping' -H'Content-Type: application/json' -d '{
"typeJapanese": {
"properties" : {
"body" : { "type" : "text", "analyzer" : "my_rbl" }
}
}
}'
Add documents to the index.
Each document contains a "body" field with Japanese text. The last statement refreshes the index so the documents will be available for queries.
curl -XPUT 'http://localhost:9200/rbl-test-jpn/typeJapanese/1' -H'Content-Type: application/json' -d '{
"body": "が1分0秒40の日本人として大会初の銅メダルを獲得した。"}'
curl -XPUT 'http://localhost:9200/rbl-test-jpn/typeJapanese/2' -H'Content-Type: application/json' -d '{
"body": "優勝は59秒44の世界新をマークしたアメリカ人のナタリー・コーグリン。"}'
curl -XPUT 'http://localhost:9200/rbl-test-jpn/typeJapanese/3' -H'Content-Type: application/json' -d '{
"body": "柴田亜衣(チームアリーナ)が15分58秒55をマークし、2日続けて同種目の日本記録を更新し銅メダル。"}'
curl -XPUT 'http://localhost:9200/rbl-test-jpn/typeJapanese/4' -H'Content-Type: application/json' -d '{
"body": "男子百メートル背泳ぎ決勝は、アーロン・ピアソル(米)が52秒98の世界新で優勝し、森田智己(セントラルスポーツ)は8位。"}'
curl -XPOST 'http://localhost:9200/rbl-test-jpn/_refresh'
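When there are many documents, the individual PUT requests above can be combined into one request with the Elasticsearch bulk API. The sketch below is an illustration rather than part of the plugin: the helper names bulk_entry and bulk_index are hypothetical, and it assumes the same index, document type, and "body" field used above.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the two NDJSON lines the bulk API expects for one document:
# an action line followed by the document source.
# $1 = document id,
# $2 = body text (no double quotes or backslashes; no JSON escaping is done)
bulk_entry() {
  printf '{"index":{"_id":"%s"}}\n' "$1"
  printf '{"body":"%s"}\n' "$2"
}

# Send NDJSON read from standard input to the bulk endpoint.
# --data-binary keeps the newlines that -d would strip, and
# refresh=true makes the documents searchable when the request returns.
# $1 = index name, $2 = document type
bulk_index() {
  curl -s -XPOST "$ES_URL/$1/$2/_bulk?refresh=true" \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary @-
}

# Usage (requires a running Elasticsearch node with the plugin installed):
# {
#   bulk_entry 1 'が1分0秒40の日本人として大会初の銅メダルを獲得した。'
#   bulk_entry 2 '優勝は59秒44の世界新をマークしたアメリカ人のナタリー・コーグリン。'
# } | bulk_index rbl-test-jpn typeJapanese
```

Because the request is sent with refresh=true, a separate _refresh call is not needed after a bulk load done this way.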
Run queries against the index.
For each query, Elasticsearch returns the number of hits and the document id for each hit.
curl -s 'http://localhost:9200/rbl-test-jpn/typeJapanese/_search?pretty=true' -H'Content-Type: application/json' \
-d '{"query": {"query_string": {"query": "body:大会"}}, "stored_fields":["_id"]}'
curl -s 'http://localhost:9200/rbl-test-jpn/typeJapanese/_search?pretty=true' -H'Content-Type: application/json' \
-d '{"query": {"query_string": {"query": "body:ナタリー"}}, "stored_fields":["_id"]}'
curl -s 'http://localhost:9200/rbl-test-jpn/typeJapanese/_search?pretty=true' -H'Content-Type: application/json' \
-d '{"query": {"query_string": {"query": "body:柴田亜衣"}}, "stored_fields":["_id"]}'
curl -s 'http://localhost:9200/rbl-test-jpn/typeJapanese/_search?pretty=true' -H'Content-Type: application/json' \
-d '{"query": {"query_string": {"query": "body:アーロン"}}, "stored_fields":["_id"]}'
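The four queries differ only in the search term, so they can also be driven from a loop. The following sketch assumes the same index, document type, and "body" field as above; the helper names query_body and search_term are hypothetical.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a query_string search body that looks for one term in the "body"
# field and returns only the document id of each hit.
# $1 = search term (no double quotes or backslashes; no JSON escaping is done)
query_body() {
  printf '{"query": {"query_string": {"query": "body:%s"}}, "stored_fields": ["_id"]}\n' "$1"
}

# Run the search for one term against the Japanese test index.
# $1 = search term
search_term() {
  curl -s -XPOST "$ES_URL/rbl-test-jpn/typeJapanese/_search?pretty=true" \
    -H 'Content-Type: application/json' \
    -d "$(query_body "$1")"
}

# Usage (requires a running Elasticsearch node with the plugin installed):
# for term in 大会 ナタリー 柴田亜衣 アーロン; do
#   search_term "$term"
# done
```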
When you are finished, delete the index.
curl -XDELETE 'http://localhost:9200/rbl-test-jpn'