RNI-RNT is a Software Development Kit (SDK) containing two Basis Technology products: Rosette Name Indexer (RNI) and Rosette Name Translator (RNT). It is a 100% Java implementation that provides the linguistic infrastructure and APIs to perform name searches and translations across an expanding collection of languages and writing scripts.
The RNI-RNT Java API can index names in multiple languages, query those indexes, match names, and perform automatic and incremental translation.
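Important
RNI-RNT can only work with data that was indexed by the same version of RNI (Name Indexer). After upgrading, re-index your RNI data.
RNI-RNT supports the following platforms. Each platform name encodes the OS, CPU, and C++ compiler[1] version. Java 1.8 and later is supported.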
Table 1. Supported Platforms
OS | CPU | Compiler | $BT_BUILD
Mac OS X v10.9+ (Darwin 13) | AMD64 | Xcode 5 | amd64-darwin13-xcode5
Linux | AMD64 | gcc 4.4 | amd64-glibc217-gcc48
Linux | AARCH64 | gcc 7.3 | aarch64-glibc226-gcc73
Windows | AMD64 | Visual Studio 2013 | amd64-w64-msvc120
Java Only | n/a | n/a | jvm
[en] Release 7.41.0.c69.0
[en] March 2023
[en] New
[en] Turkish support added: Added support for Turkish-English and Turkish-Turkish name matching. We have also added person and organization overrides, stopwords, and language detection to improve matching in Turkish. (RLPNC-6499)
Improved person name matching: RNI-RNT can now detect given names and surnames in Latin script when the name is of English origin. When the enableAdditionalOnomastics parameter is true, the gender mismatch penalty is applied only to the detected given name, as opposed to the first name token in a query. (RLPNC-6719, RLPNC-6720)
Improved Arabic person name matching: The new TRAILING_PATRONYMIC_DELETION match phenomenon provides improved scores for matches that contain a deletion caused by truncation of a patronymic. The score of this deletion is controlled by the trailingPatronymicDeletionScore parameter. This applies only to Latin script names of Arabic origin when enableAdditionalOnomastics is true. (RLPNC-6756)
-
New date parameters: New parameters have been added for adjusting scores for dates that may contain a single digit manipulation. A digit manipulation is a transformation of a digit that can be accomplished with minimal additional lines (for example, changing a 1 to a 7). The new improveSingleDigitManipulationMatch parameter controls how much the score is increased for dates containing a single digit manipulation. The maxYearDistanceForDigitManipulation parameter sets the maximum number of years beyond which two dates are not affected by improveSingleDigitManipulationMatch. (RLPNC-6677)
The new thresholdToDropoffBiasMapping parameter allows you to set score dropoffs when matching dates that are a certain number of years apart. You can specify the number of years beyond which there should be a score dropoff and how large the dropoff should be. Multiple dropoff points can be set with this parameter. (RLPNC-6810)
[en] Russian support migrated from C++ to Java: Russian translation services are now part of jvm-only. Translations from Russian to English may differ slightly. (RLPNC-6764)
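The dropoff points described for thresholdToDropoffBiasMapping can be pictured as a sorted map from a year threshold to a dropoff amount. The sketch below is only an illustration under stated assumptions: the class name, the choice to subtract the dropoff from the score, and the sample values are all hypothetical, and the release notes do not describe how RNI-RNT actually applies the mapping.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of threshold-based score dropoffs for date matching. */
final class YearDropoffSketch {

    // Assumed shape: years apart (threshold) -> amount subtracted from the score.
    private final TreeMap<Integer, Double> dropoffByThreshold = new TreeMap<>();

    YearDropoffSketch(Map<Integer, Double> mapping) {
        dropoffByThreshold.putAll(mapping);
    }

    /** Applies the dropoff of the largest threshold that yearsApart lies beyond, if any. */
    double adjust(double score, int yearsApart) {
        // lowerEntry returns the greatest threshold strictly below yearsApart,
        // so a distance equal to the threshold is not yet penalized.
        Map.Entry<Integer, Double> entry = dropoffByThreshold.lowerEntry(yearsApart);
        double dropoff = (entry == null) ? 0.0 : entry.getValue();
        return Math.max(0.0, score - dropoff);
    }

    public static void main(String[] args) {
        // Two made-up dropoff points: beyond 5 years and beyond 20 years.
        YearDropoffSketch sketch = new YearDropoffSketch(Map.of(5, 0.1, 20, 0.3));
        System.out.println(sketch.adjust(0.9, 3));
        System.out.println(sketch.adjust(0.9, 7));
        System.out.println(sketch.adjust(0.9, 25));
    }
}
```

Using a sorted map keeps the lookup simple when several dropoff points are configured, because only the largest applicable threshold needs to be found.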
[en] Bug Fixes
[en] Release 7.40.1.c68.0
[en] December 2022
[en] Bug Fixes
[en] Release 7.40.0.c68.0
[en] December 2022
[en] New
[en] Solr 9 support: Solr 9 is now supported. (RLPNC-6476)
[en] Improved Japanese organization matching: カンパニー (company) added to the Japanese organization stopwords list. (RLPNC-6545)
[en] Improved date matching: Invalid dates are now rejected. For example, April 31, 2021 will now be rejected. (RLPNC-6610)
[en] Improved Chinese address matching: We've expanded the list of Chinese stop words for addresses. (RLPNC-6587)
[en] Improved Chinese organization matching: We've expanded the list of Chinese stop words for organizations. (RLPNC-6615)
Improved name matching results: When no entityType is specified, the type PERSON is applied. Previously, the type NONE was applied. (RLPNC-6576)
New parameter for token overrides: We've added a new parameter, overrideSelector, to control which overrides are considered during querying and matching. Override filenames can now specify a “selector” value, which is matched against this parameter. (RLPNC-6561)
[en] Improved documentation: Tables listing all parameters and match phenomena have been added to the Application Developer's Guide. (RLPNC-6575)
New JSON-formatted explain info block: We've added a new JSON-formatted explain info block that provides detail about the match logic for any given pairwise match. It is requested at run time by passing in the new boolean parameter jsonExplainInfo. When it is set to true, RNI returns the JSON-formatted explain info block. This function has a small performance impact, so running it on all queries is not recommended, but it is a useful diagnostic and reporting tool. (RLPNC-6574)
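The rejection of invalid dates such as April 31, 2021 can be illustrated with the standard java.time API. This is a minimal sketch of the idea only; the class and method names are made up for illustration, and it does not show how RNI-RNT validates dates internally.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

/** Illustrative check for calendar-valid dates; not the RNI-RNT implementation. */
final class DateValiditySketch {

    /** Returns true only if the year, month, and day form a real calendar date. */
    static boolean isValidDate(int year, int month, int day) {
        try {
            // LocalDate.of throws DateTimeException for impossible dates.
            LocalDate.of(year, month, day);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate(2021, 4, 30)); // prints true
        System.out.println(isValidDate(2021, 4, 31)); // prints false: April has 30 days
    }
}
```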
[en] Bug Fixes
[en] Horizontal tabs are now removed as part of normalization in English. (RLPNC-6541)
[en] Control characters are now removed from Arabic names before matching. (RLPNC-6543)
[en] Fixed a case where unexpected name inputs could lead to a null pointer exception. (RLPNC-6634)
[en] Fixed an issue where date parsing of valid inputs could result in an exception. (RLPNC-6600)
[en] Fixed an issue where RNI could return match scores greater than 1. (RLPNC-6595)
[en] Third-party component updates
Table 2. Upgraded
Package | Old version | New version
Apache Log4j | 2.17.1 | 2.19.0
commons-compress | 1.21 | 1.22
fastutil | 8.5.6 | 8.5.9
Jackson | 2.11.1 | 2.14.0
JavaCPP | 1.58-alpha.20220614.013710.426 | 1.58
SLF4J | 1.7.33 | 1.7.36
SnakeYAML | 1.30 | 1.33
[en] Release 7.39.0.c67.0
[en] September 2022
[en] New
-
Improved matching of Han character names: We've added a parameter, haniFourCornerCodeMismatchPenalty, to add a penalty for names with different four-corner codes. By default, this feature is disabled; the parameter is set to 0. (RLPNC-6428) To enable this feature, set the following in your parameter_profiles.yaml file:
zho_zho_PERSON:
  haniFourCornerCodeMismatchPenalty: 1
Note
[en] This is an experimental feature. As with any experimental feature, we highly recommend experimenting in your environment with your data.
[en] Improved accuracy for Chinese and Japanese organization names: We've improved the embedding match scores for Chinese and Japanese by enhancing the Chinese and Japanese embedding dictionaries. (RLPNC-6420, RLPNC-6421)
[en] Improved Thai, Burmese, and Khmer organization name matching: We've improved real world ID matching for Thai, Burmese, and Khmer. (RLPNC-6337)
-
Cantonese name segmentation: When enableYueReadings is set to true, Yue Chinese readings are now segmented. (RLPNC-6369)
[en] Example: 葉國謙 vs. Ip Kwok-him
[en] Previously: 0.41
[en] Now: 0.71
[en] Bug Fixes
Fixed a bug where multiple concurrent requests specifying dynamic parameters could cause incorrect match score results. (RLPNC-6475)
Creating a DateSpec with just a month no longer throws an ArrayIndexOutOfBoundsException. (RLPNC-6456)
[en] Deprecation Notification
[en] The following classes and functions are now deprecated and will be removed in an upcoming release, no earlier than September 2023.
-
[en] Lookup classes and methods for name indexes. All lookup methods and classes have been replaced with equivalent query versions.
-
[en] Classes:
-
[en] Methods:
[en] com.basistech.rni.index.INameIndex#lookup
[en] com.basistech.rni.index.AbstractNameIndex#lookup
[en] com.basistech.rni.index.INameIndexSession#lookup
[en] com.basistech.rni.index.StandardNameIndex#lookupInList
-
[en] Objects for creating a name and changing its data. These have been replaced with the NameBuilder class.
-
[en] Constructors:
-
[en] Methods:
[en] com.basistech.rni.match.Name#getMaximumNumTokens
[en] com.basistech.rni.match.Name#setMaximumNumTokens
[en] com.basistech.rni.match.Name#setLanguage
[en] com.basistech.rni.match.Name#setScript
[en] com.basistech.rni.match.Name#setFieldedData
-
[en] getComparator method.
-
[en] Methods
[en] com.basistech.rni.index.INameIndexFilter#getComparator
[en] com.basistech.rni.index.internal.OracleNameIndexFilter#getComparator
[en] com.basistech.rni.index.StandardNameIndexFilter#getComparator
-
[en] Flags.
[en] Previously, the two flags controlled by these methods could vary based on your setup. Now, they are always true, so there is no need to check them or change them.
-
[en] Methods
[en] com.basistech.rni.index.IndexStoreDataModelFlags#isStoringNamePrimary
[en] com.basistech.rni.index.IndexStoreDataModelFlags#setStoringNamePrimary
[en] com.basistech.rni.index.IndexStoreDataModelFlags#isStoringNameTransliterations
[en] com.basistech.rni.index.IndexStoreDataModelFlags#setStoringNameTransliterations
-
[en] Removed Constructors for Exceptions.
-
[en] Constructors:
[en] all versions of com.basistech.rni.index.UnsupportedNameDomainException#UnsupportedNameDomainException
[en] all versions of com.basistech.rni.match.UnsupportedDomainPairException#UnsupportedDomainPairException()
-
[en] Miscellaneous.
-
[en] Methods
-
[en] Constructors
[en] Release 7.38.1.c67.0
[en] August 2022
[en] New
[en] Neural model support: An open source package (JavaCPP) has been updated to allow the Elasticsearch plugin to use TensorFlow. (RLPNC-6336)
[en] Complete CJK Ext A support: We now have full support of CJK Unified Ideographs Extension A. (RLPNC-6324)
[en] Improved Spanish name matching: We have improved Spanish surname detection. (RLPNC-6294)
-
[en] Improved Japanese location matching: All prefectures of Japan are now included in the override list. (RLPNC-6326)
[en] Example: 北海道 vs. Hokkaido Prefecture
[en] Previously: 0.7246
[en] Now: 0.99
[en] Bug Fixes
[en] All date matches now return explain information. (RLPNC-6318)
[en] The string "luiz arlos da silva bueno" is no longer in the Greek, English, and Vietnamese stop word lists. (RLPNC-6358)
When using Solr, the parameter reRankWeight is now ignored if reRankMode is set to replace. (RLPNC-6279)
[en] Yue Chinese is no longer a language option in the RNI web services client. (RLPNC-6393)
[en] Release 7.38.0.c67.0
[en] June 2022
[en] New
-
[en] Improved matching of Spanish names and names of Spanish origin: RNI now has a deeper understanding of Spanish surnames. For example: "JOSE JORGE RIOS TORRES" now gets a higher score when matched against "JOSE RIOS" than it does when matched against "JOSE TORRES", since "RIOS" is recognized as the primary surname. (RLPNC-6037)
[en] The following parameters impact how Spanish names are matched:
The new Boolean parameter enableAdditionalOnomastics controls whether to assign a TokenType to allow for multiple Spanish surnames. When set to true, each token is assigned a TokenType, where the TokenType is one of: UNKNOWN, SURNAME, or SURNAME2. It is currently set to true for the spa_eng_PERSON, spa_spa_PERSON and eng:spa_eng_PERSON profiles.
The preexisting parameter surnameTokenTypeWeight now applies only to the TokenType.SURNAME tokens. Its default value was changed from 1 to 1.2.
The new parameter secondarySurnameTokenTypeWeight applies to TokenType.SURNAME2 tokens. Its default value is 0.6.
-
The new crossSurnameMatchPenalty parameter is applied (by simple multiplication) when a TokenType.SURNAME token is scored against a TokenType.SURNAME2 token. Its default value is 0.75.
[en] Example: Pablo Emilio Escobar Gaviria vs. Pablo Escobar
[en] Previously: 0.7945
[en] Now: 0.8309
[en] Example: Pablo Emilio Escobar Gaviria vs. Emilio Gaviria
[en] Previously: 0.7999
[en] Now: 0.7365
-
[en] Improved matching of English organization names: We've added ordinal numbers to the override list for English organizations. (RLPNC-6225)
[en] Example: 1st National Bank vs. First National Bank
[en] Previously: 0.6470
[en] Now: 0.9257
[en] Cantonese support added: RNT can now transliterate Han into Latin characters using the Jyutping transliteration scheme for Cantonese. (RLPNC-6232)
Custom real world ID support: A real world identifier associates company names, along with their associated nicknames and permutations, with an identifier. This makes it possible to match different names for an organization that have no phonetic similarity (for example, IBM vs. Big Blue). RNI is shipped with a file of real world IDs. You can now create your own file of organizations with all versions of their names and real world IDs. (RLPNC-6040, RLPNC-6041)
-
[en] Improved Vietnamese name matching: We've expanded the Vietnamese stop word lists for PERSON and ORGANIZATION entity types. (RLPNC-5694)
[en] Example: Chủ tịch Hồ Chí Minh vs. Hồ Chí Minh (translation: President Ho Chi Minh)
[en] Previously: 0.83
[en] Now: 0.99
New parameter for non-phonetic matches: We've added the parameter editDistanceScoreBias to adjust the bias for edit distance scores. Increasing the impact of edit distance scores can improve the match scores of typographical errors and other non-phonetic matches. (RLPNC-6199)
[en] New parameter for organization names: We've added the parameter tokenizeOrganizationsWithNumbers
that prevents tokenization of names with numbers within the name. When set to true
(default), the number is left within the token and the name will get a higher value from the edit distance scorer. This is desirable if your data contains organization names which intersperse alphabetic and numeric characters or if your data often contains typographical errors with numerals inserted into otherwise valid tokens. When set to false
, the number remains within the organization name token. (RLPNC-6200)
Support for Cantonese name transliterations: We've added Jyutping transliterations (Cantonese pronunciation of Chinese names) to the list of readings. The new parameter enableYueReadings enables Jyutping readings. It is set to false by default. To enable Jyutping readings, set enableYueReadings to true. (RLPNC-6239)
-
[en] Improved Japanese-English location name matching: We've expanded the Japanese-English overrides list for location names. (RLPNC-6268)
[en] Example: 大阪府 vs. Osaka Prefecture
[en] Previously: 0.5853
[en] Now: 0.99
New parameter to improve performance: We've added the parameter enableCompletedDataTermFiltering which, when set to false, excludes part of the first pass query. This results in a large performance improvement, but may impact accuracy, as some potential results may not be passed to the second pass query. The accuracy impact is small in Latin-Latin matches, but much larger in other scripts, such as Chinese. The default value is true, which is the previous behavior. (RLPNC-6133)
[en] Java 17 support added: Java 8 and 9 support has been removed. (RLPNC-6171)
Solr 6 support deprecated: RNI-RNT no longer supports Solr 6 or earlier. (RLPNC-6213)
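The real world ID idea from this release (matching IBM against Big Blue through a shared identifier rather than phonetic similarity) can be sketched as a plain lookup table. Everything below is hypothetical: the identifiers, the in-memory map, and the normalization are stand-ins for illustration, and the format of the actual real world ID file is not described in these notes.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a real world ID table. */
final class RealWorldIdSketch {

    // Made-up identifiers: every known variant of a name maps to the same ID.
    private static final Map<String, String> IDS = Map.of(
            "ibm", "ORG-0001",
            "international business machines", "ORG-0001",
            "big blue", "ORG-0001",
            "sony corporation", "ORG-0002");

    /** Looks up the identifier for a name after trivial normalization. */
    static Optional<String> idFor(String name) {
        return Optional.ofNullable(IDS.get(name.trim().toLowerCase(Locale.ROOT)));
    }

    /** Two names refer to the same entity when both resolve to the same ID. */
    static boolean sameEntity(String a, String b) {
        Optional<String> idA = idFor(a);
        return idA.isPresent() && idA.equals(idFor(b));
    }

    public static void main(String[] args) {
        System.out.println(sameEntity("IBM", "Big Blue"));
        System.out.println(sameEntity("IBM", "Sony Corporation"));
    }
}
```

Names that are absent from the table never match through this path, so a real matcher would still fall back to its ordinary scoring for them.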
[en] Bug Fixes
[en] More consistent matching scores are now returned from RNI-RNT when using Lucene. To resolve an issue with later versions of Lucene, the internal version of Lucene has been downgraded to 7.6.0 from 8.11.1. (RLPNC-6227)
The frequencyModelTrainer now runs without errors. (RLPNC-6220)
[en] RNI-RNT will no longer return match scores above 1.0. (RLPNC-6254)
[en] Known Issues
[en] Release 7.37.0.c66.0
[en] March 2022
Note
[en] Solr 6 and earlier support is deprecated as of this release.
[en] Java 8 and Java 9 support is deprecated as of this release.
[en] New
-
[en] Khmer support
[en] We now support Khmer - Khmer and Khmer - English name matching. (RLPNC-5712)
[en] Khmer stop word lists are included for person and organization types. (RLPNC-5715)
[en] We now support Khmer - English name translation. (RLPNC-5708)
[en] Improved language detection: We've improved language detection for languages that use Han characters (Chinese, Japanese, Korean). (RLPNC-6059)
[en] Improved ORG matching: We've expanded the list of known organization names in our real world ID tables to improve ORG matching in Arabic (ara), Burmese (mya), Chinese (zho), French (fra), German (deu), Greek (ell), Hebrew (heb), Hungarian (hun), Italian (ita), Japanese (jpn), Korean (kor), Portuguese (por), Russian (rus), Spanish (spa), Thai (tha), and Vietnamese (vie). (RLPNC-6090)
[en] Improved Chinese - English address matching: We expanded overrides for ethnic minority regions, particularly from Xinjiang, Tibet, and Inner Mongolia. (RLPNC-6077)
Improved time interval date matching: The timeProximityYearInterval parameter now allows any integer interval value. Previously, it would round the increments up to a 10 year interval. (RLPNC-6060)
New parameter boostWeightAtLeftEnd: We added a new parameter, boostWeightAtLeftEnd, to increase the weighting of the first token in a name. When setting this parameter, do not modify the boostWeightAtRightEnd parameter. (RLPNC-6094)
[en] Improved Chinese - English ORG matching: We added override mappings for Chinese numerals in Hanzi to Arabic numbers from zero through twenty-one. (RLPNC-6028)
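The override mappings from Chinese numerals in Hanzi to Arabic numbers (zero through twenty-one) can be pictured as a small lookup table. The sketch below generates such a table; it is an assumption-laden illustration, since the class name and table shape are hypothetical, variant numerals (for example 〇 or 兩) are not covered, and the actual override file format is not shown in these notes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical generator for Hanzi numeral -> Arabic number override pairs. */
final class HanziNumeralSketch {

    private static final String DIGITS = "零一二三四五六七八九";

    /** Writes n (0 to 99) using standard Chinese numerals, e.g. 21 -> 二十一. */
    static String toHanzi(int n) {
        if (n < 0 || n > 99) {
            throw new IllegalArgumentException("supported range is 0 to 99: " + n);
        }
        if (n < 10) {
            return String.valueOf(DIGITS.charAt(n));
        }
        StringBuilder sb = new StringBuilder();
        int tens = n / 10;
        int ones = n % 10;
        if (tens > 1) {
            sb.append(DIGITS.charAt(tens)); // 十 alone already means ten
        }
        sb.append('十');
        if (ones > 0) {
            sb.append(DIGITS.charAt(ones));
        }
        return sb.toString();
    }

    /** Builds the override table for zero through twenty-one. */
    static Map<String, String> buildOverrides() {
        Map<String, String> overrides = new LinkedHashMap<>();
        for (int n = 0; n <= 21; n++) {
            overrides.put(toHanzi(n), Integer.toString(n));
        }
        return overrides;
    }

    public static void main(String[] args) {
        buildOverrides().forEach((hanzi, arabic) -> System.out.println(hanzi + " -> " + arabic));
    }
}
```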
[en] Bug Fixes
[en] Pairwise match now works with all languages that have limited language support. Previously, an error was returned for unidentified languages. (RLPNC-6100)
[en] Java-only distributions now contain the model files for Thai, Hungarian, and Greek. (RLPNC-6111)
[en] Third-party component updates
[en] This release includes the following third-party component changes:
Table 3. Updated Components
Component | Old Version | New Version
Lucene | 7.6.0 | 8.11.1
Apache Commons IO | 2.7 | 2.11.0
ICU4J | 59.1 | 70.1
fastutil | 8.4.0 | 8.5.6
SLF4J | 1.7.28 | 1.7.33
SnakeYAML | 1.26 | 1.30
commons-lang | 2.6 and 3.10.0 | 3.12.0
Table 4. Removed Components
Component | Version
slf4j-log4j | 1.7.28
[en] Release 7.36.1.c65.0
[en] January 2022
[en] New
[en] Improved address searching: The set of address match results returned are now consistent. (RLPNC-6057)
Neural model for Katakana: To enable the neural-based phonetic name matching model, set enableSeq2SeqTokenScorer to true in the jpn_eng profile in the parameters_profiles.yaml file. Previously, it was set in the internal_param_defs.yaml file. (RLPNC-6068)
[en] log4j update: Updated log4j to the 2.17.1 release. (RLPNC-6071)
[en] Bug Fixes
[en] Third-party component updates
[en] This release includes the following third-party component changes:
Table 5. Updated Components
Component | New Version
log4j | 2.17.1
[en] Release 7.36.0.c65.0
[en] November 2021
Important
If you have customized address stop words or overrides from previous releases, rename those files to follow the new file naming convention. The file names now include three-letter language codes.
[en] New
[en] Chinese address matching: We now support Chinese-Chinese and Chinese-English address matching. (RLPNC-5822)
-
[en] Language-specific address override files: Address override files are now language-specific, and the file name must include the language codes. (RLPNC-6032)
[en] Example: English-English state overrides
[en] Example: Chinese-English city overrides
-
Language-specific address stop word files: Stop word files for address matching on text fields (house, road, city, state, country) are now language-specific, and the file name must include the language code (either eng or zho). (RLPNC-6031)
[en] Example: English city stop pattern
Basic support for all languages: RNI can now index and match names in any language. Languages that previously would have returned an "unsupported language" error now return a match score. The score is either 1 for a perfect match, or a value based on edit distance. Set the parameter allLanguageSupport to false for behavior that is backward compatible with previous versions. (RLPNC-5979)
New parameter to improve recall for ORG matching: You can now improve recall in RNI's first pass when using real world IDs by increasing the value of the parameter nameRealWorldIdQueryBoost. (RLPNC-5938)
-
Improved ORG matching: We added real world ID tables of organization names to improve ORG matching in the following languages: Thai (tha), Greek (ell), Hebrew (heb), Burmese (mya), German (deu), French (fra), Hungarian (hun), Italian (ita), Portuguese (por), Spanish (spa), and Vietnamese (vie). (RLPNC-5986)
[en] Example: "International Astronomical Union" vs. "האיגוד האסטרונומי הבינלאומי"
Neural model for Katakana: We've added a neural-based phonetic matching model to improve Katakana-Latin name matching. To enable the model, set enableSeq2SeqTokenScorer to true in the internal_param_defs.yaml file. (RLPNC-5945)
[en] Improved Hebrew name matching: We've added a rule-based vocalization checker for the statistical-model vocalizer to improve Hebrew-Hebrew and Hebrew-English name matching. (RLPNC-5990)
-
Time-distance capable date matching parameter: We have added an alternative date matching solution that aims to make the definition of closeness for dates more flexible and adjustable. This new algorithm computes the chronological distance between dates in years and uses a timeProximityYearInterval parameter to determine matching candidates and apply an appropriate score penalty. To enable this feature, set alternativeTimeProximityMatch to true. (RLPNC-5948)
Example: "01/06/1982" vs. "31/12/1980" [timeProximityYearInterval = 10 years]
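The chronological distance in the example above (01/06/1982 vs. 31/12/1980, with a 10 year interval) can be worked through with java.time. This is a rough sketch under stated assumptions: the day/month/year reading of the example dates, the 365.25-day year, and the linear penalty are illustrative choices, not the scoring RNI-RNT applies.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Illustrative year-distance computation; the penalty shape is hypothetical. */
final class TimeProximitySketch {

    // The example dates are read as day/month/year (31/12 cannot be month/day).
    private static final DateTimeFormatter DMY = DateTimeFormatter.ofPattern("dd/MM/uuuu");

    static LocalDate parse(String text) {
        return LocalDate.parse(text, DMY);
    }

    /** Absolute distance between two dates, expressed in (fractional) years. */
    static double yearsBetween(LocalDate a, LocalDate b) {
        return Math.abs(ChronoUnit.DAYS.between(a, b)) / 365.25;
    }

    /** Assumed penalty: 1.0 for identical dates, falling linearly to 0.0 at the interval. */
    static double proximityScore(LocalDate a, LocalDate b, int intervalYears) {
        double distance = yearsBetween(a, b);
        if (distance >= intervalYears) {
            return 0.0; // outside the interval: not a matching candidate
        }
        return 1.0 - distance / intervalYears;
    }

    public static void main(String[] args) {
        LocalDate first = parse("01/06/1982");
        LocalDate second = parse("31/12/1980");
        System.out.println(yearsBetween(first, second));       // roughly 1.4 years
        System.out.println(proximityScore(first, second, 10)); // inside the 10 year interval
    }
}
```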
[en] Bug Fixes
[en] Hebrew Folk transliteration has been improved, especially for the letters vav and yod. (RLPNC-5916)
[en] The tokenization of names written in the Burmese script has been improved. This change applies only to RNI and mainly affects names of non-Burmese origin. (RLPNC-5957)
[en] Burmese-English transliteration has been improved by revising the Folk and MLCTS transliteration schemes. (RLPNC-5950)
[en] Third-party component updates
[en] This release includes the following third-party component changes:
Table 6. Updated Components
Component | Old Version | New Version
OSHI Core | 3.4.2 | 3.4.4
Java Native Access Platform | 4.3.0 | 4.5.0
Java Native Access | 4.3.0 | 4.5.0
ThreeTen Backport | 1.3.3 | 1.3.6
JavaCPP | 1.5.3 | 1.5.4
Table 7. Added Components
Component | Version
Tensorflow core API | 0.2.0
Protobuf java | 3.12.2
Tensorflow NdArray | 0.2.0
Table 8. Removed Components
Component | Version
DeepLearning4j Core | 1.0.0-beta7
DeepLearning4j TSNE | 1.0.0-beta7
Nearestneighbor Core | 1.0.0-beta7
DeepLearning4j Datasets | 1.0.0-beta7
DeepLearning4j Common | 1.0.0-beta7
DeepLearning4j DataVec Iterators | 1.0.0-beta7
DeepLearning4j Modelimport | 1.0.0-beta7
JavaCPP Presets Platform For HDF5 | 1.12.0-1.5.3
DeepLearning4j NN | 1.0.0-beta7
DeepLearning4j Utility Iterators | 1.0.0-beta7
Concurrent | 1.3.4
ND4J Common | 1.0.0-beta7
ND4J Guava | 1.0.0-beta7
ND4J Protobuf | 1.0.0-beta7
ND4J Jackson | 1.0.0-beta7
OSHI JSON | 3.4.2
Apache Commons Math | 3.5
Apache Commons Compress | 1.18.0
ND4J API | 1.0.0-beta7
Byte Units | 0.9.4
FlatBuffers Java API | 1.10.0
Gson | 2.8
Apache Commons Net | 3.1
ND4J Protobuf | 1.0.0-beta7
Neoitertools | 1.0.0
DataVec API | 1.0.0-beta7
Apache FreeMarker | 2.3.23
Stream Library | 2.9.8
OpenCSV | 2.3
T Digest | 3.2
ND4J Native | 1.0.0-beta7
JavaCPP Presets For OpenBLAS | 0.3.9-1-1.5.3
JavaCPP Presets For MKL | 2020.1-1.5.3
ND4J Native API | 1.0.0-beta7
[en] Release 7.35.2.c65.0
[en] September 2021
[en] Bug Fixes
[en] Third-party component updates
[en] No changes to third-party components.
[en] Release 7.35.1.c65.0
[en] September 2021
[en] New
[en] ARM64 support: We now support ARM64 processors. (RLPNC-5912)
[en] Burmese transliteration: We added a Basis Technology-created Folk transliteration scheme for Burmese name matching that is similar to how Burmese names are commonly transliterated to English. (RLPNC-5892)
-
[en] Improved address matching: We've modified the field weight values to provide more accurate address match scores. Weightings were determined by evaluating US and UK address data. (RLPNC-5893)
Example: "85 Court Road Newton Ferrers, Plymouth PL8 1DE1B Devon, England UK" vs. "85 Court Road Newton Ferrers PL8 1DE UK"
[en] Improved address overrides: Address overrides are now applied to groups of related address fields, instead of just individual fields. Overrides apply when matching any two fields from the same group. (RLPNC-5899)
New date parameters: We've added a new parameter, dateOrdering, which sets the default date representation. It must be one of three valid values: YMD, DMY, or MDY. The default value is MDY. (RLPNC-5904)
-
Improved Hebrew transliteration: The Hebrew character ח used to be transliterated as “h” in some cases and “kh” in others (if it was followed by a geresh). It is now transliterated as “ch” when not followed by a geresh. The Hebrew character כ used to be transliterated as “h” in some cases and “k” in others (if it had a dagesh). It is now transliterated as “ch” in the cases where it used to be transliterated as “h”. (RLPNC-5928)
[en] Example: נחמן
[en] Previously: Nahman
[en] Now: Nachman
[en] Example: מיכל
[en] Previously: Mihal
[en] Now: Michal
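The effect of the dateOrdering values (YMD, DMY, MDY) is easiest to see on an ambiguous date string. The sketch below uses java.time to show how the same text resolves to different dates depending on the ordering. The enum, the slash-separated patterns, and the method names are assumptions made for illustration; they are not the RNI-RNT date parser.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Illustrative parser showing how a default date ordering resolves ambiguity. */
final class DateOrderingSketch {

    /** Hypothetical mirror of the three documented dateOrdering values. */
    enum Ordering {
        YMD("uuuu/MM/dd"),
        DMY("dd/MM/uuuu"),
        MDY("MM/dd/uuuu");

        private final DateTimeFormatter formatter;

        Ordering(String pattern) {
            this.formatter = DateTimeFormatter.ofPattern(pattern);
        }
    }

    /** Parses a slash-separated date using the given ordering. */
    static LocalDate parse(String text, Ordering ordering) {
        return LocalDate.parse(text, ordering.formatter);
    }

    public static void main(String[] args) {
        // The same text is January 6 under MDY but June 1 under DMY.
        System.out.println(parse("01/06/1982", Ordering.MDY)); // prints 1982-01-06
        System.out.println(parse("01/06/1982", Ordering.DMY)); // prints 1982-06-01
        System.out.println(parse("1982/01/06", Ordering.YMD)); // prints 1982-01-06
    }
}
```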
Bug Fixes
-
[en] EntityTypes in query: Queries now filter by entity type. Note that indexed names without a specified entity type will only match query names that also don't specify an entity type. (RLPNC-5896)
[en] Example: Create an index with one document: “RIDGEWAY JOHN” as PERSON. Query the index with “Ridgeway School” as ORGANIZATION in Solr.
[en] Release 7.35.0.c65.0
[en] August 2021
[en] New
-
[en] Improved Hebrew-English name matching:
[en] We've improved the statistical model. (RLPNC-5842)
-
[en] We changed the default transliteration scheme to FOLK from ISO259-2-1994, which improves matching scores as FOLK more closely matches how people transliterate Hebrew names. (RLPNC-5844)
[en] Example: בִּנְיָמִין גַּנְץ vs. Benjamin Gantz
-
[en] We expanded the token overrides for person entity types. (RLPNC-5845)
[en] Example: אלכס vs. Alexander
-
[en] We added word embeddings for Hebrew organizations. (RLPNC-5837)
[en] Example: ארגון המזון והחקלאות vs. Food and Agriculture Organization
-
[en] Improved Hebrew-Hebrew name matching: We expanded the token overrides for person entity types. (RLPNC-5891)
[en] Example: סולומונתאס vs. סולונאס
-
[en] Improved English-English name matching: We added the token override pair Alex/Aleksandar. (RLPNC-5871)
[en] Example: Alex vs. Aleksandar
-
[en] Improved matching for identifiers: We improved matching and added support for three new subtypes: IDENTIFIER_DRIVERS_LICENSE, IDENTIFIER_LICENSE_PLATE, IDENTIFIER_NATIONAL_ID_NUM, along with IDENTIFIER_GENERIC. (RLPNC-5852)
[en] Example: NH123456789DL vs. NH123456789DN (as IDENTIFIER_DRIVERS_LICENSE entity type)
-
[en] Improved Japanese Segmentation: We've expanded the segmentation dictionary to improve Japanese name segmentation. (RLPNC-5835)
[en] Example: ミロシェヴィッチスロボダン
-
[en] Improved address matching: We've expanded the override tables for UK, U.S., and Canadian addresses. (RLPNC-5886)
[en] Example: houseNumber<47>road<Albert Street>city<Aberdeen>stateDistrict<Aberdeenshire>postCode<AB25 1XT> vs. houseNumber<47>road<Albert Street>city<Aberdeen>stateDistrict<ABD>postCode<AB25 1XT>
New API endpoint: We added com.basistech.names.parameters.ParameterProfileUtils displayParameterUniverses to list all named parameter universes registered in the system. (RLPNC-5851)
[en] Bug Fixes
-
[en] Overrides for alphanumeric address fields (houseNumber, unit, poBox, postCode) are now being applied. (RLPNC-5863)
[en] Example: “3710 W Martin Luther King Blvd STE #121” vs. “3710 W Martin Luther King Blvd Suite #121”
-
[en] Hebrew tokens containing diacritics are now identified in the override table. (RLPNC-5882)
[en] Example: אֲבִי vs. Abigail
[en] Release 7.34.0.c64.1
[en] May 2021
[en] New
-
[en] Added support for Burmese-English name translation. (RLPNC-5662)
[en] Example: မင်း အောင် လှိုင် ⟹ Maang Aaung Lhuing
[en] Added support for Burmese-Burmese and Burmese-English name matching. (RLPNC-5660)
[en] Added support for Hebrew-Hebrew and Hebrew-English name matching. (RLPNC-5339)
[en] Added support for Vietnamese-Vietnamese and Vietnamese-English name matching. (RLPNC-5687)
-
[en] Improved address matching by improving handling of postal codes. (RLPNC-5639)
[en] Example: houseNumber<123>road<Clifton St>city<Cambridge>state<MA>postCode<02140 1234> vs. houseNumber<123>road<Clifton St>city<Cambridge>state<MA>postCode<02140-1234>
-
[en] Improved address matching by expanding override tables for UK and CA addresses. (RLPNC-5607)
[en] Example: houseNumber<100>road<Main Ave>city<Shellbrook>state<Saskatchewan>postCode<S0J 2E0> vs houseNumber<100>road<Main Ave>city<Shellbrook>state<Sask>postCode<S0J 2E0>
-
[en] Improved Chinese-English name matching by allowing English translations from the list of translations of the Chinese name to be considered when matching a name pair. (RLPNC-5643)
[en] Example: 汤姆 vs. Tom
-
Improved English-English name matching for ORGANIZATION entity type by expanding the overrides list with numbers and their written forms from 1 to 21. (RLPNC-5644)
[en] Example: Channel One Russia vs. Channel 1 Russia
[en] Improved name matching for ORGANIZATION entity type by adding new frequency models in English, Chinese, Arabic, Japanese and Russian. (RLPNC-5416)
[en] Improved name matching for ORGANIZATION entity type by adding new frequency models in English. (RLPNC-5416)
[en] Updated the English frequency model for PERSON entity type by adding birth names from 1920-2019 to the existing model. (RLPNC-5592)
-
[en] Improved date matching by expanding parsing support for more date formats. (RLPNC-5585)
[en] Example: 2000-12-99 vs. 2000-12-DD
Improved name deduplication by adding support for name overrides so that, for example, the nickname “Mike” is placed in the same cluster as “Michael”. (RLPNC-5600)
Upgraded the RNI Solr plugin to support Solr 8.8.1. (RLPNC-5652)
[en] Improved Solr query performance for multi-valued RNI address and name fields by adding support for promising term filtering. Promising term filtering uses knowledge of document frequency at query time to prevent slow queries due to common terms. (RLPNC-5649)
Improved Solr query performance by disabling phrase queries. To enable phrase queries for multi-valued fields, set the useSolrPhraseQueries parameter to true. (RLPNC-5789)
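The postal code improvement in this release (the example pair "02140 1234" vs. "02140-1234") suggests that separators inside a postal code should not affect the comparison. One simple way to picture this is to normalize both values before comparing them. The sketch below is only an assumption about the general idea; the class and method are hypothetical and do not reflect how RNI-RNT scores postCode fields.

```java
import java.util.Locale;

/** Hypothetical postal code normalization: drop separators, unify letter case. */
final class PostCodeSketch {

    /** Removes every character that is not an ASCII letter or digit and upper-cases the rest. */
    static String normalize(String postCode) {
        return postCode.replaceAll("[^\\p{Alnum}]", "").toUpperCase(Locale.ROOT);
    }

    /** Two postal codes are treated as equal when their normalized forms are identical. */
    static boolean samePostCode(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(samePostCode("02140 1234", "02140-1234")); // prints true
        System.out.println(samePostCode("S0J 2E0", "s0j2e0"));        // prints true
        System.out.println(samePostCode("02140", "02141"));           // prints false
    }
}
```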
[en] Bug Fixes
[en] Third-Party component updates
Table 9. Updated
Component | Old Version | New Version | License
Apache Lucene Solr | 8.5.1 | 8.8.1 | Apache
Apache Lucene Core | 8.5.1 | 8.8.1 | Apache
Apache Commons Lang | 3.9.1 | 3.10.0 | Apache
Apache Zookeeper | 3.5.5 | 3.6.2 | Apache
Jetty HTTP2 Client | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL
Jetty HTTP2 Common | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL
Jetty HTTP2 HTTP Client Transport | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL
Jetty Asynchronous HTTP Client | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL
Jetty Http Utility | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL
Jetty Utilities | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL
Jetty IO Utility | 9.4.24.v20191120 | 9.4.34.v20201102 | Apache, EPL
Metrics Integration with JMX | 4.1.2 | 4.1.5 | Apache
[en] Table 10: Deleted
[en] Component | Version
[en] Apache Commons FileUpload | 1.3.3
[en] Restlet | 2.4.0
[en] Release 7.33.1.c63.0
[en] January 2021
[en] New Features
[en] Added support for address matching in the RNI Solr plugin 6.6, 7.6 and 8.5. (RLPNC-5264)
[en] Added support for date matching in the RNI Solr plugin 7.6 and 8.5[2]. (RLPNC-5480)
-
[en] Improved Japanese-English and Japanese-Japanese name matching by expanding the Japanese stop word list for ORGANIZATION entity type. (RLPNC-5466)
[en] Example: コダック合同会社 vs. Kodak Limited
-
[en] Improved Russian-English and Russian-Russian name matching by expanding the Russian stop word list for ORGANIZATION entity type. (RLPNC-5467)
[en] Example: Балтийский федеральный университет имени Иммануила Канта vs. Immanuel Kant Baltic Federal University
[en] Improved character normalization for all supported languages. (RLPNC-5514)
-
[en] Improved Russian-English and Russian-Russian name matching by adding support for Russian organizations in the entity resolution engine. (RLPNC-5564)
[en] Example: Ура́льские авиали́нии vs. Ural Airlines
-
[en] Improved Korean-English and Korean-Korean name matching by adding support for Korean organizations in the entity resolution engine. (RLPNC-5564)
[en] Example: 현대자동차 vs. Hyundai Motor Company
-
[en] Improved date matching by adding support for dates in yy/mm/dd format. (RLPNC-5562)
[en] Example: 76/01/22 vs 01/22/1976
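To make the yy/mm/dd example concrete, the following sketch parses both date strings with Python's standard library and shows that they denote the same day. It is only an illustration of the format, not RNI's date parser; the function name and the order in which formats are tried are assumptions.

```python
from datetime import datetime


def parse_date(text, formats=("%m/%d/%Y", "%y/%m/%d")):
    """Try each format in turn and return the first successful parse."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {text!r}")


# Python expands two-digit years 69-99 to 1969-1999 and 00-68 to 2000-2068.
print(parse_date("76/01/22"), parse_date("01/22/1976"))
```

A production matcher would also have to decide how to handle ambiguous inputs such as 01/02/03, which this sketch simply resolves by format order.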
[en] Release 7.33.0.c62.2
[en] New Features
-
[en] Improved the accuracy of name matching for ORGANIZATION entity type by integrating name completion with an internal entity resolution engine. Currently it has support for English, Arabic, Chinese and Japanese organizations. (RLPNC-5454)
[en] Example: ソニー株式会社 vs. Sony Corporation
[en] Added three new parameters, doQueryRealWorldIds, useRealWorldIds, and realWorldIdScore, which control the entity resolution engine integrated into the name completion process. doQueryRealWorldIds lets you disable the query clauses that look for matching real-world IDs (they are enabled by default). useRealWorldIds can be set to false per profile to disable real-world ID matching for specific language pairs or entity types. realWorldIdScore controls the match score awarded when two names match because they have matching real-world IDs. (RLPNC-5417)
[en] Added support for debug information when matching addresses. AddressMatchResult.getAddressFieldPairResults now returns a list of AddressFieldPairResults that describe how each pair of address fields was scored. (RLPNC-5511)
-
[en] Improved segmentation of Japanese names for PERSON entity type. (RLPNC-5520)
[en] Example: スズキタロウ
-
[en] Expanded the Spanish-Spanish token overrides for PERSON entity type. (RLPNC-5539)
[en] Example: Francisco vs. Paco
[en] Release 7.32.3.c62.2
[en] New Features
[en] Added three new parameters, nameBigramQueryBoost, nameDoubleMetaphoneQueryBoost, and nameInitialQueryBoost, which allow you to tweak the weight of their respective query clause boosts in Lucene in order to improve first-pass recall. (RLPNC-5503)
[en] Bug Fixes
[en] Release 7.33.2.c63.0
[en] January 2021
[en] Bug Fixes
[en] Release 7.32.2.c62.2
[en] New Features
[en] Release 7.32.1.c62.2
[en] New Features
-
[en] Enhanced semantic matching of tokens in organization names through use of word embeddings in Spanish, Arabic and Korean. Note: This drastically increases the size of the SDK package. To reduce the size, the embeddings dictionaries in rlpnc/data/tvec/filtered-vectors can be removed as long as the corresponding language pairs in parameter_profiles.yaml have useEmbeddings set to false. (RLPNC-5449)
[en] Upgraded the native libraries of the native Linux-compatible release of RNI. We are now using CentOS 7 to build these libraries, as CentOS 6 will reach EOL in November 2020. The new BT_BUILD value for the Linux package is amd64-glibc217-gcc48. (RLPNC-5453)
-
[en] Improved organization name matching by expanding the English stop word list for organization entity type. (RLPNC-5458)
[en] Example for English-English name matching: SUNY Canton vs. State University of New York at Canton
-
[en] Added a new parameter, nameGluedQueryBoost, which allows you to adjust the boost on the query term that looks for an exact match of the normalized name with spaces removed. The default value for nameGluedQueryBoost is 1.0. (RLPNC-5445)
[en] Example: in a 10-million name Lucene index which includes names "PAUL MARTINI" and "JOHN LARKIN", query for each name respectively with artificially "re-tokenized" names such as "PAU LMAR TIN I" and "J O H N LARKIN":
-
[en] Improved address matching by improving the normalization of the postal code address field. (RLPNC-4967)
[en] Example: road<71-75 Shelton street>city<Covent garden>postCode<WC2H9JQ> vs. road<71-75, SHELTON STREET>city<LONDON>postCode<WC2H 9JQ>
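The postal code example above differs only in spacing, so a simple normalization is enough to make the two values compare equal. The sketch below shows one such normalization (strip whitespace, uppercase); it is an assumption about the kind of step involved, not RNI's actual normalization rules, and the function name is hypothetical.

```python
def normalize_post_code(value):
    """Remove all whitespace and uppercase the remaining characters."""
    return "".join(value.split()).upper()


print(normalize_post_code("WC2H9JQ") == normalize_post_code("WC2H 9JQ"))
```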
[en] Release 7.32.0.c62.2
[en] New Features
-
[en] Added support for Hebrew-English name translation. The default transliteration scheme is FOLK. The ISO259-2-1994 and ICU transliteration schemes are also supported, as is a Hebrew-English statistical model intended for translating names of foreign origin. (RLPNC-5446, RLPNC-5340, RLPNC-5342, RLPNC-5432)
[en] Hebrew to English translation with FOLK transliteration scheme example: רוזלינד פרנקלין ⟹ Ruzlind Prenklin
[en] Hebrew to English translation with ISO259-2-1994 transliteration scheme example: רוזלינד פרנקלין ⟹ Rẇzliynd Prnqliyn
[en] Hebrew to English translation with ICU transliteration scheme example: רוזלינד פרנקלין ⟹ Rẇzĕliynĕd Pĕrĕnĕqĕliyn
[en] Hebrew to English translation with statistical model example: רוזלינד פרנקלין ⟹ Rosalind Franklin
-
[en] Added support for Hebrew vocalization via a dictionary lookup and statistical model. (RLPNC-5389, RLPNC-5390, RLPNC-5388)
-
[en] Improved Arabic-English and Arabic-Arabic name matching by expanding the stopword list for person and organization entity types. (RLPNC-5248)
[en] Example for PERSON entity type: محمد vs. نبي محمد
[en] Example for ORGANIZATION entity type: البنك الأهلي التجاري vs. ال البنك الأهلي التجاري
[en] Bug Fixes
-
[en] Fixed a bug where Arabic-English translation no longer returned translations from the statistical model, which also affected Arabic-English name matching. (RLPNC-5413)
[en] Example: Blake Lively vs. بليك ليفلي. The Arabic name used to be transliterated as "blayk lifali"; it is now translated as "blake lively".
[en] Release 7.31.1.c62.2
[en] New Features
[en] Upgraded rws-names web services to support Apache Tomcat 8.5.55 (RLPNC-5403)
[en] Expanded the Latin gender model with French, German, Italian, Portuguese, and Spanish names, so it's able to detect the gender of more names from the mentioned origins. (RLPNC-5334)
[en] Release 7.31.0.c62.0
[en] New Features
[en] Upgraded the RNI Solr plugin to support Solr 8.5.1 (RLPNC-5291)
[en] Improved Arabic-English and Arabic-Arabic name matching by improving segmentation of Arabic names and adding an Arabic/English statistical model, gender identification, language model, edit distance scoring, as well as adding support for initials and initialisms. (RLPNC-5269, RLPNC-5244, RLPNC-5256, RLPNC-5366, RLPNC-5255, RLPNC-5254)
[en] Improved address matching by adding support for cross-field matching of addresses, multi-token overrides, normalization process of address fields and expanded address overrides. (RLPNC-4969, RLPNC-5383, RLPNC-5386, RLPNC-5378)
[en] Added support for parsing addresses using the jpostal[4] library. (RLPNC-4571)
[en] Expanded the Japanese-English token overrides for organizations. (RLPNC-5384)
[en] Bug Fixes
[en] Release 7.30.5.c62.2
[en] February 2020
[en] Bug Fixes
-
[en] Fixed a bug in address matching where the country field in the address was not being matched correctly to values in the index. Address matching queries using a country code now work correctly. (SUPPO-1459)
[en] Example: Create an index with a primary name, birth date, and 2 addresses. The only field in the address is country. The country values are "Spain" and "Mexico". Search for an address with the value of "Spain".
[en] Previously: No documents are returned.
[en] Now: One document is returned, with a match score of 1.0.
[en] Release 7.30.4.c62.2
[en] February 2020
[en] New Features
[en] Release 7.30.3.c62.2
[en] January 2020
[en] Bug Fixes
[en] Release 7.30.2.c62.2
[en] January 2020
[en] New Features
-
[en] For Chinese, added the ability to remove stop words from within parentheses from the name match. (RLPNC-5096)
[en] Added a parameter, enableDynamicConfigurationEndpoints, to control the dynamic configuration endpoints in the RNI Elasticsearch plugin. They are disabled by default. To turn on the endpoints, set the parameter to true in the any: profile of the parameter_profiles.yaml file. This may slow your system down considerably. (RLPNC-5225)
[en] Release 7.30.1.c62.0
[en] December 2019
[en] New Features
[en] Bug Fixes
[en] Release 7.30.0.c62.0
[en] November 2019
[en] New Features
[en] We've added a neural-based phonetic matching model to improve Katakana-Latin name matching. To enable the model, set enableSeq2SeqTokenScorer to true for the jpn_eng profile. (RLPNC-5148)
[en] Address matching can now split apart or join tokens as necessary to determine a match score. For example, previously it would get a lower score trying to match Old Colony Avenue to OldColony Avenue. Now it can recognize they are the same name. (RLPNC-5005)
[en] Improved address matching by adding more address overrides specifying abbreviations from English-speaking countries. For example, with an override, the token crossing in an address will match cross, and court will match crt. (RLPNC-5201)
[en] Improved Japanese-English matching for organizations by adding more Japanese organizations to the override list. (RLPNC-5199)
[en] Added the ability to turn off translation of Katakana names. When the parameter katakanaTransliterationsOnly is set to true, Japanese names written in Katakana will only be transliterated. The parameter is off by default. Turning it on improves the speed of matching Japanese names, but may reduce accuracy. (RLPNC-5194)
[en] Bug Fixes
[en] Release 7.29.3.c61.0
[en] November 2019
[en] Bug fixes
[en] Release 7.29.2.c61.0
[en] October 2019
[en] New Features
[en] Added a new parameter, addressUnpairedFieldScore, which allows you to adjust the score of unpaired fields during address matching. (RLPNC-5196)
[en] Added new token overrides to the English-English organization token override file. (RLPNC-5200)
[en] Release 7.29.1.c61.0
[en] September 2019
[en] New Features
[en] Updated support for Mac OS X platform from version 10.7+ to version 10.9+. Documents have been updated to reflect the changes. (RLPNC-5161)
[en] Added new token overrides to the English-English token override file and a new fullname override to the Japanese-English fullname override file. (RLPNC-5166)
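August 2019
New Features
Added a new parameter, numericTokenFrequencyRank, which lets you adjust the weighting of names that contain numbers for entity types in which numbers are not normalized. It is disabled by default (set to 0).
Added Spanish as a language of origin for Latin-script names.
Added the hmmScoreBias and hmmScoreLimit parameters for the statistical model. hmmScoreBias is applied to the final token score, and hmmScoreLimit (0.0 to 1.0) adjusts the result. The former hmmScorerBias is now included in rlpnc/data/etc/internal_parameter_defs.yaml as the variable hmmNormBias.
Added a new parameter, exactLatnMatchScore, which controls the score returned for an exact token match in Latin script. The default is 1.0 for native Latin script, but it is set to 0.937 for Chinese-Chinese matching.
Added two new parameters, kanjiMismatchPenalty and notExactMatchPenalty. kanjiMismatchPenalty adjusts the final score when two Japanese names are written in kanji that share the same reading. notExactMatchPenalty adjusts the score when two names appear to match exactly after normalization; this penalty is applied to prevent a score of 1.0.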
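May 2019
New Features
The reorderPenalty parameter now controls an exponentially decaying penalty instead of a linear one, in order to improve matching of longer names.
Normalized half-width characters into full-width characters to improve Japanese name matching and translation. (RLPNC-5028)
Normalized Chinese variant characters to improve the accuracy of name matching and translation. (RLPNC-5045)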
[en] Third-party component updates
[en] Releases 7.27 and earlier
[en] New Features and Bug Fixes
[en] Third-Party Components
[en] New Features and Bug Fixes in 7.28.1.c61.0
[en] New Features
[en] New Features and Bug Fixes in 7.28.0.c61.0
[en] New Features
[en] The reorderPenalty parameter now controls an exponentially decaying penalty instead of a linear one, in order to improve matching of longer names. (RLPNC-3661)
[en] Normalized some Katakana "small" characters into their full sized counterparts to improve Japanese name matching and translation. (RLPNC-5028)
[en] Normalized Extension A Chinese characters into their variants to improve Chinese name matching and translation. (RLPNC-5045)
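The Katakana normalization listed above (small characters mapped to their full-sized counterparts) can be illustrated with a character translation table. This is a sketch only: which characters RNI actually normalizes is not stated in these notes, so the mapping below is an assumption chosen for the example.

```python
# Assumed mapping of small Katakana to full-sized Katakana (illustrative subset).
SMALL_TO_FULL = str.maketrans(
    "ァィゥェォッャュョヮヵヶ",
    "アイウエオツヤユヨワカケ",
)


def normalize_small_katakana(name):
    """Replace small Katakana characters with their full-sized counterparts."""
    return name.translate(SMALL_TO_FULL)


print(normalize_small_katakana("キャノン"))
```

With such a mapping, spellings that differ only in the size of a Katakana character compare equal after normalization.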
RNI 7.27.0.c60.0 (update from 7.25.1)
New Features
Added a frequency language model for the Hungarian LOCATION entity type and retrained the model for the PERSON entity type to improve accuracy. (RLPNC-5003, RLPNC-5004)
Added support for Lucene 7.6.0 and Solr 7.6.0. (RLPNC-5000, RLPNC-5020)
[en] Upgraded the RNI Solr plugin to support Solr 7.6.0. (RLPNC-5000)
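New Features
Added English address matching. (RLPNC-4329, RLPNC-4342, RLPNC-4351, RLPNC-4354, RLPNC-4812, RLPNC-4931, RLPNC-4941, RLPNC-4942, RLPNC-4943, RLPNC-4966, RLPNC-4968, RLPNC-4970, RLPNC-4973)
Added support for multi-character initials in Hungarian. (RLPNC-4958, RLPNC-4959, RLPNC-4989)
Improved English/Japanese and English/Chinese organization name matching. (RLPNC-4937, RLPNC-4938, RLPNC-4939, RLPNC-4940, RLPNC-4952, RLPNC-4962, RLPNC-4971)
New Features
Added Hungarian/English and Hungarian/Hungarian name matching. (RLPNC-4744, RLPNC-4754, RLPNC-4821, RLPNC-4879, RLPNC-4886)
Bug Fixes
Fixed a bug in the name construction process in which the language of origin was set before the entity type, even though the language of origin depends on the entity type. (RLPNC-4858)
Fixed an issue that caused duplicate Chinese translations. (RLPNC-4796, RLPNC-4838)
Fixed an issue where name translation returned a confidence score of 0. (RLPNC-4871)
New Features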
[en] Bug Fixes
[en] Parameter Additions and Changes
[en] Added a new parameter, includeExtraKatakanaPersonReadings, in rlpnc/data/etc/internal_param_defs.yaml. If true, it will include the foreign person name readings even when the language-of-origin is unknown. By default it is set to false. (RLPNC-4841)
[en] New Features and Bug Fixes in 7.23.2.c59.2
[en] New Features
[en] New Features and Bug Fixes in 7.23.0
[en] New Features
[en] Continued adding support for Greek by reviewing the match score adjustments. (RLPNC-4629)
[en] Added a new parameter (see below) that controls the use of different segmentation schemes for Japanese. (RLPNC-4677)
[en] Bug Fixes
[en] Parameter Additions and Changes
[en] Added a new parameter, useOldAndNewNameSegmentationForJapanese, that allows multiple segmentation schemes to be applied to Japanese names. (RLPNC-4677)
[en] Adjusted the finalBias parameter for Greek/Greek and Greek/English name pairs to ensure scores for these matches are in line with those of other languages supported by RNI. (RLPNC-4629)
[en] Adjusted parameters for matching names of Arabic origin to English names. (RLPNC-4700)
[en] New Features and Bug Fixes in 7.22.0
[en] New Features
[en] Added the capability to identify language of origin for Latin-script names. Currently this feature will categorize names as Arabic, Chinese, English, Japanese, and Korean. (RLPNC-4546)
[en] Continued adding support for Greek by setting the default transliteration scheme to ISO-843, adding a Greek/English statistical model, gender identification, name overrides, language model, edit distance scoring, as well as adding support for initials and initialisms. (RLPNC-4622, RLPNC-4676, RLPNC-4628, RLPNC-4626, RLPNC-4651, RLPNC-4630, RLPNC-4631, RLPNC-4689, RLPNC-4674)
[en] Improved the accuracy of language identification for Hani-script person names. (RLPNC-4532)
[en] Bug Fixes
[en] Parameter Additions and Changes
[en] Adjusted the reorderPenalty in the jpn_jpn_ORGANIZATION profile to 0.070 to improve performance for this profile. (RLPNC-4659)
[en] Enabled useEditDistanceTokenScorer for Greek/Greek and Greek/English. (RLPNC-4665)
[en] New Features and Bug Fixes in 7.21.1
[en] New Features
[en] New Features and Bug Fixes in 7.21.0
[en] New Features
[en] Added a new parameter (see below) that allows one to tune a penalty applied to pairwise match scores if the two names involved are of different lengths. (RLPNC-4554)
[en] Added a new parameter (see below) that allows one to tweak the resulting match score of the case in which two identical names with unknown field are matched against each other. (RLPNC-4582)
[en] Improved gemination rules of ISO11940-2 to improve Thai translation accuracy. (RLPNC-4579)
[en] Relocated jar files distributed previously under rlp/lib/BT_BUILD to now be in a more central, platform-neutral location of rlpnc/lib/jvm. (RLPNC-4583)
[en] Began adding support for Greek/English and Greek/Greek name matching. Note: Greek support is currently minimal and will be expanded in a future release. (RLPNC-4600, RLPNC-4611, RLPNC-4615, RLPNC-4621)
[en] Bug Fixes
[en] Parameter Additions and Changes
[en] Added a new parameter, nameLengthMismatchPenalty, in rlpnc/data/etc/parameter_defs.yaml. The penalty is off (0) by default and only overridden for the zho_zho profile to be 0.55. Lower values will penalize the score less, and higher values will apply a more drastic penalty. As part of the work of adding this parameter, we also adjusted the zho_zho deletionScore from 0.250 to 0.314.
[en] Adjusted the expensiveScorerJoinedTokenLimit in the jpn_jpn_VEHICLE profile to 5 to improve performance for this profile.
[en] Added a new parameter, sameNameUnknownFieldMatchInterpolator, in rlpnc/data/etc/parameter_defs.yaml. This parameter affects a rare case in which two identical names with unknown fields are matched against each other. Usually, RNI would score identical names as 1.0. However, unknown tokens have their own penalty that applies, defined by unknownVsUnknownScore. This new parameter interpolates between the would-be score and 1.0. The default value is 1, which means that 1.0 will be returned in these cases. Turning the parameter to 0 will fall back to the would-be score, and values in-between will interpolate between these two numbers.
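The behavior described for sameNameUnknownFieldMatchInterpolator can be written as a small worked example. The notes say only that the parameter interpolates between the would-be score and 1.0; the sketch below assumes the interpolation is linear, and the function name is hypothetical.

```python
def interpolate_same_name_score(would_be_score, interpolator=1.0):
    """Linearly interpolate between the would-be score (0) and 1.0 (1)."""
    if not 0.0 <= interpolator <= 1.0:
        raise ValueError("interpolator must be between 0 and 1")
    return would_be_score + interpolator * (1.0 - would_be_score)


# With a would-be score of 0.8: the default of 1 returns 1.0, 0 keeps 0.8,
# and 0.5 lands halfway between the two.
for t in (1.0, 0.0, 0.5):
    print(t, interpolate_same_name_score(0.8, t))
```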
[en] New Features and Bug Fixes in 7.20.0
[en] New Features
[en] Added a new Thai/English statistical model for matching to improve Thai/English name match performance. (RLPNC-4429)
[en] Added a new Thai name segmentation dictionary, improving segmentation and match performance. (RLPNC-4421)
[en] Improved Thai transliteration, benefiting translation and match performance. (RLPNC-4547)
[en] Enhanced the Thai stop word list, providing better stop word removal from Thai names during matching. (RLPNC-4461)
[en] Tuned the finalBias value for Thai/Thai and Thai/English name pairs to ensure scores for these matches are in line with those of other languages supported by RNI. (RLPNC-4445)
[en] Greatly improved Arabic/Arabic match performance by adding an edit distance metric. (RLPNC-4508)
[en] Improved match performance for name pairs in which the names are identical when spaces are removed. (RLPNC-4495)
[en] Added a few new entries to English/English token overrides. (RLPNC-4529)
[en] Modified the names of RNI's internal Lucene fields so that they are simpler and standardized. (RLPNC-2506)
[en] Added the ability to disable support for individual languages. See internal_param_defs.yaml in rlpnc/data/etc for more information on how to use this feature. (RLPNC-4558)
[en] Removed support for Solr 5. (RLPNC-4531)
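For the item above about name pairs that are identical once spaces are removed, a comparison key of that kind can be sketched as follows. This is a simplified illustration, assuming that case folding stands in for RNI's fuller normalization; the function name and sample names are made up for the example.

```python
def glued_key(name):
    """Build a comparison key by removing all whitespace and folding case."""
    return "".join(name.split()).casefold()


# Names that differ only in where the spaces fall share the same key.
print(glued_key("Mary Ann Smith") == glued_key("MaryAnn Smith"))
```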
[en] New Features and Bug Fixes in 7.19.0
[en] New Features
[en] Added preview support for Thai in name matching and name translation. (RLPNC-4444, RLPNC-4420, RLPNC-4419, RLPNC-4417, RLPNC-4490, RLPNC-4424, RLPNC-4423, RLPNC-4493, RLPNC-4479, RLPNC-4418)
[en] Upgraded RNI to use Lucene 6.6. (RLPNC-4292)
[en] Upgraded the RNI Solr plugin to support Solr 6.6. Removed support for Solr 4. (RLPNC-4450, RLPNC-4456)
[en] Changed default behavior of Chinese names during Chinese / English matching so that they are assumed to be of Chinese origin unless otherwise specified. (RLPNC-4375)
[en] Improved Russian / English name matching in that Russian names now include multiple translations. (RLPNC-4496)
[en] Greatly improved Chinese / Japanese organization name language detection. (RLPNC-4477)
[en] Improved match performance when engEngFastMode is enabled. (RLPNC-4488)
[en] Improved name matching to account for more substring matches. (RLPNC-4498)
[en] Bug Fixes
[en] New Features and Bug Fixes in 7.18.0
[en] New Features
[en] Upgraded the native libraries of the native Linux-compatible release of RNI. We are now using CentOS 6 to build these libraries, as CentOS 5 has reached EOL. The new BT_BUILD value for the Linux package is amd64-glibc212-gcc44. (RLPNC-4278)
[en] Added a new config parameter, engEngFastMode, which improves speed for English-English matching by turning off HMM and simplifying queries. For more information, check the documentation in internal_param_defs.yaml. (RLPNC-4357)
[en] Included simple serialize and deserialize methods on Name and DateSpec. (RLPNC-4344)
[en] Added two new config parameters, doQueryFuzzy and doQueryPhrase, which affect components of RNI's internal Lucene queries. (RLPNC-4312)
[en] Improved the automatic inference of BT_BUILD values. (RLPNC-4368)
[en] Added static and deprecated attributes for config parameters. A "static" parameter is one that always has the value loaded in the default parameter profile; setting a static parameter to a different value in other profiles has no effect whatsoever. A "deprecated" parameter is one that we are proposing to eliminate; binding its value to anything other than the default results in a warning. (RLPNC-4193)
[en] Improved the efficiency of when the HMM is used in the case of English-English name pairs. (RLPNC-2988)
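The static and deprecated attributes described above can be modeled in a few lines. The sketch below is not RNI's configuration loader: the `ParamDef` class, the `resolve_profile` function, and the parameter names are hypothetical, and the default-profile value is represented simply as `default`.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamDef:
    default: object           # value loaded from the default parameter profile
    static: bool = False      # overrides in other profiles have no effect
    deprecated: bool = False  # non-default values produce a warning


def resolve_profile(definitions, profile):
    """Resolve effective parameter values for one profile."""
    resolved = {}
    for name, definition in definitions.items():
        value = profile.get(name, definition.default)
        if definition.static:
            value = definition.default
        elif definition.deprecated and value != definition.default:
            warnings.warn(f"{name} is deprecated", DeprecationWarning)
        resolved[name] = value
    return resolved


definitions = {
    "exampleStaticParam": ParamDef(default=10, static=True),
    "exampleDeprecatedParam": ParamDef(default=0.0, deprecated=True),
}
```

Resolving a profile that sets `exampleStaticParam` to another value still yields 10, while setting `exampleDeprecatedParam` away from its default keeps the new value but emits a warning.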
[en] Bug Fixes
[en] New Features and Bug Fixes in 7.17.1
[en] New Features
[en] New Features and Bug Fixes in 7.17.0
[en] New Features
-
[en] Enhanced semantic matching of tokens in Organization names through use of word embeddings.
[en] Note: This drastically increases the size of the SDK package. To reduce the size, the embeddings dictionaries in rlpnc/data/tvec/multilingual can be removed as long as the corresponding language pairs in parameter_profiles.yaml have 'useEmbedded' set to false. (RLPNC-4173, RLPNC-4201, RLPNC-4219, RLPNC-4244)
[en] Added the ability for specific token overrides to always override the score between tokens even if a different method of matching generates a higher score. This can be used to prevent specific token pairs from matching. (RLPNC-3951)
[en] Enhanced the segmentation of Japanese Organization names through decomposing compounds. (RLPNC-2910)
[en] Improved the accuracy of fuzzy phonetic matching between Japanese and English. (RLPNC-2444)
[en] Implemented support for multi-token "token" overrides. (RLPNC-4080)
[en] Added token override dictionary for matching Organization names between Japanese and English. (RLPNC-4172)
[en] Ensured compatibility with RLP version 7.15. (RLPNC-4190)
[en] Upgraded Solr plugin to support Solr 6.2. (RLPNC-4179)
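The difference between an ordinary token override and one that always overrides the score (RLPNC-3951 above) can be sketched like this. The function, its arguments, and the sample scores are hypothetical; the sketch only illustrates the precedence rule, not how RNI stores or applies its override files.

```python
def token_score(a, b, overrides, forced_overrides, fallback_scorer):
    """Score a token pair, letting forced overrides win over every other method."""
    key = frozenset((a.casefold(), b.casefold()))
    if key in forced_overrides:
        # A forced override is returned even if another method would score higher.
        return forced_overrides[key]
    candidates = [fallback_scorer(a, b)]
    if key in overrides:
        candidates.append(overrides[key])
    return max(candidates)


overrides = {frozenset(("bob", "robert")): 0.95}
forced_overrides = {frozenset(("john", "joan")): 0.0}
print(token_score("John", "Joan", overrides, forced_overrides, lambda a, b: 0.9))
```

Here the forced entry keeps "John" and "Joan" from matching even though the stand-in scorer rates them at 0.9.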
[en] Bug Fixes
[en] Fixed an issue introduced in 7.15.0 that caused a significant slowdown in English-English querying and matching. (RLPNC-4217)
[en] Fixed an issue where NameBuilder.hintLanguage() would return the incorrect value. (RLPNC-4189)
[en] Digits are no longer stripped from non-English Organization names. (RLPNC-4211)
[en] Restored missing Katakana segmentation data. (RLPNC-4210)
[en] New Features and Bug Fixes in 7.16.0
[en] New Features
[en] Added support to RNI for Japanese-Chinese, Japanese-Korean, and Korean-Chinese name matching. (RLPNC-3900, RLPNC-4096, RLPNC-4095)
[en] Added a new query parameter, namesToCheckAllowance, which sets the general proportion of names to pass to the high-precision filter. This is used at query time to determine the number of names to check based on the commonality of the query name in the index, allowing for more efficient querying. As a result, generally a higher setting of maximumNamesToCheck can be used. This involved making a breaking change to the INameIndexFilter interface. This behavior was also added to the Solr plugin with a parameter called reRankDocsAllowance. (RLPNC-4059, RLPNC-4150)
[en] Added a new query parameter, scoreToCheckRestriction, that acts as a more efficient replacement for minimumScoreToCheck and improves query speed. The minimumScoreToCheck parameter has been deprecated. This behavior was also added to the Solr plugin with a parameter called scoreToRerankRestriction. (RLPNC-3665, RLPNC-4166)
[en] Enhanced our Japanese name segmentation logic and expanded our inventory of Japanese segmentation data, improving the accuracy of both Japanese name translation and name matching. (RLPNC-4117, RLPNC-4118, RLPNC-4144, RLPNC-4145)
[en] Enhanced the Lucene query logic in our first-pass filter to improve accuracy on sparse fuzzy queries. (RLPNC-4081)
[en] Improved the speed of some non-English name matching by pruning unlikely translation alternatives. This is controlled by a new config parameter, alternativePairsToCheck. (RLPNC-4138)
[en] Added a new config parameter, queryAlternativeOriginLanguages, which controls the set of query languages where transliterations of alternate origins are made part of the query. If matching Chinese, Japanese, and Korean names this can be adjusted to improve accuracy at the cost of speed. (RLPNC-4158)
[en] Changed CachedScorer to accept both completed and uncompleted names. Users should no longer have to complete a Name object. In any case, obtainCompletedName and checkComplete methods have been added to Name and the StandardNameIndex.completeName method has been deprecated. (RLPNC-4149)
[en] Modified the method StandardNameIndex.generateHighRecallKeys to no longer mutate the given Name object. (RLPNC-4147)
[en] Bug Fixes
[en] Fixed an issue that could result in a NullPointerException when matching a name consisting solely of a fullwidth semicolon and likely other rare forms of punctuation. (RLPNC-4134)
[en] Setting maximumNamesToConsider to Unlimited (e.g., -99) will no longer result in a write lock exception when querying with multiple threads. (RLPNC-3589)
[en] Special characters like '#' are now normalized out of Arabic names to prevent unwanted effects on match scores. (RLPNC-3961)
[en] Completing a name no longer alters the original language-of-origin. This was occasionally causing Names to have different results when completed multiple times. Instead, Names have a derivedLanguageOfOrigin() method. (RLPNC-4132)
[en] New Features and Bug Fixes in 7.15.1
[en] Bug Fixes
[en] Fixed an issue that was causing the Solr 6 plugin to use the slower query that supports multivalued fields when not necessary. (RLPNC-4116)
[en] Fixed a concurrency bug involving parameter profiles used by the HMM. Also fixed in 7.14.1. (RLPNC-4128)
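Changes
Dropped support for Java 1.7. Use Java 1.8 or later. (RLPNC-4077)
Changed component-level exceptions to runtime exceptions, and removed the RNILicenseException and RNTLicenseException exception classes. Code changes are required. (RLPNC-3982)
Expanded the Japanese name dictionaries and reading data, improving name translation and matching accuracy. (RLPNC-3985, RLPNC-4032, RLPNC-4084)
Normalized Japanese punctuation so that it is handled the same way as in English processing. (RLPNC-4030)
Updated the Japanese-English organization name override file in RNI. (RLPNC-4064)
Added the ability to specify reRankFilter to the Solrj utility code. (RLPNC-4057)
Added the RNI/RNT web services to the deprecation list. They will be replaced by the Rosette API or the Solr/Elasticsearch plugins. (RLPNC-4076)
Upgraded Lucene to version 6.0. This requires a change to minimumScoreToCheck. (RLPNC-4077)
Added support for the Solr 6.0 plugin. Support for Solr 4.x has been added to the deprecation list. (RLPNC-4078)
Improved public support and documentation for tuning RNI match parameters. (RLPNC-4068)
Added support for custom training of RNI language models. (RLPNC-4028)
Changed the Rosette API RNT service to raise an exception when the input language cannot be translated into the specified target language. (RLPNC-4086)
Bug Fixes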
[en] New Features and Bug Fixes in 7.14.1
[en] Bug Fixes
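Changes
Added support for the Solr 5.5 plugin. (RLPNC-3929)
Added a parameter to turn off case sensitivity. (RLPNC-3963)
(Pure Java edition) Improved the error message shown when name matching is attempted in an unsupported language. (RLPNC-3997)
Improved the weighting used for names that are split across several fields. A minimum weight is now applied to the elements in each field. (RLPNC-3984)
Improved the macro scripts of RuleSetTranslator (for example, Jpan and Kore). (RLPNC-3981)
Added a method for verifying installation of the Elasticsearch plugin. (RLPNC-3890)
Removed the dependencies on btrlp.jar and btutil.jar. (RLPNC-3976)
Users can now add low-weight tokens. (RLPNC-3950)
Improved RNI accuracy by adding character-level n-grams. (RLPNC-3946, RLPNC-3969)
Improved match accuracy for unknown field markers. (RLPNC-3949, RLPNC-3968)
Changed closable objects to implement Java's AutoCloseable. (RLPNC-3944)
Parameter profiles can now define different RNI language models. (RLPNC-3817)
Added the ability to set multiple parameters at query and match time from the Elasticsearch plugin. (RLPNC-3814, RLPNC-3932)
Improved language identification accuracy for (Japanese) kanji names. (RLPNC-3831)
Added a plugin for Elasticsearch 2.2.1. (RLPNC-3862, RLPNC-3959, RLPNC-4000)
Removed Chinese and Korean name readings from Japanese kanji names. (RLPNC-3874)
Optimized Lucene queries to improve processing speed. This also applies to the Solr and Elasticsearch plugins. (RLPNC-3805, RLPNC-3875, RLPNC-3931)
Implemented the RNI match parameters in a configuration file (rlpnc/data/etc/parameter_defs.yaml). (RLPNC-3692, RLPNC-3938)
Upgraded Lucene to version 5.2.1. (RLPNC-3708)
Added the ability to insert spaces for disambiguation during name segmentation. This helps, for example, when matching RobertJohnson Smith against Robbert Smith. (RLPNC-3826)
Added the ability to set parameters for the pairwise match demo from the URL. Example: http://localhost:9022/rnipm/?name1=Robert%20Smith&name2=Bob%20Smith (RLPNC-3838, RLPNC-3839)
Excluded two-letter tokens such as Al and Jo from name gender determination, and adjusted processing that had produced incorrect gender determinations. (RLPNC-3825, RLPNC-3808)
When a name with deleted elements is matched against a name with no deletions, the score is now boosted if multiple elements remain. (RLPNC-3815)
Corrected the IC transliteration for Persian. Several transliterations were changed in response to user requests. (RLPNC-3769)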
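The pairwise match demo URL shown in the list above can be assembled programmatically. The sketch below percent-encodes the two names with Python's standard library; only the `name1` and `name2` query parameters and the base URL come from these notes, and the helper name is an assumption.

```python
from urllib.parse import quote, urlencode


def pairwise_demo_url(name1, name2, base="http://localhost:9022/rnipm/"):
    """Build a demo URL with both names percent-encoded (spaces become %20)."""
    query = urlencode({"name1": name1, "name2": name2}, quote_via=quote)
    return f"{base}?{query}"


print(pairwise_demo_url("Robert Smith", "Bob Smith"))
```

Passing `quote_via=quote` matters here: the default encoder would emit `+` for spaces rather than `%20`.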
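Bug Fixes
Fixed an issue in Chinese-Chinese matching that occurred when the license was specified through the API. (RLPNC-3844)
Fixed an issue in the Elasticsearch plugin where special characters such as '|' and '\' could not be escaped. (RLPNC-3852)
Fixed an issue where the stop word file was not loaded when the stop word regular expression file was empty. (RLPNC-3853)
Added a check to the Elasticsearch plugin that validates a document before processing it. Error messages are also clearer, for example when a document contains unescaped curly braces. (RLPNC-3889, RLPNC-3974)
In Chinese name matching, whitespace is now treated as a delimiter in the same way as the middle dot. (RLPNC-3913)
Fixed an issue that occurred when translating certain names from English to Russian. (RLPNC-3927)
Special symbols used in Japanese organization names are now normalized. (RLPNC-3943)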
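Changes
Added an experimental Elasticsearch plugin for the Rosette Name Indexer (RNI).[2] It lets you embed matching of person, location, and organization names. The Elasticsearch plugin (a separate package) is Java only and supports English-English matching only. (RLPNC-3747)
Dropped support for Java 1.6. Use Java 1.7 or later. (RLPNC-3506)
Improved RNI query processing speed by 50%. (RLPNC-3740, RLPNC-3714, RLPNC-3700, RLPNC-3699, RLPNC-3660, RLPNC-3596, RLPNC-3490, RLPNC-3734)
Dramatically improved processing speed for English-Korean and Korean-English. (RLPNC-3658)
Matching now works whether or not data fields are present. (RLPNC-3644)
Tokens can now be normalized using a text file that contains canonical forms, which improves the accuracy of match scores. A default file, equivalenceclasses_eng_PERSON.txt, is provided and enables normalization of Muhammad. (RLPNC-3798, RLPNC-3802)
Revised gender handling to improve match accuracy. Gender handling is not applied to full-name overrides, token overrides, or exact token matches. Removed name pairs of differing gender from the English override file (tokens_eng_eng.txt). (RLPNC-3767, RLPNC-3788, RLPNC-3773)
Added names and nicknames to the English token override file (tokens_eng_eng.txt). (RLPNC-3787)
Added the ability to supply missing whitespace during name matching. Match scores are higher than before in cases such as matching JohnFitzgerald Kennedy against John Kennedy. (RLPNC-3781)
Minor misspellings, such as Jong versus Jxng, no longer lower the match score. Added SpanMatch.Reason: STRING_SIMILARITY. (RLPNC-3772)
Added a .bat file (Windows) and a shell script (Unix) so that RNTCLI can be used without Ant. (RLPNC-3776)
Added a digital signature to the RWS-Names web service installer (Windows MSI). (RLPNC-3641)
Added a Java sample (MatchPhenomenaSample.java) that demonstrates the various RNI matching features.
Added RNIReRankQParser to the Solr 4.9+ plugin. It checks a minimum score and can replace the Solr document score with the RNI score. (RLPNC-3650, RLPNC-3651)
Added fielded name handling to the Solr 4.9+ plugin. (RLPNC-3592)
Added RNI utilities that are useful for solrj users. See com.basistech.rni.solr.index for details. (RLPNC-3615)
To improve efficiency and to fix an issue in fielded name processing, titles that are removed from names were moved from the stopregexes file (rlpnc/data/rnm/ref/override/stopregexes_eng_PERSON.txt) to stopprefixes (rlpnc/data/rnm/ref/override/stopprefixes_eng_PERSON.txt). When processing fielded names, you can now specify the fields to which a stopregex applies. (RLPNC-3800)
Upgraded Tomcat to version 8.0.21 to improve security when using RWS-Names. (RLPNC-3819)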
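The Jong versus Jxng item above concerns scoring near-identical spellings by string similarity. RNI's own measure is not described in these notes, so the sketch below swaps in `difflib.SequenceMatcher` from the Python standard library purely to show how a one-letter difference still yields a high similarity; the function name is hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Return a 0..1 similarity ratio between two case-folded tokens."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


print(string_similarity("Jong", "Jxng"))
```

Three of the four characters line up in order, so the ratio is 2 × 3 / 8 = 0.75, compared with 1.0 for identical tokens.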
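Bug Fixes
Fixed an RNI stop word issue that occurred during fielded name matching. (RLPNC-3766)
Fixed an issue in RNT incremental translation where the last character was dropped when translating from Western Farsi and Dari to English. (RLPNC-3760)
Fixed an issue with NameIndexQuery#setTestPrimaryData(true) that occurred in release 7.12. The method does not return a name when isPrimary() is false for the query Name and the candidate Name. (RLPNC-3682)
Fixed an issue where the RNI binary dictionaries could not be loaded when RNI-RNT was installed in the rnt directory. (RLPNC-3793)
Fixed the RNI pairwise match demo. It no longer returns a score for unsupported language pairs, and it now handles empty fielded names correctly. (RLPNC-3702, RLPNC-3640)
Upgraded Tanuki to version 3.5.26, so RWS-Names can now be started with the launch.sh script on Mac OS X. (RLPNC-3819)
Fixed the classpath so that the command-line scripts (rnicli and rntcli) run without warnings. (RLPNC-3804)
Changed the specification of com.basistech.rni.solr.NameField and related files so that they display correctly in the Solr schema browser. (RLPNC-3806)
Fixed an issue where the bt_rni_Name_Store subfield appeared in Solr plugin query results. The subfield is no longer indexed or stored as part of the document, so it no longer appears in query results. (RLPNC-3807)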
[en] This is the first release of RNI-RNT Java Only. It supports indexing, querying, and matching names in English, French, German, Italian, Portuguese, and Spanish. At this time, translation (RNT) and support for Arabic, Western Farsi, Dari, Pushto, Urdu, Korean, Chinese, Japanese, and Russian are supported only in the native releases. Over time, we plan to incorporate support for these features and languages in the Java edition.
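Changes
Deprecated support for Java 1.6. Use Java 1.7 or later and Solr 4.8 or later with RNI. (RLPNC-3534)
Added support for Mac OS X v10.7 Darwin 11 (amd64-darwin11-xcode4) and for 32-bit and 64-bit Windows with Visual Studio 2012 (ia32-w32-msvc110, amd64-w64-msvc110).
Dropped support for the following platforms, which do not support Java 1.7: Mac OS X v10.5 Darwin 9 (universal-darwin9-gcc40) and Red Hat Enterprise Linux (ia32-glibc23-gcc32, ia32-glibc23-gcc34, amd64-glibc23-gcc34). (RLPNC-3601)
Added a sample RNI index containing the names of U.S. presidents. (RLPNC-3562)
Added normalization of ß to ss in RNI, which enables querying and matching in English. Russland now matches Rußland with a similarity score of 0.99. (RLPNC-3560)
Added Korean-to-English translation. Folk transliteration brings the output closer to the usual English rendering of Korean names. (RLPNC-3521)
Changed the default value of RNI's minimumScoreToCheck from 0 to 0.05. This improves speed without reducing precision. If recall is unsatisfactory, set the value back to 0. (RLPNC-3513)
English stop words such as "general" and "mr" now apply only to PERSON names, so that names such as "General Electric" and "MR electric units" are handled correctly. (RLPNC-3508)
Adjusted RNM token matching to lower the score when abbreviations do not match, as with NCTA and NICTA. This reduces false positives. (RLPNC-3495)
Dropped support for Solr 3.x. Solr 4.x support covers all 4.x versions. (RLPNC-3484)
Added a Solr plugin for Solr 4.9 and later, which lets Solr applications use RNI seamlessly. The previous Solr 4.x plugin is deprecated. Replaced RNIinSolr4xSample with RNISolrjSample, which uses the new plugin and org.apache.solr.client.solrj calls to index and query multiple name fields. (RLPNC-3465, RLPNC-3556, RLPNC-3603)
Added an OFAC index (and its XML source file) containing person and organization names, along with an illustration of how to use it from the new Solr plugin. (RLPNC-3610)
Changed the RNI requestHandler in solrconfig.xml to '/RNI'. When using Solr 4.x, prefix the requestHandler name with '/'. (RLPNC-3533)
When using RNI and the Solr plugin with SolrCloud, you no longer need to specify bt_rni_NAME_UID as the uniqueKey, and you no longer need to set the shard.qt parameter (for custom request handlers) to /RNI. (RLPNC-3534)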
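Updated the RNI pair-match demo. It now shows scores for the individual parts of a name, which makes match results easier to read. (RLPNC-3459, RLPNC-3590)
Improved RNI match accuracy when both the indexed name and the query are Latin-script abbreviations. (RLPNC-1779)
Improved RNI handling of Cyrillic abbreviations. Cyrillic abbreviations of Russian names can now be indexed for the first pass of RNI queries, which improves recall. (RLPNC-3325, RLPNC-3442)
Added gender data for Russian, improving the accuracy of Russian-to-Russian and Russian-to-English matching. (RLPNC-3545)
The RNT demo (interactive translation) now lets you specify the language of origin. (RLPNC-3427)
The RNT demo now supports multi-script input. For example, the demo uses a Jpan identifier and translator for Kanji, Hiragana, and Katakana. (RLPNC-3590)
RNT now takes whitespace into account when translating Japanese names to English. (RLPNC-3423)
Moved the LOCATION and ORGANIZATION stop words to the Chinese-to-English token override files and removed the original Chinese LOCATION and ORGANIZATION files. With the token override files, 市 is now correctly converted to City, 公国 to Park, and so on. (RLPNC-3406)
The Arabic-to-English LOCATION dictionary is now available from RNTA. (RLPNC-3384)
Added support for the following reverse translation: English/Latin script/(language of origin) Chinese/Chinese Telegraph Code → Chinese/Han script/native. (RLPNC-2638)
Added the com.basistech.names.parameters package, which supports runtime configuration of RNI parameters. (RLPNC-3177)
Added the NameCommonService RESTful service, which also supports identifying the language of an input string and can be used with both RNI and RNT. guesslang supports JSON and text output (for example, {"text": "John"} and "John"). The equivalent functionality in NameTranslation is deprecated. (RLPNC-3370, RLPNC-3565)
The RESTful web service now supports XML input and output. (RLPNC-3470)
Enabled matching of numeric names, such as 07, in the OFAC_PERSONS RNI index. (RLPNC-3584)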
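The ß-to-ss normalization noted in this list (RLPNC-3560) can be pictured with a minimal Java sketch. It is not RNI's implementation: the class and method names are hypothetical, and it only shows why Russland and Rußland end up comparable after folding. It does not reproduce the 0.99 score, and it ignores the capital form (U+1E9E).

```java
import java.util.Locale;

/** Hypothetical helper; not part of the RNI-RNT API. */
class SharpSFolding {

    /** Folds the German sharp s (U+00DF) to "ss", then lowercases the result. */
    static String fold(String name) {
        return name.replace("\u00DF", "ss").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Ru\u00DFland" is Rußland, written with an escape to stay encoding-safe.
        String a = fold("Ru\u00DFland");
        String b = fold("Russland");
        System.out.println(a + " / " + b + " / equal: " + a.equals(b));
    }
}
```

Folding both the indexed name and the query with the same function is what makes the two spellings meet on a common key.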
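Bug Fixes
Fixed an error that caused RNT to inadvertently remove whitespace when translating Korean names to English. (RLPNC-3542)
Fixed a bug that prevented strings consisting of Kanji and Katakana from being correctly identified as Jpan. (RLPNC-3516)
Removed characters not present in the training data from the English-to-Arabic RNT model file, eliminating invalid translations. (RLPNC-3448)
Fixed an infinite-loop bug in RNT. (RLPNC-3477)
Fixed a bug in the frequency-based weighting of name elements (the higher the frequency, the lower the weight). (RLPNC-3471)
Fixed a memory leak in the web services. (RLPNC-3608)
Fixed a TranslationAssistant bug that output Pinyin for an English name when Chinese Telegraph Code was specified. (RLPNC-3606)
Fixed a bug where the definite article (al) was dropped after conversion in English renderings of Arabic names such as Abedin Zain Ul. (RLPNC-3485)
Fixed a bug involving nameDataMinimumMatchScore in the second pass of RNI queries. Results are now returned correctly. (RLPNC-3599)
Fixed a bug that occurred when synchronizing indexes during initialization of the RNI web service. (RLPNC-3620)
Fixed a StringIndexOutOfBoundsException that occurred when field data was blank. (RLPNC-3633)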
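Changes
Added KORDA as a Korean transliteration scheme, improving RNI accuracy for Korean names. (RLPNC-3444)
Added support for the ᆻ jamo in the Korean IC and KORDA transliteration schemes. (RLPNC-3443)
Added stop prefixes, which are faster than prefix handling with regular expressions. When stop prefixes overlap, the longest match takes precedence. For example, if both lieutenant colonel and colonel are defined as stop prefixes and the name starts with the former, the former is applied. (RLPNC-3436)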
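The longest-match rule for stop prefixes can be sketched in a few lines of Java. This is an assumption-laden illustration rather than RNI's code: the class and method names are hypothetical, the prefix list is passed in directly instead of being read from a stopprefixes file, and the extra "lieutenant" entry is added here only to create an overlap.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not part of the RNI-RNT API. */
class StopPrefixSketch {

    /** Removes the longest stop prefix that matches at the start of the name, if any. */
    static String stripLongestStopPrefix(String name, List<String> stopPrefixes) {
        int longest = 0;
        for (String prefix : stopPrefixes) {
            // Require a trailing space so that only whole leading tokens are removed.
            String withSpace = prefix + " ";
            boolean matches = name.regionMatches(true, 0, withSpace, 0, withSpace.length());
            if (matches && withSpace.length() > longest) {
                longest = withSpace.length();
            }
        }
        return name.substring(longest).trim();
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("lieutenant", "lieutenant colonel", "colonel");
        System.out.println(stripLongestStopPrefix("Lieutenant Colonel John Smith", prefixes));
        System.out.println(stripLongestStopPrefix("Colonel Sanders", prefixes));
    }
}
```

Plain prefix comparison avoids compiling and running a regular expression per name, which is consistent with the speed advantage the note describes.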
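Bug Fixes
Fixed a bug that caused TranslationAssistant to return StatisticallyInferred for an alternative when the input was Muhammed and the language of origin was Pushto. It now correctly returns HumanlyAttested. (RLPNC-3464)
Fixed a bug that caused duplicate translations in the RNI Solr client, which also improves performance. (RLPNC-3454)
Fixed a bug translating Chinese names that contain special delimiters such as "|" or ";" to English. Names such as 新|w2ㄙ垂O and 画面;广场 are now translated correctly, and non-Chinese strings are left unchanged. (RLPNC-33453, RLPNC-3445)
Fixed a NameDomain bug that occurred when running RNI queries through the NameIndex web service. (RLPNC-3446)
Dropped RNI support for Solr 1.4. (RLPNC-3232)
Added support for concurrent calls to the RNI web service. (RLPNC-3397)
Chinese names that contain a bullet (U+2022) in place of the middle dot (U+00B7) can now be translated to English. The middle dot is used as a delimiter in non-Chinese names. (RLPNC-3395)
Fixed a crash that occurred when translating Korean jamo to English. Jamo are now correctly transliterated to Latin script. (RLPNC-3394)
In addition to dictionary lookup, added statistical inference to the com.basistech.rnt.assistant API. It produces alternative candidates when converting Arabic, Korean, Russian, and Chinese names to English, or English names to Arabic, Korean, and Russian. Output produced by inference is identified with DataSourceType.StatisticallyInferred. (RLPNC-3283, RLPNC-3309, RLPNC-3063)
Fixed a bug that prevented the RNI web service from logging the exception raised when it could not open an index. (RLPNC-3389)
Added the DefaultTranslationPairs class so that you can translate by specifying only the source and target languages. (RLPNC-3372)
Added a hintLanguage property to the Name object, which NameBuilder uses when guessing the language. The hint language is typically the detected language of the document that contains the name. If the hint language is compatible with the script that NameBuilder detects, NameBuilder returns the hint language; otherwise it returns the language it guessed. This feature is also available in the NameTranslation web service. (RLPNC-3369, RLPNC-3311)
Fixed a bug that caused the RWS-Names web service to return UNKNOWN as the processing language. When the user does not specify a language, the guessed language is now returned. (RLPNC-3316)
LanguageOfOrigin is now returned as part of the translation result in the NameTranslation web service ResultAnnotations object. (RLPNC-3352)
Russian-to-English translation now includes the language of origin (Russian or English). (RLPNC-3352)
Improved the script-guessing algorithm, which is now substantially faster. (RLPNC-3314)
Fixed the Solr plugin so that the language of origin can be included when adding names to an index. (RLPNC-3305)
Improved the RNI matching process so that names with reordered components, such as Carnera Baer Braddock Louis Charles Walcott and Charles Walcott, Carnera Baer Braddock Louis, are correctly recognized as the same name. (RLPNC-3304)
RNI now normalizes the Polish special character ł (U+0142), so that Michał matches Michal. (RLPNC-3303)
Fixed an exception that occurred when transliterating a particular Arabic name (اسامة نين اع بن لادن) to English. (RLPNC-3301)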
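Extended the RWS-Names web interface so that you can enter two names and match them. Also added a demo. (RLPNC-3012)
Fixed the RNI demo so that you can specify the language of origin and the entity type of the query name. (RLPNC-3295, RLPNC-3104)
Added a static method for passing the RNI license, RNIConfiguration.setLicenseXML(String licenseXML), to match the RNT method RNTEnvironment.setLicenseXML(String licenseXML). (RLPNC-3344)
Retrained the RNI Russian and Korean models to improve accuracy. (RLPNC-3289, RLPNC-3241)
Fixed a bug with the Persian ezafe (-e) in IC transliteration. (RLPNC-3273)
Debug information can now be retrieved from the MatchResult objects that a query returns. Call setIncludeDebugInfo(true) on the NameIndexQuery, then call getDebugInfo() on each MatchResult to see how the result was obtained. (RLPNC-3266)
Added a .bat file for Windows and a shell script for Unix so that RNICLI can be run from the command line. (RLPNC-3203)
Expanded the Korean name training data to improve accuracy. (RLPNC-3341)
Added "corp" to the token override file (eng_eng_ORGANIZATION) so that Korea National Oil Corporation matches Korea National Oil Corp. (RLPNC-3340)
Improved the RNI algorithm so that 김 matches Kim. (RLPNC-3334)
Extended normalization of special Latin-script characters, such as đ (U+0111). (RLPNC-3322)
Added Kore as an alias for Hangul plus Han. (RLPNC-3318)
Fixed a bug that caused a thread to crash at RNI startup. (RLPNC-3315)
Added BGN transliteration for Korean names written in English. (RLPNC-3281)
Extended ISO15924Utils.scriptForString so that it can return Hrkt or Jpan for Japanese. Hrkt is a mix of Katakana and Hiragana (for example, トイザらス); Jpan is a mix of Katakana, Hiragana, and Kanji (for example, トヨタ自動車株式会社). (RLPNC-3279)
Improved RNI accuracy when matching hyphenated English-script names against Korean names. (RLPNC-3277)
NameBuilder now throws an unchecked exception instead of InvalidNameException, so you no longer need a try/catch each time you create a name. (RLPNC-3276)
Added guessLanguage and guessScript utility methods to NameBuilder so that you can identify the language and script without creating a Name object. (RLPNC-3275)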
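Added English-to-Russian translation (Latn, eng, folk to Cyrl, rus, native). (RLPNC-3269)
Added a geography option to RNT's BGN transliteration scheme for Korean. When com.basistech.rnt.options.KorGeographyOption is NORTHKOREAN (the default), RNT follows McCune-Reischauer; when it is SOUTHKOREAN, RNT applies the Ministry of Culture and Tourism 2000 romanization. (RLPNC-3261)
Added UND_BGN as a Korean transliteration scheme. (RLPNC-3253)
Improved match accuracy for truncated name data. (RLPNC-3200)
When Korean names appear in Japanese data, RNT now segments them as Korean. (RLPNC-3173)
Extended the definition of the RNI similarity score so that the score is 1.0 when the string, language of use, language of origin, and entity type all match. (RLPNC-2636)
Replaced System.(err|out).print and printStackTrace() statements in RNI and RNT with slf4j logging. (RLPNC-2688)
Deprecated the com.basistech.rni.match.Name constructor. Use the com.basistech.rni.match.NameBuilder Build() method to construct Name objects instead. (RLPNC-3234)
Dropped support for Java 1.5. Use Java 1.6 or later. (RLPNC-2489)
Added language of origin to all translation domain pairs. (RLPNC-3078, RLPNC-2939)
Improved RNI similarity scores for Chinese variant spellings. (RLPNC-3092)
Added support for translating English renderings (folk) of Korean and English names to Korean script. (RLPNC-3211)
When RNI is used in a Solr environment, it can now guess the language and script. (RLPNC-2523)
For efficiency, RNI name fields are now serialized and stored in a single field as a JSON object. (RLPNC-3225)
Added a boolean flag, usingCachingCodec, to com.basistech.rni.index.IndexStoreDataModelFlags. When the flag is true (the default is false), Lucene uses CachingCodec and loads documents into RAM when they are read. Note: This feature depends on available RAM, but it can improve performance significantly. (RLPNC-3218)
RNI now caches queries to improve speed. (RLPNC-3223)
Added support for Microsoft Visual Studio 10. (RLPNC-3163)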
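The hintLanguage rule described earlier in this list (RLPNC-3369) amounts to a small decision: keep the hint when it is compatible with the detected script, otherwise fall back to the guessed language. The Java sketch below illustrates that decision only. The class, the method, and the language-to-script table are hypothetical stand-ins, not the NameBuilder implementation or its data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration; not part of the RNI-RNT API. */
class HintLanguageSketch {

    /** Toy table of ISO 639-3 language codes to compatible ISO 15924 script codes. */
    static final Map<String, Set<String>> SCRIPTS_BY_LANGUAGE = new HashMap<String, Set<String>>();

    static {
        SCRIPTS_BY_LANGUAGE.put("jpn", new HashSet<String>(Arrays.asList("Hani", "Hira", "Kana")));
        SCRIPTS_BY_LANGUAGE.put("zho", new HashSet<String>(Arrays.asList("Hani")));
        SCRIPTS_BY_LANGUAGE.put("eng", new HashSet<String>(Arrays.asList("Latn")));
    }

    /** Returns the hint when it fits the detected script; otherwise the guessed language. */
    static String resolveLanguage(String hintLanguage, String detectedScript, String guessedLanguage) {
        Set<String> scripts = SCRIPTS_BY_LANGUAGE.get(hintLanguage);
        if (scripts != null && scripts.contains(detectedScript)) {
            return hintLanguage;
        }
        return guessedLanguage;
    }

    public static void main(String[] args) {
        // A Han-script name found in a Japanese document keeps the Japanese hint.
        System.out.println(resolveLanguage("jpn", "Hani", "zho"));
        // An English hint cannot apply to a Han-script name, so the guess is used.
        System.out.println(resolveLanguage("eng", "Hani", "zho"));
    }
}
```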
[en] New Features and Bug Fixes in 7.9.1
[en] Removed unrequired JAR files. (RLPNC-3185)
[en] Fixed a bug that caused an ArrayIndexOutOfBoundsException when indexing names with greater than 6 RNT overrides. (RLPNC-3164)
[en] Added support to the RNT web demo for specifying the language of origin. (RLPNC-3103)
[en] Fixed a slowdown in English queries against English names that was introduced in the 7.9.0 release. (RLPNC-3141)
[en] Fixed a slowdown caused by a significant increase in calls for supported translation pairs that was introduced in the 7.9.0 release. (RLPNC-3176)
[en] New Features and Bug Fixes in 7.9.0
[en] Expanded RNI query results to indicate spans (one or more tokens) in the query name and result name that match or do not match. (RLPNC-2878)
[en] Added a RESTful interface to the Rosette Web Services for Names (RWS-Names). The SOAP interface is still in place. The RNI and RNT Web Demos use the RESTful interface. (RLPNC-2184)
[en] Extended RNI Solr plugin to support Solr 4.3. (RLPNC-4.3)
[en] Added support for translating names from English to Chinese. (RLPNC-2174)
[en] Expanded the scope of the normalization translation option to convert Chinese names in traditional script to the simplified Chinese script, and to convert Japanese Kanji variants (including old Kanji) to their standard form. (RLPNC-2914, 2846)
[en] Updated implementation of IC transliteration scheme for Chinese to conform to May 2013 deliberative draft of the IC Chinese Standardized Transliteration System for Personal Names. (RLPNC-3068)
[en] Improved the accuracy of Western Farsi, Dari, and Pushto translations. (RLPNC-2972, 2973, 2976, 3001, 3002)
[en] Added support for transforming BGN to Undiacritized BGN for Arabic, Western Farsi, Dari, and Pashto. (RLPNC-2980)
[en] Added support for translating non-Chinese person names in the Chinese language to their traditional English representation. (RLPNC-1451)
[en] Added support for translating non-Korean person names in the Korean language to their traditional English representation. (RLPNC-2979)
[en] Added support for transliterating Korean person names in accordance with the IC standard. (RLPNC-2720)
[en] Added programmatic access to RNI and RNT overrides, which lets you define your own override tables (character streams) in place of the tables in the default override directories. (RLPNC-2689)
[en] In the API documentation, see the following methods in com.basistech.rni.index:
[en] RNIConfiguration.replaceFullnameScoreOverrideConfiguration
[en] RNIConfiguration.replaceTokenScoreOverrideConfiguration
[en] RNIConfiguration.replaceStopPatternsConfiguration
[en] and the following method in com.basistech.rnt:
[en] Deprecated the use of translation domains that combine Latn with a non-Latn-script language, such as Arabic or Chinese. In an upcoming release, language of origin will be used to clarify the nature of the translation. For example, a translation from Arabic, Arabic script, native transliteration to English, Latin script, IC, will be a translation, not a transliteration, if the language of origin is English. (RLPNC-2954)
[en] Extended supported domain pairs to include language of origin for Arabic and Chinese. (RLPNC-3067)
[en] Changed Rosette Web Services For Names default port to 9022. (RLPNC-3048)
[en] Extended RNI Web demo to support language guessing. (RLPNC-3042)
[en] Fixed an RNT problem handling some Japanese characters. (RLPNC-3052)
[en] Fixed Null Pointer Exception that occurred with some Solr RNI queries. (RLPNC-3015)
[en] Deprecated com.basistech.rni.match.setMaximumNumTokens(int maxNumTokens). (RLPNC-2788)
[en] New Features and Bug Fixes in 7.8.0
[en] Improved RNI accuracy matching Japanese names, including native names, names of Chinese or Korean origin, and other non-native names with English translations. These improvements in accuracy do entail a slowdown in RNI operations with Japanese. (RLPNC-2606, RLPNC-2608, RLPNC-2422)
[en] Extended RNI support for matching Japanese name variations such as nicknames and cognates (Katakana), and reordered name components. (RLPNC-2088)
[en] Added RNI sample index with Japanese names (professional baseball players) in Japanese scripts and sample queries in Latin script.
[en] Improved the accuracy of Pushto translations using the IC standard. (RLPNC-2800)
[en] Fixed a bug transliterating Arabic heh (U+0647) in Pushto names. (RLPNC-2783)
[en] Refactored enforcement of constraints on setting maximum number of names to consider, to check, and to return for RNI queries, thereby enabling support for setting these constraints when using RNI with Solr. (RLPNC-2828)
[en] New Features and Bug Fixes in 7.7.0
[en] Numbers are no longer stripped from entity types other than PERSON. (RLPNC-2185)
[en] Override files for token matches may include a third item for each entry which provides RNI with additional context for handling the override: NICKNAME or COGNATE. If no value is included, the default is NICKNAME. (RLPNC-2050)
[en] Having determined that some classes and methods in internal packages should be available to users, and that some publicly available APIs return a class in an internal package, we have refactored the API as indicated in the following table. We have also refactored the API for handling a collection of names. (RLPNC-1768)
[en] To improve performance using linguistic analysis to return high-precision results from a high-recall list of names that match a query name, deprecated the static com.basistech.rni.index.StandardNameIndex queryList method in favor of a non-static com.basistech.rni.index.StandardNameIndex filterCollection method. (RLPNC-2560)
[en] Per the IC specification for transliterating Person names from Pushto, we provide special handling when the language of origin is Dari. We also provide variant spelling and regional options to control the transliteration. (RLPNC-2652, RLPNC-2677, RLPNC-2448)
[en] Added support for defining a set of text domains that filter results returned by an RNI query. For example, if the RNI index contains names in English, Western Farsi, Arabic, and Pushto, you can set text domains to only return names in English/Latin script, and Western Farsi/Arabic script. In the HTML API documentation, see com.basistech.rni.index.NameIndexQuery.setTargetNameDomains(Set<TextDomain>) and com.basistech.rni.index.NameIndexQuery.testTargetNameDomains(boolean). (RLPNC-2718)
[en] Removed support for the JDEC-Afghanistan Pushto transliteration scheme. (RLPNC-2734)
[en] Support and an associated sample have been added for running RNI with Solr 4x. (RLPNC-2742)
[en] When the Segmentation Option is turned off for Japanese, RNT now assumes the names it is processing have been segmented by the user (tokens are space delimited). Prior to this fix, RNT treated the entire name as a single token, which produced incomplete results for names with space delimited tokens. (RLPNC-2750)
[en] Verified RLPNC support for Red Hat 6.0. For 32-bit platforms, use the ia32-glibc23-gcc34 package; for 64-bit platforms, use the amd64-glibc23-gcc34 package. (RLP-3649)
[en] Improved support for vocalizing Pushto and Western Farsi names. (RLPNC-2512, RLPNC-2232)
[en] Improved handling of the ezafe in Western Farsi, Dari, Pushto (izafat), and Urdu names. (RLPNC-343)
[en] Added RNI support for indexing and querying names identified as Persian (fas) or Dari (prs). Persian is the macrolanguage that includes both Western Farsi (pes) and Dari. Queries for Persian names may return names indexed as Persian or Western Farsi. (RLPNC-2711)
[en] Added override file for translation of LOCATION names from Russian to English. The package now contains override files for translating LOCATION names from Arabic, Japanese, and Russian to English. (RLPNC-2176)
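[en] The text-domain filtering added in this release (RLPNC-2718) restricts query results to chosen language/script combinations. The Java sketch below illustrates the idea with plain collections; it does not call NameIndexQuery, and the IndexedName class, the string-based domain keys, and the method name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not part of the RNI-RNT API. */
class TextDomainFilterSketch {

    /** Minimal stand-in for an indexed name with a language code and a script code. */
    static final class IndexedName {
        final String text;
        final String language; // ISO 639-3 code, e.g. "eng"
        final String script;   // ISO 15924 code, e.g. "Latn"

        IndexedName(String text, String language, String script) {
            this.text = text;
            this.language = language;
            this.script = script;
        }

        String domainKey() {
            return language + "/" + script;
        }
    }

    /** Keeps only the results whose language/script pair is one of the target domains. */
    static List<IndexedName> keepTargetDomains(List<IndexedName> results, Set<String> targetDomains) {
        List<IndexedName> kept = new ArrayList<IndexedName>();
        for (IndexedName name : results) {
            if (targetDomains.contains(name.domainKey())) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<IndexedName> results = Arrays.asList(
                new IndexedName("Mohammad Karimi", "eng", "Latn"),
                new IndexedName("name in Western Farsi", "pes", "Arab"),
                new IndexedName("name in Arabic", "ara", "Arab"),
                new IndexedName("name in Pushto", "pus", "Arab"));
        Set<String> targets = new HashSet<String>(Arrays.asList("eng/Latn", "pes/Arab"));
        for (IndexedName name : keepTargetDomains(results, targets)) {
            System.out.println(name.domainKey() + ": " + name.text);
        }
    }
}
```

[en] Note that the script alone is not enough to separate the Western Farsi, Arabic, and Pushto entries, which is why the sketch keys on the language and script together.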
[en] New Features and Bug Fixes in 7.6.1
[en] Fixed a bug that prevented Solr 1.4 users from setting the isPrimary attribute for a Name. (RLPNC-2667)
[en] Fixed the inclusion of incorrect results from name pair (fullname) override files during the high-recall phase of queries. (RLPNC-2693)
[en] Fixed a NullPointerException or the inclusion of results of the incorrect entity type from name pair (fullname) override files. (RLPNC-2676)
[en] New Features and Bug Fixes in 7.6.0
[en] Starting with this release, the RLPNC release number matches the release number of the RLP with which it should be installed (the third number may vary).
[en] Added TranslationAssistant and NameIndex web services. For Windows users, added .NET clients for each of the Rosette Web Services for Names. (RLPNC-2243, RLPNC-2272, RLPNC-2233)
[en] To clarify translation support for Western Farsi and Dari (both members of the Persian macro-language), we have replaced the Persian ISO 639 language code ("fas") with the Western Farsi language code ("pes"). For BGN, which does not distinguish between Western Farsi and Dari, we support the use of all three language codes ("fas", "pes", and "prs") for Dari. (RLPNC-2505)
[en] Improved the vocalization maps for the translation of Western Farsi and Dari names. (RLPNC-2257)
[en] Enforce requirement that the maximum number of names considered in the first pass must be greater than or equal to the maximum number of names evaluated in the second pass and the maximum number of names returned by the query. The code no longer resets these settings under the covers if the user makes a setting that violates this constraint. (RLPNC-2160)
[en] Changed the default setting for testing entity type during queries from false to true. As a result, a query only returns names that match the entity type of the query name (such as PERSON, LOCATION, ORGANIZATION, VEHICLE, or NONE). (RLPNC-2149)
[en] Upgraded to Lucene 3.6. (RLPNC-2470)
[en] Extended support for using RNI with Solr 3.x and Solr 4.x. Added an RNI sample that runs with Solr 3.x, and provided instructions in the documentation for using RNI with the Solr 3.5 Admin Example to post Name documents and perform queries. (RLPNC-2462)
[en] For queries performed with Lucene or Solr 3.x, adopted DisjunctionMaxQuery to improve first-pass scores with names for which multiple alternatives (such as nicknames) are defined in token override files. For a given minimumScoreToCheck, this strategy provides more accurate recall for names submitted to the second pass. (RLPNC-2473)
[en] Added support for indexing and querying Spanish names. (RLPNC-2478)
[en] Restructured HighRecallKeys. The StandardNameIndex generateHighRecallKeys method is no longer available and HighRecallKeys has been refactored. Removed the GenerateAndUseHighRecallKeys Java sample and the RNICLI -generate-keys option. Our working assumption is that RNI provides HighRecallKeys to assist in Name queries using Lucene or Solr. The underlying structure is in a state of evolution. If you want to use some other infrastructure to store Names and perform queries, please discuss this issue with us. (RLPNC-2499)
[en] Added com.basistech.rni.match.NameBuilder as the preferred mechanism for creating Name objects. NameBuilder provides a fluent interface that supports method chaining. (RLPNC-2515)
[en] Added language of origin as a Name field. The default value is LanguageCode.UNKNOWN. RNT uses this value when translating foreign names from Arabic, Japanese, and Russian. (RLPNC-1717)
[en] Added an RNT translation option (MinimizeOrthographyOption) to remove short vowel diacritics from Arabic, Western Farsi, Dari, Pushto, and Urdu names in Arabic script. The default setting for this option is false. (RLPNC-2320)
[en] Removed RNT translators for Dari and the JDEC-Afghanistan transliteration scheme. (RLPNC-2558)
[en] Extended support for applying the IC transliteration standard for Pushto to include the special rules and special cases defined in the Pashto Standardized Transliteration System for Personal Names, 01 June 2011. (RLPNC-2565)
[en] Adjusted MatchScorer settings for each language to establish a Precision/Recall crossover near the 0.55 threshold for all language pairs. The crossover may still vary, depending on the input data. If you are interested in customizing these settings, please contact support@rosette.com. (RLPNC-2555)
[en] RLPNC support for Java 1.5 is deprecated. RLPNC users of Java 1.5 should move to Java 1.6. In this release, Java 1.6 is required to use The Rosette Web Services for Names, and to use RNI with Solr 3.x or 4.x. (RLPNC-2486)
[en] Fixed a runtime error that occurred when translating Japanese Katakana names. (RLPNC-2435)
[en] Added missing stop pattern and token override files to improve accuracy of Japanese RNI queries. (RLPNC-2477)
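[en] The query-limit constraint enforced in this release (RLPNC-2160) can be expressed as a simple validation: the first-pass limit must be at least as large as both the second-pass limit and the number of results returned, and a violating setting is rejected rather than silently adjusted. The following Java sketch illustrates that rule; the class name, field names, and the choice of IllegalArgumentException are assumptions for the example, not the RNI API.

```java
/** Hypothetical illustration; not part of the RNI-RNT API. */
class QueryLimitsSketch {

    final int maxToConsider; // names considered in the first (high-recall) pass
    final int maxToCheck;    // names evaluated in the second (high-precision) pass
    final int maxToReturn;   // names returned by the query

    QueryLimitsSketch(int maxToConsider, int maxToCheck, int maxToReturn) {
        // Reject inconsistent settings instead of quietly resetting them.
        if (maxToConsider < maxToCheck || maxToConsider < maxToReturn) {
            throw new IllegalArgumentException(
                    "maxToConsider (" + maxToConsider + ") must be >= maxToCheck ("
                            + maxToCheck + ") and >= maxToReturn (" + maxToReturn + ")");
        }
        this.maxToConsider = maxToConsider;
        this.maxToCheck = maxToCheck;
        this.maxToReturn = maxToReturn;
    }

    public static void main(String[] args) {
        new QueryLimitsSketch(1000, 100, 10); // consistent settings are accepted
        try {
            new QueryLimitsSketch(50, 100, 10); // first pass smaller than second pass
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```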
[en] New Features and Bug Fixes in 4.3.0
[en] Upgraded the keys stored with Names. Accordingly you must recreate any existing RNI indexes that you intend to use with this release.
[en] Enhanced support for translating Japanese Kanji names and foreign names in Katakana.
[en] Users can now compile and run the Solr sample (RNIinSolr14Sample) without providing the path to an Apache-Solr-1.4 distribution. The required .jar files have been placed in the samples lib directory. (RLPNC-2113)
[en] If an override file for RNT translations includes source names with multiple target names, and the file does not include confidence scores, RNT sets the confidence score for each translation to 1 divided by the number of translations for that name. (RLPNC-2122)
[en] To discourage performance degradation processing oversized names (probably bad data), the Name object issues a warning if you exceed 10 tokens for the data in a Name. You can use the static Name.setMaximumNumTokens(int maxTokens) method to change or eliminate the limit. RNICLI and RNTCLI now include a -maxTokens parameter. (RLPNC-2124, RLPNC-2204)
[en] Extended support for stop regular-expression patterns to apply to all supported languages, for fullname overrides to apply to all supported text domain pairs, and for token overrides to apply to English, Japanese, Chinese, and Russian. (RLPNC-1922, RLPNC-2152, RLPNC-2085)
[en] Enabled multi-threaded RNI update operations. Multiple threads may share an INameIndexSession object. The write operation for each update is handled in a single thread, but other portions of an RNI update, such as the name completion required for adding a name, are multi-threadable. (RLPNC-1957)
[en] To run the RNI and RNT command-line interfaces or examples, you no longer need to set the LD_LIBRARY_PATH or DYLD_LIBRARY_PATH environment variable on Linux, Solaris, and Mac OS X platforms. (RLPNC-2158)
[en] Incorporated the NameTranslation web service into the RNI-RNT SDK. Other Rosette Web Services for Names will be added in future releases. (RLPNC-1577)
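[en] The default confidence rule for RNT override files in this release (RLPNC-2122) is simple arithmetic: a source name with n target translations and no explicit scores gets 1/n per translation. The Java sketch below works through that rule. The class and method names are hypothetical, the sample names are made up, and the targets are assumed to be distinct and non-empty.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not part of the RNI-RNT API. */
class OverrideConfidenceSketch {

    /** Assigns each target translation a confidence of 1 / (number of targets). */
    static Map<String, Double> defaultConfidences(List<String> targets) {
        Map<String, Double> confidences = new LinkedHashMap<String, Double>();
        double share = 1.0 / targets.size();
        for (String target : targets) {
            confidences.put(target, share);
        }
        return confidences;
    }

    public static void main(String[] args) {
        // Four made-up target spellings for one source name: each gets 1/4.
        List<String> targets = Arrays.asList("Mohammed", "Muhammad", "Mohamed", "Mohammad");
        for (Map.Entry<String, Double> entry : defaultConfidences(targets).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```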
[en] New Features and Bug Fixes in 4.2.0
[en] This release requires RLP 7.4.
[en] Documented RNI support for running in a Solr application. (RLPNC-1918)
[en] Added gender as a consideration for matching English names. (RLPNC-1851)
[en] RLPNC is no longer built for the following platforms: sparc-solaris9-cc58 and sparc-solaris9-cc58-64. (RLPNC-1950)
[en] Added support for translating Japanese Kanji names to English and for segmenting Kanji names. (RLPNC-1947)
[en] Added preliminary support for indexing, querying, and matching names in Russian and Japanese Kanji. (RLPNC-2003)
[en] Refined support for English LOCATION, ORGANIZATION, and VEHICLE entity types. Accordingly, the results when processing these types may differ from the results processing PERSON entities or Name objects with no entity. For most accurate results, specify the entity type when defining a Name. (RLPNC-1941)
[en] Added -maxToCheck parameter to RNICLI to specify how many potential candidates the query should check with its high-precision linguistic filter. Use this parameter to adjust the speed/accuracy tradeoff. Also deprecated the -top parameter for specifying the maximum number of results the query should return. Use -max. (RLPNC-2032)
[en] Optimized handling of duplicate candidate names in the RNI index when running the com.basistech.rni.index.StandardNameIndexFilter query() method. (RLPNC-1656)
[en] Added support for defining Name objects with data fields for indexing and querying. With this facility, you can potentially enhance the accuracy of queries with English, Japanese, and Chinese PERSON names. Scores are higher when a field in the query is similar to the same field (as determined by the order in which the fields appear) in a candidate index name. Fields have no explicit semantic definition (such as family name or given name). When translating a name with fields, RNT handles the name as a single string with a space between each field. (RLPNC-1864)
[en] Modified the samples ant script to support building and running the Solr Connector sample, provided you have access to Solr 1.4. (RLPNC-2068)
[en] To improve error handling with interactive translations, RNT now uses com.basistech.rnt.assistant.InitialInput to throw a com.basistech.rnt.UnsupportedBasicTranslatorException if the input or output domain is not supported, or a com.basistech.rnt.InvalidNameContentException if the input string is empty or not in the correct input script. (RLPNC-1955)
[en] In response to customer feedback, modified the content of the stop patterns file and token override file for English. (RLPNC-2075)
[en] New Features and Bug Fixes in 4.1.0
[en] Refined the Dari transliteration maps for vocalization to be in line with and as complete as the Pushto transliteration maps. (RLPNC-1774)
[en] In StandardNameIndex, deprecated generateHighRecallKeys in favor of two new methods: generateHighRecallIndexKeys and generateHighRecallQueryKeys. (RLPNC-1595)
[en] Improved accuracy of RNI queries as measured with test data. (RLPNC-1766)
[en] Fixed bugs and improved the accuracy of handling of variable segmentation in RNI's English-to-English matching. (RLPNC-1692)
[en] Added support for designating entity-type-specific overrides for RNI stop patterns, name pair matches, and token pair matches. (RLPNC-1780)
[en] Optimized handling of duplicate names returned by an RNI query. (RLPNC-1656)
[en] Extended out-of-the-box token overrides file for English-to-English matching with cognate or "cousin" name pairs, such as Pierre and Pedro. (RLPNC-1681)
[en] New Features and Bug Fixes in 4.0.2
[en] Fixed error generating Arabic sun letters. (RLPNC-1715)
[en] Improved performance of RNTCLI and fixed a bug in its invocation from the Ant build.xml script. (RLPNC-1711)
[en] Revised implementation of the IC standard for Western Farsi transliteration as detailed in the footnote in the Appendix: Supported Translation Domains.
[en] New Features and Bug Fixes in 4.0.1
[en] Enabled statistical inference for adding diacritization to Dari, Pushto, and Urdu native names not found in the corresponding dictionary. (RLPNC-1622)
[en] Introduced caching of RNI name scores to enhance performance when querying English names against large English language name sets. (RLPNC-1677)
[en] Enabled RNT to return multiple choices for multiple-token input when translating Arabic names. (RLPNC-1650)
[en] Enabled RNTCLI to begin writing output while processing is still in progress. (RLPNC-1648)
[en] Fixed a race condition that sometimes occurred in RNICLI or RNTCLI when processing with multiple threads. (RLPNC-1671)
[en] Fixed an RNT bug handling the complete range of Arabic characters and processing names in Arabic script that end with a Latin character. (RLPNC-1673)
[en] Fixed errors and a potential crash in the IC transliteration of Western Farsi. (RLPNC-1665, RLPNC-1657, RLPNC-1658)
[en] Fixed memory leaks in Arabic name normalization, in the translation of Russian names, and in the procedure for inferring language when not specified by the user. (RLPNC-1661, RLPNC-1663, RLPNC-1660)
[en] New Features and Bug Fixes in 4.0.0
[en] Added constructor for creating a Name object with just a String argument (script and language are inferred). You can add the Name to an index, use it in a query, and use it in name matching. Accordingly, RNICLI can now process Name objects that are constructed solely from a String. To translate the Name, you must still supply a target transliteration scheme. (RLPNC-887)
[en] Added statistical support for translating foreign (non-Russian) names from Russian to English. (RLPNC-1558)
[en] Improved recall in RNI first pass key search to do a better job of presenting all potential similarity matches to the linguistics filter. (RLPNC-1459)
[en] Modified the RNICLI utility to enable the addition of names to an existing index, as well as the creation of (and optionally adding names to) an index. (RLPNC-1574)
[en] Enhanced ability to return a reasonable translation to English, rather than no translation, for unknown foreign names, provided the user supplies a sufficiently low translation threshold. (RLPNC-1581)
[en] Enabled statistical inference for adding diacritization to Persian, Pushto, and Urdu native names not found in the Persian, Pushto, or Urdu dictionary. (RLPNC-1597, RLPNC-1622)
[en] Revised the implementation of IC transliteration for Persian to conform to the IC standard for those languages. (RLPNC-64)
[en] Enabled normalization of Arabic native names. (RLPNC-1412)
[en] Renamed the Rosette Name Indexer packages: com.basistech.rni contains the RNICLI command-line interface. The Rosette Name Indexer API, including the utility for loading the contents of an XML gazetteer into an RNI index, is in com.basistech.rni.index and the name matching API is in com.basistech.rni.match. As a result, the com.basistech.rnm packages no longer exist. (RLPNC-37)
[en] For consistency and clarity, replaced "lookup" with "query" and "NameLookupKey" with "PhoneticHighRecallKey" in the RNI API com.basistech.rnm.index (RLPNC-1538):
[en] Removed the com.basistech.rlp.pipeline.name package and the associated sample (NamePipelineSample). (RLPNC-1522)
[en] Deleted com.basistech.rnt.SimpleTranslatable. Use com.basistech.rni.match.Name as the implementation of the com.basistech.rnt.ITranslatable interface. (RLPNC-14927)
[en] New Features and Bug Fixes in 3.4.0
[en] Removed support for the C++ API for name translation. The corresponding sample application and API documentation have also been removed. This change results in a public API that is entirely in Java. (RLPNC-1409)
[en] Added support for using regular expressions in a "stop-words" file to exclude specified strings from indexing and queries. (RLPNC-1486)
[en] Added support for using fullname files to specify the similarity scores to assign to specified name pairs. (RLPNC-1455)
[en] Added support for using token files with pairs of name elements. When RNI evaluates two names, each of which contains a token from a pair in the tokens file, it enhances the similarity score for the two names. (RLPNC-1457)
[en] Improved Arabic to Arabic name matching. (RLPNC-1447)
[en] Refined the matching algorithm to guarantee that matches for a given pair of names are commutative. For two names (a and b), the similarity score is identical, whether a is in the index and b is in the query, or b is in the index and a is in the query. (RLPNC-1446)
[en] Improved the handling of variable segmentation in English to English PERSON name matching, such that the match between two names that differ only by the presence or absence of a space (such as Van Dick and VanDick) receives a higher score than in prior releases. (RLPNC-1430)
[en] Added support for handling titles and initials in Arabic to English and English to Arabic translations of PERSON names. (RLPNC-68, 92)
[en] Added support for standardizing names of Arabic origin in Latin script, according to the transliteration standard specified in the target domain. If the name is not of Arabic origin, it is unchanged. (RLPNC-1370)
[en] To better incorporate the interactive translation assistant into RNT, the com.basistech.xa package has been deprecated in favor of a new package: com.basistech.rnt.assistant. The TranslationAssistant class in this package replaces TransliterationAssistant in the deprecated package. (RLPNC-1377)
[en] Deprecated some empty constructors, associated set methods, and name normalization without a Name object. (RLPNC-1497, RLPNC-1476)
[en] In com.basistech.rnm.Name, the empty constructor is deprecated in favor of public Name(String data, LanguageCode language, ISO15924 script).
[en] In com.basistech.rnm.Transliteration, the empty constructor, setScheme, setScript, and setTransliteration are deprecated in favor of public Transliteration(TransliterationScheme scheme, ISO15924 script, String transliteration).
[en] In com.basistech.rnm.index.NameIndexQuery, the empty constructor is deprecated in favor of public NameIndexQuery(Name qname).
[en] In com.basistech.rnm.index.NameStringNormalizer, normalize is deprecated in favor of public static String normalize(Name n).
[en] Plugged a memory leak that sometimes occurred processing Pushto or Dari names. (RLPNC-1460)
[en] All RNI indexes should be rebuilt using this release.
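[en] The commutativity guarantee described for 3.4.0 (RLPNC-1446) can be illustrated with a small, self-contained sketch. The similarity function below is a toy stand-in (normalized edit distance) chosen only because it is symmetric by construction; it is not the RNI matching algorithm, and the class and method names are hypothetical.

```java
import java.util.Locale;

// Illustrative only: a toy similarity function, NOT the RNI matching algorithm.
class SymmetricScoreSketch {

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /** Similarity in [0, 1]; symmetric because both the distance and the max length are. */
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(x, y) / longest;
    }

    public static void main(String[] args) {
        // Swap the roles of "indexed name" and "query name" and compare the scores.
        double indexedA = similarity("Van Dick", "VanDick");
        double indexedB = similarity("VanDick", "Van Dick");
        System.out.println(indexedA == indexedB); // prints "true"
    }
}
```

[en] A check of this shape, run over pairs drawn from your own data against the real index, is one way to confirm that swapping the indexed name and the query name leaves the score unchanged.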
[en] New Features in 3.3.1
[en] New Features in 3.3.0
[en] New Features in 3.2.1
[en] TranslationAssistant data transfer objects are now serializable, and com.basistech.xa.TranslationAssistant includes a new select method to accommodate the processing of serialized data:
public Output select(int alternativeIndex, InitialInput initial, Segmentation segmentation)
[en] Renamed a few methods on com.basistech.xa.Segmentation (with deprecation).
[en] Plugged some holes in the Urdu transliteration schemes. (RLPNC-1359)
[en] Fixed a bug with RNT's command line interface. (RLPNC-1360)
[en] New Features in 3.2.0
[en] Upgraded from Apache Lucene 2.1.0 to 2.9.1. Accordingly, you must recreate any existing RNI indexes that you intend to use with this release.
[en] Added RNI support for local and distributed transactions. If you use the INameIndex interface to perform updates and queries, each operation is automatically executed in its own transaction. To explicitly control local transactions, use the INameIndexSession interface; for distributed transactions, use the INameIndexTransaction interface. As a consequence of this change, the APIs for setting and getting batch mode and for flushing updates to an index have been removed, and the INameIndex open() method no longer includes a boolean argument specifying whether the index is to be opened for updates.
[en] Added two Java samples. AddNamesSample illustrates the use of a transaction to add a number of names to an RNI index. DistributedTransactionSample illustrates a distributed transaction with a two-phase commit involving two RNI indexes.
[en] Removed support for the C++ API for name matching. The corresponding sample application and API documentation have also been removed.
[en] Removed the com.basistech.rnm.index.adv package from the accessible API. The public API for RNI indexes is in the com.basistech.rnm.index package.
[en] Extended the TranslationAssistant (RNT interactive mode) Java API to handle Dari and Pushto names in addition to Arabic names. Refactored the API to simplify access to TranslationAssistant functionality. The infrastructure is now in place to support variable segmentation of the input string, but for this release, TranslationAssistant does not yet support overlapping segments.
[en] Added support for retrieving a group of names (even all names in an RNI index) that share some common characteristic other than name similarity.
[en] Changed the means of specifying multi-threaded behavior for the RNI command-line interface.
[en] Fixed a thread-safety problem performing RNT translations in multiple threads. (RLPNC-2551)
[en] Fixed the occasional omission of the leading portion of a Persian, Pushto, or Urdu name in Arabic script during RNT translation. (RLPNC-295)
[en] Significantly sped up the operation of creating or opening an RNI index. (RLPNC-1288)
[en] Fixed a failure in some cases to generate output with long foreign names during RNT translation from Arabic to English. (RLPNC-1297)
[en] New Features in 3.1.0
[en] Significant speed improvements in RNI queries. (RLPNC-1081)
[en] Greater separation between scores for good and bad matches when matching names in Chinese Han characters to names in Latin script. (RLPNC-871)
[en] Updated the dictionary of names that appear in English. (RLPNC-113)
[en] RNI queries now return name data results with scores greater than or equal to the value set with setNameDataMinimumMatchScore. Previously, a result was returned only if its score was strictly greater than the minimum match score. The minimum match score must be greater than 0.
[en] The RNI command-line interface includes an optional parameter for minimum match score and an optional parameter for maximum number of names to return.
[en] The RNI and RNT command-line interfaces support the concurrent processing of multiple files in separate threads.
[en] Added support for segmenting (determining the boundary between) Korean surnames and given names. (RLPNC-338-, RLPNC-1069)
[en] Updated documentation to use ISO639-3 three-letter language codes, rather than ISO639-1 two-letter language codes. The ISO639-3 codes enable finer language distinctions, such as between Western Farsi ('pes') and Dari ('prs'), both members of the Persian macrolanguage, for which the ISO639-3 code is 'fas' and the ISO639-1 code is 'fa'.
[en] Fixed the capitalization of Latin-script translations of Japanese names. (RLPNC-1217)
[en] Added preliminary RNI and name-matching support for excluding titles from names during English-language queries and name matches.
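[en] The 3.1.0 change to the minimum match score comparison (from strictly greater than to greater than or equal to) matters only for results that score exactly at the threshold. The sketch below shows the difference on a plain list of scores. It is not RNI code: the class, the method names, and the sample scores are hypothetical, and rejecting a non-positive minimum with an exception is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical filter over made-up scores, not RNI code.
class MinimumScoreSketch {

    /** Keeps scores that meet or exceed the minimum (comparison described for 3.1.0). */
    static List<Double> atOrAbove(List<Double> scores, double minimum) {
        requirePositive(minimum);
        List<Double> kept = new ArrayList<>();
        for (double score : scores) {
            if (score >= minimum) {
                kept.add(score);
            }
        }
        return kept;
    }

    /** Keeps only scores strictly above the minimum (the earlier comparison). */
    static List<Double> strictlyAbove(List<Double> scores, double minimum) {
        requirePositive(minimum);
        List<Double> kept = new ArrayList<>();
        for (double score : scores) {
            if (score > minimum) {
                kept.add(score);
            }
        }
        return kept;
    }

    // Assumption for this sketch: a non-positive minimum is rejected outright.
    private static void requirePositive(double minimum) {
        if (minimum <= 0) {
            throw new IllegalArgumentException("minimum match score must be greater than 0");
        }
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(0.5, 0.8, 0.9);
        System.out.println(atOrAbove(scores, 0.8));     // prints "[0.8, 0.9]"
        System.out.println(strictlyAbove(scores, 0.8)); // prints "[0.9]"
    }
}
```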
[en] New Features in 3.0.1
[en] Activated support for handling nicknames in RNI queries and name matches.
[en] Tuned support for handling initials in RNI queries and name matches.
[en] Added support for the Windows 64-bit platform and the Linux IA32 glibc23 gcc40 platform.
[en] New Features and Bug Fixes in 3.0.0
[en] Expanded support for performing RNI queries and name matches with names in the English language. In addition to handling matches that involve phonetic/orthographic differences and missing name components, queries and name matches are designed to handle names with initials, nicknames, and word order variations.
[en] Added preliminary RNI and name-matching support for handling names in the Japanese language rendered with the Hiragana and Katakana scripts. At the current level of support, queries and name matches involving Hiragana and Katakana are most effective when the names have been segmented (the words that make up each name are space delimited).
[en] New transliteration schemes for the translation of Arabic names in Arabic script to Latin script: Extended IC (handles all characters in Arabic script, including characters used in Persian, Urdu, Pushto, and Dari) and undiacritized BGN (removes diacritics or non-ASCII characters from BGN transliterations).
[en] New transliteration schemes for the translation of Korean names in Hangul and Hanja to Latin script: the Revised Romanization of Korean (MOCT) and the Korean Romanization for Data Applications (KORDA).
[en] Added support for translating Russian names in Cyrillic to Latin script in accordance with the ISO 9:1995 transliteration standard.
[en] For RNI queries, changed the default setting of testNameData to true, so that users are not required to make this setting for performing name-match queries.
[en] Added a sample RNI index, OFAC_PERSONS, which contains the names in Latin script of individuals in the Office of Foreign Assets Control watch list. (RLPNC-994)
[en] More robust handling of input names that contain non-alphabetic characters. (RLPNC-968)
[en] Removed the restriction on the number of names an RNI query can return. (RLPNC-909)
[en] Improved the performance of RNI queries for Arabic names. (RLPNC-924)
[en] Changed the default setting for NameIndexQuery.testNameData from false to true in order to simplify the process of issuing standard RNI queries. (RLPNC-960)
[en] Return 0 rather than throw an exception when querying with a Chinese Hani name that contains data in another script. (RLPNC-970)
[en] Modified the statistical model for translating Arabic names, leading to minor differences in the results returned. (RLPNC-974)
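[en] The undiacritized BGN scheme listed for 3.0.0 is described as removing diacritics or non-ASCII characters from BGN transliterations. The sketch below shows one generic way to do that kind of reduction with the standard java.text.Normalizer class. It is an assumption about the general technique only: the exact rules RNT applies are not stated in these notes, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

// Illustrative only: a generic ASCII reduction, not the RNT scheme's actual rules.
class UndiacritizeSketch {

    /** Decomposes the text, drops combining marks, then drops any remaining non-ASCII characters. */
    static String undiacritize(String text) {
        // NFD splits a precomposed letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks produced by the decomposition.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Anything still outside ASCII (for example modifier apostrophes) is removed.
        return withoutMarks.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        // U+1E29 is "h with cedilla"; U+02BB is a modifier apostrophe; U+012B is "i with macron".
        System.out.println(undiacritize("Mu\u1E29ammad"));             // prints "Muhammad"
        System.out.println(undiacritize("\u02BBAbd al-\u02BBAz\u012Bz")); // prints "Abd al-Aziz"
    }
}
```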
[en] New Features and Bug Fixes in 2.2.0
[en] Addition of TranslationAssistant (RNT in interactive mode), a Java API for building interactive applications to transliterate Arabic names.
[en] Clarification of the pairing of language/script combinations for RNI index queries and name matching. Language is the language of use in which a name appears, not the language of origin. So, for example, the language of a name of Arabic origin in Latin script is English, not Arabic. When adding names to an RNI index, performing index queries, or matching names, the script must be a native script for the language. You may use LanguageCode.UNKNOWN for the language. A name in any legal language/script pairing may match a name in any legal pairing for that language or English (or unknown) in Latin script, and vice versa. For the details, see the appendix titled Supported Text Domains for Rosette Name Indexer and Name Matching. (RLPNC-819)
[en] Fixed memory leak triggered by repeated opening and closing of an RNI index. (RLPNC-917)
[en] New Features and Bug Fixes in 2.1.3
[en] New Features and Bug Fixes in 2.1.2
[en] Removed support for the Windows 32-bit Visual Studio 7.1 platform. For Windows, use the Windows 32-bit Visual Studio 8.0 release.
[en] RNI-RNT now throws an UnsupportedNameDomainException when an application attempts to index or look up a name with a script/language combination that is not in a supported text domain. (RLPNC-533)
[en] Fixed a multithreading problem that appeared sporadically in RNT and RNI. (RLPNC-559)
[en] Improved Latin to Latin name matching to score partial matches for individual words in each name. (RLPNC-355)
[en] Provided higher recall in RNI queries at the cost of speed. In a future release, we plan to provide users with control over the tradeoff between speed and accuracy.
[en] New Features and Bug Fixes in 2.1.1
[en] Improved handling of Korean names when an eumjeol in the name contains 2 (not 3) jamo.
[en] Improved performance of name index (RNI) queries involving names in Latin script.
[en] Upgraded IndexQuerySample.java and the C++ sample applications to extract input data from a file.
[en] Improved RNI handling of names with whitespace and/or hyphens.
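[en] The 2.1.1 note on Korean names distinguishes an eumjeol (syllable block) made of 2 jamo from one made of 3. For precomposed Hangul syllables, that count follows directly from the Unicode code point arithmetic: a syllable has a leading consonant and a vowel, plus an optional trailing consonant. The sketch below applies that standard Unicode formula. It is not RNI code, and the class and method names are hypothetical.

```java
// Illustrative only: counts conjoining jamo in a precomposed Hangul syllable.
class JamoCountSketch {

    private static final int SYLLABLE_BASE = 0xAC00; // first precomposed syllable
    private static final int SYLLABLE_LAST = 0xD7A3; // last precomposed syllable
    private static final int TRAILING_COUNT = 28;    // 27 trailing consonants + "none"

    /** Returns 2 for leading+vowel syllables and 3 when a trailing consonant is present. */
    static int jamoCount(char syllable) {
        if (syllable < SYLLABLE_BASE || syllable > SYLLABLE_LAST) {
            throw new IllegalArgumentException("not a precomposed Hangul syllable");
        }
        // The trailing-consonant index is the syllable offset modulo 28; 0 means none.
        int trailingIndex = (syllable - SYLLABLE_BASE) % TRAILING_COUNT;
        return trailingIndex == 0 ? 2 : 3;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(jamoCount('\uC774')); // the surname "I/Lee": prints "2"
        System.out.println(jamoCount('\uAE40')); // the surname "Kim": prints "3"
    }
}
```

[en] Note that a compound trailing consonant counts as a single trailing jamo under this encoding, so the result is always 2 or 3.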
[en] New Features and Bug Fixes in 2.1.0
[en] Enhanced support for translating personal names from Arabic to English.
[en] Support for transliterating Japanese names in Hiragana and Katakana.
[en] Options added to RNT for adjusting the performance tradeoff between speed and precision and for turning off the use of statistical methods to establish information that was not found in a dictionary.
[en] New static method for returning a map of all the text domain pairs supported by RNI and name matching: com.basistech.rnm.index.StandardNameIndex.getMapOfSupportedTextDomainPairs(). The key for each map entry is a query text domain. The value is a list of reference text domains supported for that query text domain.
[en] New Features in 2.0.0
[en] RNI-RNT 2.0.0 upgrades the Rosette Name Translator (RNT) 1.1.0 and introduces the Rosette Name Indexer (RNI). The Rosette Name Matcher (RNM), with its XML data model, is deprecated. RNI supports the scoring of name matches between a candidate name and target name.
[en] Release 2.0.0 introduces the following features:
[en] RNI name indexes.
[en] Utility for loading the contents of XML gazetteers into RNI indexes.
[en] Translation of non-native personal names in Arabic documents to their standard English form in Latin script.
[en] The Folk transliteration scheme to generate variant names in Latin script from Arabic, Persian, Pushto and Urdu names in Arabic script.
[en] Move from Java 1.4 to Java 1.5.
[en] Renamed the Java com.basistech.NameMatching.* packages to com.basistech.rnm.*. For RNI functionality, see com.basistech.rnm.index.*.
[en] New sample applications to illustrate usage of the RNI-RNT APIs.
[en] Third-Party Components
[en] For a list of third-party components that are used in Basis Technology products, see rlpnc/ThirdPartyLicenses.txt.
[en] Changes in 7.28.1.c61.0
[en] No changes to third party components
[en] Changes in 7.28.0.c61.0
[en] Updated:
[en] Jackson Annotations 2.9.8 (Apache License) from 2.9.6
[en] Jackson Core 2.9.8 (Apache License) from 2.9.6
[en] Jackson Databind 2.9.8 (Apache License) from 2.9.6
[en] Jackson Dataformat YAML 2.9.8 (Apache License) from 2.9.6
[en] Changes in 7.27.1.c60.0
[en] No changes to third party components
[en] Changes in 7.27.0.c60.0
[en] Added:
[en] Changes in 7.26.1.c60.0
[en] No changes to third party components
[en] Changes in 7.26.0.c60.0
[en] No changes to third party components
[en] Changes in 7.25.1.c60.0-solr-7
[en] Added:
[en] Updated:
[en] Changes in 7.25.0.c60.0
[en] Updated:
[en] Changes in 7.24.2.c59.3
[en] No changes to third party components
[en] Changes in 7.24.1.c59.3
[en] No changes to third party components
[en] Changes in 7.24.0.c59.3
[en] Updated:
[en] Jackson Annotations 2.9.6 (Apache License) from 2.9.4
[en] Jackson Core 2.9.6 (Apache License) from 2.9.4
[en] Jackson Databind 2.9.6 (Apache License) from 2.9.4
[en] Jackson Dataformat YAML 2.9.6 (Apache License) from 2.9.4
[en] Changes in 7.23.3.c59.2
[en] No changes to third party components
[en] Changes in 7.23.2.c59.2
[en] No changes to third party components
[en] No changes to third party components
[en] Updated:
[en] Google Guava 18.0.0 (Apache License) from 16.0.1
[en] SnakeYAML 1.18 (Apache License) from 1.15
[en] Dropwizard Metrics Core 3.2.3 (Apache License) from 3.2.2
[en] Updated:
[en] Jackson Annotations 2.9.4 (Apache License) from 2.7.3
[en] Jackson Core 2.9.4 (Apache License) from 2.7.3
[en] Jackson Databind 2.9.4 (Apache License) from 2.7.3
[en] Jackson Dataformat YAML 2.9.4 (Apache License) from 2.7.3
[en] No changes to third party components
[en] No changes to third party components
[en] Updated:
[en] Deleted:
[en] Added:
[en] Updated:
[en] Apache Lucene Core 6.6.0 (Apache License) from 6.0.1, 6.2.1
[en] Apache Lucene Solr 6.6.0 (Apache License) from 6.2.1
[en] Apache Commons FileUpload v1.3.2 (Apache License) from 1.3.1
[en] International Components for Unicode (ICU) ICU4J 59.1 (ICU License) from 55.1
[en] Deleted:
[en] Apache Lucene Solr 4.10.3 (Apache License)
[en] Apache Lucene Core 4.10.3 (Apache License)
[en] Joda time 2.9.3 (Apache License)
[en] Added:
[en] Updated:
[en] Added:
[en] Updated:
[en] Apache Lucene Core 6.0.1 (Apache License) from 5.2.1
[en] International Components for Unicode (ICU) ICU4J 55.1 (ICU License) from 53.1
[en] Jackson Annotations 2.7.3 (Apache License) from 2.6.2
[en] Jackson Core 2.7.3 (Apache License) from 2.6.2
[en] Jackson Databind 2.7.3 (Apache License) from 2.6.2
[en] Jackson Dataformat YAML 2.7.3 (Apache License) from 2.6.2
[en] Apache Commons FileUpload v1.3.1 (Apache License) from 1.2.1
[en] Added:
[en] SnakeYAML 1.15 (Apache License)
[en] Jackson Dataformat YAML 2.6.2 (Apache License)
[en] liblinear-java 1.94 (liblinear-java License)
[en] Apache Commons CLI 1.2 (Apache License)
[en] Updated:
[en] Apache Lucene Core 5.2.0 (Apache License) from 4.3.1
[en] Apache Lucene Solr 5.5.0 (Apache License)
[en] International Components for Unicode (ICU) ICU4J 55.1 (ICU License) from 53.1
[en] Jackson Annotations 2.6.2 (Apache License) from 2.4.4, 2.4.1
[en] Jackson Core 2.6.2 (Apache License) from 2.4.4
[en] Jackson Databind 2.6.2 (Apache License) from 2.4.1
[en] Deleted:
[en] Updated:
[en] Apache Lucene Solr 4.10.3 (Apache License) from 4.9.1
[en] fastutil 6.6.0 (Apache License) from 6.5.9
[en] Google Guava 16.0.1 (Apache License) from 14.0.1
[en] International Components for Unicode (ICU) ICU4J 53.1 (ICU License) from 3.6.1
[en] Jackson Annotations 2.4.1, 2.4.4 (Apache License) from 2.4.0
[en] Jackson Core 2.4.4 (Apache License) from 2.4.0
[en] Added:
[en] iHarder.net Base64 v2.1.0 (MIT)
[en] Noggit v0.5.0 (Apache License)
[en] Restlet v2.1.1 (Apache License)
[en] Apache Commons Lang v2.6.0 (Apache License)
[en] Deleted:
[en] Updated:
[en] Apache Lucene Solr 4.9.1 (Apache License) from 1.4.1, 3.5.0, 4.0.0
[en] Apache Commons IO 2.3.0 (Apache License) from 1.4, 2.1
[en] Apache Commons http Components 4.3.0, 4.3.1 (Apache License) from 4.1.3, 4.1.4
[en] Apache Lucene Core 4.3.1, 4.9.1 (Apache License) from 4.3.0
[en] Apache Zookeeper 3.4.6 (Apache License) from 3.3.6
[en] ehcache Core 2.5.2 (Apache License) from 2.5.0
[en] fastutil 6.5.9 (Apache License) from 6.3, 6.5
[en] Google Guava 14.0.1 (Apache License) from 14.0.0
[en] Jackson Annotations 2.4.0 (Apache License) from 2.1
[en] Jackson Core 2.4.1 (Apache License) from 2.1
[en] Jackson Databind 2.4.1 (Apache License) from 2.1
[en] slf4j 1.6.3 (MIT) from 1.6.1
[en] Spatial4j 0.4.1 (Apache License) from 0.3.0
[en] [1] The C++ compiler is irrelevant in RNI-RNT, which provides a Java public API.
[en] [2] Copyright © 2015 by Elasticsearch BV. Licensed under The Apache License Version 2.0.
[en] [3] The native releases are partially implemented in native code. All releases contain the same public Java API.