Removed support for integrating RNI into a Solr 1.4 Application. (RLPNC-3232)
Added support for making concurrent calls to the RNI web service querying and/or updating the same RNI index. (RLPNC-3397)
Enabled translation from Chinese to English of names containing a bullet (U+2022) in place of the middle dot (U+00B7). The middle dot is used to separate the words in a non-Chinese name. (RLPNC-3395)
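As an illustration only (this is not RNT's implementation), the equivalence described above can be pictured as mapping the bullet to the middle dot before a name is split into words. The class and method names below are hypothetical.

```java
public class MiddleDotSketch {
    private static final char BULLET = '\u2022';      // U+2022, sometimes typed by mistake
    private static final char MIDDLE_DOT = '\u00B7';  // U+00B7, the expected word separator

    // Hypothetical helper: treat a bullet as if it were the middle dot.
    public static String normalizeSeparator(String name) {
        return name.replace(BULLET, MIDDLE_DOT);
    }

    public static void main(String[] args) {
        String withBullet = "A" + BULLET + "B";
        // Prints the same two words joined by a middle dot instead of a bullet.
        System.out.println(normalizeSeparator(withBullet));
    }
}
```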
Fixed a crash bug that appeared when attempting to translate a single Korean jamo to English. The jamo is now transliterated as the appropriate Latin script letter(s). (RLPNC-3394)
In addition to dictionary lookups, added support to the com.basistech.rnt.assistant interactive API to provide statistically inferred English alternatives for Arabic, Korean, Russian, and Chinese input, as well as statistically inferred Arabic, Korean, and Russian alternatives for English input. When a result is statistically inferred, that information is included in the result (DataSourceType.StatisticallyInferred). (RLPNC-3283, RLPNC-3309, RLPNC-3063)
Fixed failure to log the exception when the RNI web service cannot open an index. (RLPNC-3389)
Added a DefaultTranslationPairs class to enable translations by specifying the source and target language rather than complete language domains. (RLPNC-3372)
Added a hintLanguage property to the Name object that NameBuilder can use as a suggestion when it guesses the language. In standard usage, the hint will be the language already identified for the document containing the names. If the hinted language is compatible with the script, which NameBuilder can also guess, NameBuilder returns the hinted language; otherwise it returns the language it guessed. This feature is also available through the NameTranslation web service. (RLPNC-3369, RLPNC-3311)
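The hint rule above can be sketched as follows. This is a minimal illustration, not NameBuilder's code: the class, the method, and the script-to-language table are all hypothetical and exist only to show the "use the hint if it fits the script, else fall back to the guess" decision.

```java
import java.util.Map;
import java.util.Set;

public class LanguageHintSketch {
    // Hypothetical compatibility table (ISO 15924 script -> ISO 639 languages), for illustration only.
    private static final Map<String, Set<String>> LANGUAGES_BY_SCRIPT = Map.of(
            "Cyrl", Set.of("rus"),
            "Arab", Set.of("ara", "fas"),
            "Hang", Set.of("kor"),
            "Latn", Set.of("eng"));

    // Returns the hint when it is compatible with the script; otherwise the guessed language.
    public static String resolve(String script, String hintLanguage, String guessedLanguage) {
        Set<String> compatible = LANGUAGES_BY_SCRIPT.getOrDefault(script, Set.of());
        if (hintLanguage != null && compatible.contains(hintLanguage)) {
            return hintLanguage;
        }
        return guessedLanguage;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Arab", "fas", "ara")); // hint fits the script
        System.out.println(resolve("Cyrl", "eng", "rus")); // hint does not fit; guess is used
    }
}
```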
The RWS-Names web service was sometimes returning UNKNOWN as the language for a name. Now it always guesses a language if the user did not supply it. (RLPNC-3316)
Made the LanguageOfOrigin results returned as part of a translation available through the NameTranslation web service ResultAnnotations object. (RLPNC-3352)
Translations from Russian to English now include language of origin (Russian or English) for each word in each result. (RLPNC-3352)
Optimized the script guessing algorithm, approximately doubling the speed. (RLPNC-3314)
Fixed the Solr plugins to include language of origin when adding names to an index. (RLPNC-3305)
Enhanced RNI scoring of name matches when the names are the same except for reordering of regions of the name (such as Carnera Baer Braddock Louis Charles Walcott and Charles Walcott, Carnera Baer Braddock Louis). (RLPNC-3304)
Normalized l with stroke (U+0142) for better RNI handling of Polish names (such as Michał, which now matches Michal). (RLPNC-3303)
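Stroked letters need explicit handling because they have no canonical Unicode decomposition, so stripping combining marks after NFD leaves them untouched. The sketch below shows one generic way to fold them; it is an assumption for illustration, not the normalization RNI performs, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class LatinFoldingSketch {
    // Folds accented and stroked Latin letters to their base letters.
    public static String fold(String s) {
        // NFD separates base letters from combining accents (but not from strokes).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop combining accents
            }
            switch (c) {
                case '\u0142': out.append('l'); break; // l with stroke
                case '\u0141': out.append('L'); break; // L with stroke
                case '\u0111': out.append('d'); break; // d with stroke
                case '\u0110': out.append('D'); break; // D with stroke
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Micha" followed by l with stroke (U+0142) folds to plain ASCII.
        System.out.println(fold("Micha\u0142"));
    }
}
```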
Fixed an exception thrown when a user attempts to transliterate a particular person name (اسامة نين اع بن لادن) from Arabic to English. (RLPNC-3301)
Extended the RWS-Names web interface to support pairwise name matching and added the RNI Pairwise Matcher Demo to illustrate this feature. (RLPNC-3012)
Modified the RNI Demo to enable users to specify language of origin and entity type for a query name. (RLPNC-3295, RLPNC-3104)
Established a static method (RNIConfiguration.setLicenseXML(String licenseXML)) for passing in the RNI user license, matching the behavior for RNT (RNTEnvironment.setLicenseXML(String licenseXML)). (RLPNC-3344)
To improve accuracy, trained an RNI Russian language model and a Korean language model for determining the frequency of person name tokens. (RLPNC-3289, RLPNC-3241)
Fixed an error in handling the Persian ezafe (-e) during IC transliterations. (RLPNC-3273)
Added the ability to extract debug info from each of the MatchResult objects returned by a query. In NameIndexQuery, call setIncludeDebugInfo(true); then in each MatchResult, call getDebugInfo() to return a string that describes how the result was derived. (RLPNC-3266)
Added a .bat file (Windows) and shell script (Unix) for running RNICLI from the command line without using Ant. (RLPNC-3203)
Increased accuracy of Korean-English matching of native names by including more Korean name data in training. (RLPNC-3341)
Added "corp" to our eng_eng_ORGANIZATION token override file so RNI returns a better score when matching an organization name like Korea National Oil Corporation with Korea National Oil Corp. (RLPNC-3340)
Adjusted the phonetic algorithm that RNI uses for generating search keys for Korean names so that 김 provides a better match for Kim. (RLPNC-3334)
Provided more complete normalization of extended Latin characters, such as đ with stroke (U+0111), to improve accuracy handling names with such characters. (RLPNC-3322)
Fixed support for specifying the Korean script Kore, an alias for Hangul + Han. (RLPNC-3318)
Fixed an RNI threading issue that was causing crashes during initialization. (RLPNC-3315)
Added support for performing the reverse transliteration of Korean names in English using the BGN scheme. (RLPNC-3281)
Enhanced ISO15924Utils.scriptForString to return a macro-script (Hrkt or Jpan) for mixed-script Japanese strings. Hrkt is a mixture of Katakana (Kana) and Hiragana (Hira) (e.g., トイザらス). Jpan is a mixture of Kana, Hira, and Kanji (Hani) (e.g., トヨタ自動車株式会社). (RLPNC-3279)
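The macro-script idea above can be approximated with the JDK's own Character.UnicodeScript. The sketch below is an assumption about how such a check might look, not the ISO15924Utils implementation; the class name, the method name, and the "Zzzz" placeholder for anything outside Japanese scripts are hypothetical choices. It follows the note's examples in treating any mix of Kanji with kana as Jpan.

```java
import java.util.EnumSet;
import java.util.Set;

public class JapaneseMacroScriptSketch {
    // Returns an ISO 15924 code for Japanese text; "Zzzz" is a placeholder for anything else.
    public static String macroScript(String text) {
        Set<Character.UnicodeScript> seen = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .mapToObj(Character.UnicodeScript::of)
            // Punctuation, digits, and the prolonged sound mark are COMMON; ignore them.
            .filter(s -> s != Character.UnicodeScript.COMMON
                      && s != Character.UnicodeScript.INHERITED)
            .forEach(seen::add);

        Set<Character.UnicodeScript> japanese = EnumSet.of(
                Character.UnicodeScript.KATAKANA,
                Character.UnicodeScript.HIRAGANA,
                Character.UnicodeScript.HAN);
        if (seen.isEmpty() || !japanese.containsAll(seen)) {
            return "Zzzz";
        }
        boolean kata = seen.contains(Character.UnicodeScript.KATAKANA);
        boolean hira = seen.contains(Character.UnicodeScript.HIRAGANA);
        boolean han = seen.contains(Character.UnicodeScript.HAN);

        if (han && (kata || hira)) return "Jpan"; // Kanji mixed with kana
        if (kata && hira) return "Hrkt";          // Katakana mixed with Hiragana
        if (han) return "Hani";
        return kata ? "Kana" : "Hira";
    }

    public static void main(String[] args) {
        // Katakana + Hiragana (the Hrkt example from the note), written as escapes.
        System.out.println(macroScript("\u30C8\u30A4\u30B6\u3089\u30B9"));
        // Katakana + Kanji (the start of the Jpan example from the note).
        System.out.println(macroScript("\u30C8\u30E8\u30BF\u81EA\u52D5\u8ECA"));
    }
}
```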
Fixed the accuracy of RNI span matches when matching a hyphenated name in English against a name in Korean. (RLPNC-3277)
NameBuilder now uses an unchecked exception in place of InvalidNameException, so users do not have to use a try/catch every time they create a name. (RLPNC-3276)
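The practical effect of an unchecked exception is shown in the generic sketch below. The exception class and the buildName method are hypothetical stand-ins, not the NameBuilder API; the point is only that callers may omit the try/catch and still catch the failure where they choose to.

```java
public class UncheckedNameSketch {
    // Hypothetical unchecked exception: extends RuntimeException, so callers need not declare or catch it.
    public static class InvalidNameRuntimeException extends RuntimeException {
        public InvalidNameRuntimeException(String message) {
            super(message);
        }
    }

    // Hypothetical builder method: validates the input and returns the trimmed name data.
    public static String buildName(String data) {
        if (data == null || data.trim().isEmpty()) {
            throw new InvalidNameRuntimeException("name data must not be empty");
        }
        return data.trim();
    }

    public static void main(String[] args) {
        // No try/catch is required for the common case.
        System.out.println(buildName("  Charles Walcott "));

        // Callers that want to handle bad input can still catch the exception.
        try {
            buildName("");
        } catch (InvalidNameRuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```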
Added guessLanguage and guessScript utility methods to NameBuilder, so users can determine language and script without having to create the Name object. (RLPNC-3275)
Added an English to Russian (Latn, eng, folk to Cyrl, rus, native) translator. (RLPNC-3269)
Added an RNT option to specify which transliteration scheme should be used for BGN when transliterating Korean. If com.basistech.rnt.options.KorGeographyOption is set to NORTHKOREAN (the default), RNT uses McCune-Reischauer. If set to SOUTHKOREAN, RNT uses Revised Romanization of Korean. (RLPNC-3261)
Added support for transliterating Korean into Undiacritized BGN. (RLPNC-3253)
Improved ability of RNI to match truncated names. (RLPNC-3200)
The RNT Japanese translator now uses Korean segmentation when given a Hani name with Korean language of origin. (RLPNC-3173)
Extended the use of the RNI similarity score of 1.0, meaning exact match, to indicate the strings are equal, the languages of use and origin match, and the entity types match. (RLPNC-2636)
System.(err|out).print and printStackTrace() statements in RNI and RNT have been replaced with slf4j (logging) calls. (RLPNC-2688)