RNI enables high-speed, scalable searches for addresses in English using the Apache Lucene full-text search engine to store addresses with their search keys and a key index.
When you search for an address, RNI generates a search key for each component of each address field, locates all addresses indexed by those search keys, and uses linguistic matching algorithms to filter that set of addresses down to the most similar addresses.
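The two-stage lookup described above (collect candidates by key, then filter by similarity) can be pictured with a small, self-contained sketch. It is not RNI's implementation: the class name KeyLookupSketch, the use of lower-cased tokens as search keys, and the Jaccard overlap used as the similarity measure are all assumptions chosen only to make the idea concrete.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy key index, not RNI's key generation or matching.
final class KeyLookupSketch {
    // Maps each search key to the ids of the addresses that produced it.
    private final Map<String, Set<Integer>> keyIndex = new HashMap<>();
    private final List<String> addresses = new ArrayList<>();

    // Assumption: a "key" is a lower-cased alphanumeric token.
    static Set<String> keys(String address) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : address.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    void add(String address) {
        int id = addresses.size();
        addresses.add(address);
        for (String key : keys(address)) {
            keyIndex.computeIfAbsent(key, k -> new HashSet<>()).add(id);
        }
    }

    // Stage 1: collect every address that shares at least one key with the query.
    // Stage 2: keep only the candidates whose similarity reaches minScore.
    List<String> search(String query, double minScore) {
        Set<String> queryKeys = keys(query);
        Set<Integer> candidates = new TreeSet<>();
        for (String key : queryKeys) {
            candidates.addAll(keyIndex.getOrDefault(key, Collections.<Integer>emptySet()));
        }
        List<String> matches = new ArrayList<>();
        for (int id : candidates) {
            String candidate = addresses.get(id);
            if (jaccard(queryKeys, keys(candidate)) >= minScore) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    // Stand-in similarity: shared keys divided by all distinct keys.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / union.size();
    }

    public static void main(String[] args) {
        KeyLookupSketch index = new KeyLookupSketch();
        index.add("101 Stuart Street, Boston, MA");
        index.add("101 Main Street, Springfield, IL");
        index.add("5 Elm Road, Austin, TX");
        System.out.println(index.search("101 Stuart St, Boston MA", 0.5));
    }
}
```

In this toy version the second address becomes a candidate because it shares the key "101", but the similarity filter drops it. A real matcher would use linguistic comparison rather than plain token overlap.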
RNI provides a Java API that you can use to embed it in your applications.
Java packages: The address indexing classes are in com.basistech.rni.index.internal. Unqualified class names that appear in this section are in that package.
For detailed information about the API, see the Java API Reference.
Constructing an Address Index
An address index is an indexed list of addresses. The list includes a collection of AddressSpec objects and associated keys.
The AddressSpec object may include house, house number, road, unit, level, staircase, entrance, suburb, city district, city, island, state district, state, country region, country, world region, post code, post office box, and additional fields.
Note
You can also create an index in memory that is never stored on disk.
To create an indexed list of addresses on disk, you must specify a pathname for the data store.
For example:
// Create an address index.
// indexPathname specifies the directory where the index will be created.
StandardAddressIndex createIndex(String indexPathname)
        throws NameIndexStoreException, RNTException {
    return StandardAddressIndex.create(indexPathname);
}
Now you can use AddressSpecBuilder to create AddressSpec objects and add them to the index. AddressSpecBuilder provides a fluent interface that supports method chaining.
You can also create an AddressSpec object by parsing an address string with AddressSpecBuilder.parse(String str), which uses the jpostal library internally. The following fragment illustrates the syntax for creating an AddressSpec and adding it to the index.
// Add an address to the index.
void addAddress(StandardAddressIndex index, Integer id) throws NameIndexException, IOException {
    // Give the address a unique identifier. Must be a string.
    String uid = Integer.toString(id);
    // AddressSpecBuilder provides methods for adding address fields,
    // and a build method that returns the AddressSpec.
    AddressSpec addr = new AddressSpecBuilder()
        .house("101")
        .road("Stuart Street")
        .city("Boston")
        .state("MA")
        .countryRegion("New England")
        .uid(uid)
        .build();
    // AddressSpecBuilder also provides a method for parsing addresses, which uses jpostal,
    // and a build method that returns the AddressSpec.
    // addr2 is built here only to show the parsing syntax; it is not added to the index.
    AddressSpec addr2 = AddressSpecBuilder.parse("101 Stuart Street, Boston, MA").build();
    index.addAddress(addr);
    index.close();
}
When you are done adding addresses, be sure to close the address index, as in the preceding fragment.
Querying an Address Index
You can define and run queries that search an index for similar addresses.
The primary role of an address index is to perform queries. You can also perform updates (insertions and deletions).
StandardAddressIndex provides a static method for opening an address index.
StandardAddressIndex index = StandardAddressIndex.open(indexPathname);
indexPathname is a String containing the path to the directory that contains the address index.
To optimize the index for more efficient queries, call:
index.optimize();
When you are done using the address index, you must close it:
index.close();
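Because the index must be closed even when a query fails, one common Java idiom is to put the close call in a finally block. The sketch below shows only that shape. DemoIndex is a hypothetical stand-in written for this example, not an RNI class, and its query method is invented so that there is a failure path to demonstrate.

```java
// Illustrative only: DemoIndex is a hypothetical stand-in, not an RNI class.
final class CloseInFinallySketch {
    static final class DemoIndex {
        private boolean closed = false;

        // Pretend query that rejects blank input, to provide a failure path.
        int query(String address) {
            if (closed) {
                throw new IllegalStateException("index is closed");
            }
            if (address == null || address.trim().isEmpty()) {
                throw new IllegalArgumentException("empty address");
            }
            return address.length();
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Runs one query and always closes the index, even if the query throws.
    static int queryThenClose(DemoIndex index, String address) {
        try {
            return index.query(address);
        } finally {
            index.close();
        }
    }

    public static void main(String[] args) {
        DemoIndex ok = new DemoIndex();
        int result = queryThenClose(ok, "101 Stuart Street");
        System.out.println("result=" + result + " closed=" + ok.isClosed());

        DemoIndex failing = new DemoIndex();
        try {
            queryThenClose(failing, "");
        } catch (IllegalArgumentException e) {
            System.out.println("query failed, closed=" + failing.isClosed());
        }
    }
}
```

With a real StandardAddressIndex, the same structure would wrap the open, query, and close calls described in this section. Whether the class can be used in a try-with-resources statement is not stated here, so check the Java API Reference before relying on that form.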
Defining an Address Search Query
A query includes an AddressSpec object and several settings that you can use to constrain the query.
Set up an AddressIndexQuery object. For example:
// Define a query.
AddressIndexQuery defineQuery(AddressSpec address) {
    AddressIndexQuery query = new AddressIndexQuery(address);
    query.setAddressDataMinimumMatchScore(0.30);
    return query;
}
Query Performance Tradeoffs
You can make tradeoffs between different dimensions of performance by adjusting certain AddressIndexQuery parameters.
For more information about tradeoffs between accuracy and speed and between false positives and false negatives, refer to Query Performance Tradeoffs for names. For addresses, adjust the addressesToCheckAllowance and maximumAddressesToCheck parameters of AddressIndexQuery.
Running the Query and Accessing the Query Results
StandardAddressIndex includes a query method that takes as its parameter the AddressIndexQuery you have set up.
The query returns an AddressIndexQueryResult list. Each AddressIndexQueryResult object provides an AddressSpec object and a similarity score. As the following fragment illustrates, you can obtain and process each AddressSpec and its score.
The higher the score (greater than 0 and less than or equal to 1), the greater the confidence that this is a relevant match. A score of 1.0 indicates that the query address and result address are identical. See Address Variations.
Scoring is commutative: the scores for two given addresses are always the same, regardless of which address is in the index and which address is in the query.
https://raw.githubusercontent.com/basis-technology-corp/rosette-sample-code/master/rni-rnt/address_query_index.java
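As a rough companion to the fragment referenced above, the following self-contained sketch shows one way to post-process a list of scored results: drop those below a minimum score and order the rest from best to worst. ScoredAddress, its fields, and the sample scores are invented for illustration. They are not the AddressIndexQueryResult class or its accessors, and the sketch does not assume anything about the order in which RNI returns results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: ScoredAddress is a hypothetical stand-in for a query result.
final class ResultRankingSketch {
    static final class ScoredAddress {
        final String address;
        final double score; // expected range: greater than 0, at most 1

        ScoredAddress(String address, double score) {
            this.address = address;
            this.score = score;
        }
    }

    // Keeps results at or above minScore, highest score first.
    static List<ScoredAddress> rank(List<ScoredAddress> results, double minScore) {
        List<ScoredAddress> kept = new ArrayList<>();
        for (ScoredAddress r : results) {
            if (r.score >= minScore) {
                kept.add(r);
            }
        }
        kept.sort(Comparator.comparingDouble((ScoredAddress r) -> r.score).reversed());
        return kept;
    }

    public static void main(String[] args) {
        // Sample scores are made up for the example.
        List<ScoredAddress> results = Arrays.asList(
            new ScoredAddress("10 Stuart Road, Bolton", 0.24),
            new ScoredAddress("101 Stewart Street, Boston", 0.71),
            new ScoredAddress("101 Stuart St, Boston, MA", 0.92));
        for (ScoredAddress r : rank(results, 0.30)) {
            System.out.printf("%.2f  %s%n", r.score, r.address);
        }
    }
}
```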
AddressMatchResult. The AddressIndexQueryResult provides an AddressMatchResult object, which in turn provides a match type and score.
When you are done running queries, close the index:
index.close();
For a sample Java application that defines a Rosette Address Indexer query, runs the query, and reports the results, see AddressIndexQuerySample.
No more than one StandardAddressIndex object may exist for a given address index on disk at any time.
Queries and updates may be performed in multiple threads on a single StandardAddressIndex object.
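The threading rule above suggests a simple pattern: create one index object, share it among worker threads, and never open a second object for the same index. The sketch below illustrates that pattern with the standard java.util.concurrent classes. DemoIndex and its countContaining method are hypothetical stand-ins written for this example, not RNI classes or calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: DemoIndex is a hypothetical, thread-safe stand-in for an index.
final class SharedIndexSketch {
    static final class DemoIndex {
        private final List<String> addresses = new CopyOnWriteArrayList<>();

        void add(String address) {
            addresses.add(address);
        }

        // Toy "query": counts stored addresses that contain the given text.
        int countContaining(String text) {
            int count = 0;
            for (String address : addresses) {
                if (address.contains(text)) {
                    count++;
                }
            }
            return count;
        }
    }

    // Submits every query to a thread pool; all tasks share the same index object.
    static List<Integer> queryInParallel(final DemoIndex index, List<String> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (final String query : queries) {
                Callable<Integer> task = () -> index.countContaining(query);
                pending.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : pending) {
                try {
                    counts.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("query task failed", e.getCause());
                }
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        DemoIndex index = new DemoIndex(); // the single shared instance
        index.add("101 Stuart Street, Boston, MA");
        index.add("5 Elm Road, Austin, TX");
        index.add("9 Park Street, Boston, MA");
        System.out.println(queryInParallel(index, Arrays.asList("Boston", "Austin", "Paris")));
    }
}
```

Results are collected in submission order, so the returned counts line up with the input queries regardless of which thread finished first.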