Pairwise matching in RNI has the benefit of being completely stateless, which makes it possible to optimize performance through parallelization. You are limited only by the concurrency limit of Rosette Server/Cloud or of your client application.
First, define what you are trying to optimize: latency or throughput. In RNI, latency is usually thought of as "seconds per query," that is, how long a single query takes to execute. Throughput answers the question "how long does this list of queries take to run?" Inside RNI, latency and throughput depend on many factors:
Size of the index and configuration of the Elastic cluster
How the query is constructed
windowSize
Query methods implemented
We will examine each of these below.
Index size and Elastic configuration
How many nodes do I need? This is a common question, and the answer matters both for performance and for the resources you need to provision. Start by examining the data you intend to index. First, identify the fields that will be in the Elastic document; then identify which of those fields are of the RNI field type, using the Elastic mapping for your index. Next, determine the size of each field; the more precise, the better. Note that RNI fields generate four times as much metadata at indexing time; this metadata improves candidate recall and helps avoid false negatives. Given the total number of records to be stored in the index, you can now calculate the total size of your data. The table below shows an example.
Calculate index data size:
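As a rough sketch of the calculation, the snippet below estimates raw index size from record count and per-record field sizes. All figures are hypothetical, and the 4x metadata figure is applied here as additional bytes on top of the raw RNI field size:

```python
# Rough index data-size estimate (all figures here are hypothetical).
RNI_METADATA_MULTIPLIER = 4  # RNI fields generate ~4x extra metadata at index time

def estimate_index_size_gb(num_records, plain_bytes, rni_bytes):
    """Estimate raw index size in GB: plain fields plus RNI fields and metadata."""
    per_record = plain_bytes + rni_bytes * (1 + RNI_METADATA_MULTIPLIER)
    return num_records * per_record / 1024**3

# Example: 10M records, 1 KB of plain fields, 200 B of RNI fields per record.
size_gb = estimate_index_size_gb(10_000_000, 1024, 200)
print(f"Estimated index size: {size_gb:.1f} GB")
```

Adjust the record count and per-field byte sizes to match your own mapping.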
You also need to consider the growth projections of your data. If your data is expected to grow by 600,000 records per year, you will want to calculate those values as well.
Calculate index size growth projections:
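Continuing the sketch, a simple year-by-year projection (using the hypothetical 600,000 records/year growth rate mentioned above) might look like:

```python
# Project record counts over several years of growth (hypothetical figures).
def project_records(current_records, annual_new_records, years):
    """Return projected record counts for year 0 through `years`."""
    return [current_records + annual_new_records * year for year in range(years + 1)]

# Example: 10M records today, growing by 600,000 records per year.
print(project_records(10_000_000, 600_000, 3))
# → [10000000, 10600000, 11200000, 11800000]
```

Feed each projected count back into the size estimate to plan storage capacity ahead of time.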
A final consideration is backup and resilience within your cluster. Your tolerance for data loss, as defined by your risk and backup policies, determines the number of replicas; each replica multiplies the required storage capacity.
Aim to keep the average shard size between a few GB and a few tens of GB. A general rule of thumb is 1 shard and 1 replica per node, which allows metadata and models to be in memory at query time for optimal performance. Continuing our example, you would have the following cluster.
*recommendation
For the recommended configuration, expect query times between 500 ms and 1.5 s, depending on RNI configuration options. Actual performance may vary; the recommendation above is intended as a starting point, and performance analysis should be conducted on your actual hardware and data to confirm.
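As a minimal sketch, the index settings for a hypothetical three-node cluster following the one-shard-and-one-replica-per-node rule of thumb could be expressed as:

```python
import json

# Index settings for a hypothetical 3-node cluster, following the rule of
# thumb of 1 primary shard and 1 replica per node. The index name and node
# count are illustrative; adjust them to your own data volume.
num_nodes = 3
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": num_nodes,  # one primary shard per node
            "number_of_replicas": 1,        # one replica of each primary
        }
    }
}
# Apply when creating the index, e.g.: PUT /rni-names with this body.
print(json.dumps(index_settings, indent=2))
```

These two settings map directly to the standard Elasticsearch index-creation API; the mapping for your RNI fields would be added alongside them.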
The query rescorer executes a second query only on the top-K results returned by the query and post-filter phases. The number of documents examined on each shard is controlled by the windowSize parameter, which defaults to 10. The windowSize set in the first step of the query has a direct impact on RNI performance: a small windowSize limits the number of candidates sent to the second pass. Note that windowSize is applied to each shard in your index, so the total number of rescored documents is windowSize multiplied by the number of shards. The chart below illustrates the relationship between performance and windowSize.
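The two-phase structure above can be sketched with Elasticsearch's standard rescore API. The rescore_query shown is a plain match_phrase placeholder rather than RNI's actual name-scoring query, and the field name full_name is hypothetical:

```python
# Sketch of a two-phase search body using Elasticsearch's rescore API.
# The first-pass "query" gathers candidates; the rescorer then re-ranks
# the top window_size hits on each shard.
search_body = {
    "query": {"match": {"full_name": "John Smith"}},
    "rescore": {
        "window_size": 10,  # candidates re-examined per shard (default 10)
        "query": {
            # Placeholder second pass; RNI substitutes its own scoring here.
            "rescore_query": {"match_phrase": {"full_name": "John Smith"}}
        },
    },
}
print(search_body["rescore"]["window_size"])
```

Lowering window_size shrinks the per-shard second pass at the possible cost of missing candidates ranked below the window.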
Query construction in the first pass plays an important role in overall performance, because RNI performance is tied to how many candidates the first pass generates. There is a fundamental difference between a query that combines multiple terms with OR and one that combines them with AND: a boolean query built from ORs fills the windowSize more quickly than a more selective query. The more specific you can make the first-pass query, the fewer candidates are selected, reducing the time spent in the second pass.
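The contrast can be illustrated with two standard Elasticsearch bool queries over the same terms (the field name full_name is hypothetical):

```python
# Two first-pass candidate queries over the same terms. The OR form matches
# any document containing either term, filling the rescore window quickly;
# the AND form requires both terms, producing a smaller, more selective
# candidate set for the second pass.
or_query = {
    "bool": {
        "should": [
            {"match": {"full_name": "john"}},
            {"match": {"full_name": "smith"}},
        ],
        "minimum_should_match": 1,
    }
}
and_query = {
    "bool": {
        "must": [
            {"match": {"full_name": "john"}},
            {"match": {"full_name": "smith"}},
        ]
    }
}
print(list(or_query["bool"]), list(and_query["bool"]))
```

In bool queries, should clauses act as OR and must clauses act as AND, so tightening should into must is one direct way to make the first pass more selective.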
There are essentially two ways to query an index:
Single queries
Batch queries
As the names suggest, single queries are fired off individually, while batch queries execute a set of queries together. A single query is sent to Elastic, and the client waits for the response before continuing. Elasticsearch does offer a bulk API that packs multiple operations into a single call, but some of these operations are single-threaded and may not improve your throughput. To improve throughput, parallelize the client that creates and sends the queries, in accordance with its implementation.
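Because pairwise matching is stateless, client-side parallelization is straightforward. The sketch below uses a thread pool; run_query is a placeholder for whatever sends one search to Elasticsearch (for example, a call to your client library's search method):

```python
from concurrent.futures import ThreadPoolExecutor

# Client-side parallelization of single queries (sketch).
def run_query(name):
    # Stand-in: a real implementation would send the search request to
    # Elasticsearch and return the response for this name.
    return {"query": name, "status": "ok"}

names = ["John Smith", "Jane Doe", "Li Wei"]

# Each worker sends its query independently; tune max_workers to stay
# within the concurrency limit of Rosette Server/Cloud.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_query, names))

print(len(results), "queries completed")
```

Since each query is independent, the pool size can be raised until the server's concurrency limit, rather than the client, becomes the bottleneck.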