How can I remove specific terms such as ltd, pty, or inc from organization names?

You can use stop regexes to remove specific elements from a name. 

The stop patterns are OR'd together using the alternation operator ("|") and matched against character-normalized inputs. Any matching region is replaced with a space. For example, to remove the "ltd" token (with or without a trailing period), you could add this line to the stopregexes_eng_ORGANIZATION.txt file:

    \sltd\.?\s

This turns "foo ltd bar" into "foo bar".

Keep in mind that stopregexes*.txt files are applied at both index and query time (unlike token and full-name override pairs, which are applied only at query time), so any modification to stopregexes*.txt should be accompanied by re-indexing. The JVM must also be restarted, since stopregexes*.txt is read once at startup.
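
The combine-and-replace behavior described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the pattern list, the function name strip_stop_patterns, and the use of lowercasing as a stand-in for character normalization are all assumptions, and Python's regex dialect may differ from the one the product uses.

```python
import re

# Hypothetical stop patterns, one per entry, as they might appear
# line by line in a stop-regex file.
STOP_PATTERNS = [r"\sltd\.?\s", r"\spty\.?\s", r"\sinc\.?\s"]

# OR the patterns together with the alternation operator ("|").
COMBINED = re.compile("|".join(f"(?:{p})" for p in STOP_PATTERNS))


def strip_stop_patterns(name: str) -> str:
    """Replace every region matching a stop pattern with a single space."""
    # Assumption: lowercasing stands in for real character normalization.
    normalized = name.lower()
    return COMBINED.sub(" ", normalized)


print(strip_stop_patterns("foo ltd bar"))  # foo bar
print(strip_stop_patterns("foo ltd"))      # foo ltd
```

In this sketch the second call leaves the name unchanged, because the pattern requires whitespace on both sides of "ltd". If your names can end with the term, it may be worth testing a pattern against such inputs before re-indexing.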

Learn more about Stop Patterns and Stopwords in section 6.13 (User-Configurable Features) of the application developer guide.