Each profile can include custom data sets. For example, the entities endpoint uses multiple types of data files, including regex and gazetteer files. Custom versions of these files can be placed in their own directory, known as an overlay directory. The overlay directory is an additional data directory that takes priority over the default entities data directory.
Note
If the data overlay directory is named rex, the contents of the overlay directory will completely replace all supplied REX data files, including models, regex, and gazetteer files.
If your custom data sets are intended to supplement the shipped files, the directory name must not be rex.
If your custom data sets are intended to completely replace the shipped files, use the directory name rex.
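The naming rule in the note can be sketched as a small check. This is only an illustration: overlay_mode is a hypothetical helper, not part of Rosette, and it assumes the rule applies to the final component of the overlay directory path.

```shell
#!/bin/sh
# Hypothetical helper (not a Rosette command): reports how an overlay
# directory would be treated, assuming the rule in the note applies to the
# last component of the directory path.
overlay_mode() {
    # basename ignores a trailing slash and returns the last path component.
    name=$(basename "$1")
    if [ "$name" = "rex" ]; then
        echo "replace"      # overlay replaces all shipped REX data files
    else
        echo "supplement"   # overlay is used in addition to the shipped files
    fi
}

overlay_mode "/Users/rosette-users/group1/custom-rex/data"   # prints: supplement
overlay_mode "/Users/rosette-users/group1/rex"               # prints: replace
```

The first path is the overlay directory used in the example below, so the custom gazetteer there supplements the shipped data rather than replacing it.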
Example 2. Custom Gazetteer for the Entities Endpoint
We will create a custom gazetteer file called custom_gaz.txt specifying "John Doe" as an ENGINEER entity type. Full details on how to create custom gazetteer files are in the section Creating a Custom Gazetteer in the Rosette Entity Extractor Application Developer Guide.
- Create the custom gazetteer file in /Users/rosette-users/group1/custom-rex/data/gazetteer/eng/accept/custom_gaz.txt.
It should consist of just two lines:
ENGINEER
John Doe
- Copy the file /config/rosapi/rex-factory-config.yaml to /Users/rosette-users/group1/config/rosapi/rex-factory-config.yaml.
- Edit the new rex-factory-config.yaml file, setting the dataOverlayDirectory:
# rootDirectory is the location of the rex root
rootDirectory: ${rex-root}
dataOverlayDirectory: "/Users/rosette-users/group1/custom-rex/data"
- Call the entities endpoint with the profileId set to group1:
curl -s -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Cache-Control: no-cache" \
-d '{"content": "John Doe is employed by Basis Technology", "profileId": "group1"}' \
"http://localhost:8181/rest/v1/entities"
You will see "John Doe" extracted as type ENGINEER from the custom gazetteer.
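The file setup in this example can be scripted. The following is a minimal sketch, not a supplied tool: PROFILE_ROOT and the other variable names are assumptions made for illustration, and the script writes a profile configuration containing only the two settings shown above instead of copying the shipped rex-factory-config.yaml. It defaults to a temporary directory so it can be tried without touching a real profile.

```shell
#!/bin/sh
set -e

# PROFILE_ROOT is an assumption of this sketch. The example above uses
# /Users/rosette-users/group1; a temporary directory is the default here.
PROFILE_ROOT="${PROFILE_ROOT:-$(mktemp -d)}"
OVERLAY_DIR="$PROFILE_ROOT/custom-rex/data"
GAZ_FILE="$OVERLAY_DIR/gazetteer/eng/accept/custom_gaz.txt"
PROFILE_CONFIG="$PROFILE_ROOT/config/rosapi/rex-factory-config.yaml"

# Create the gazetteer: the first line is the entity type, the second line
# is the entry to accept as that type.
mkdir -p "$(dirname "$GAZ_FILE")"
printf '%s\n' "ENGINEER" "John Doe" > "$GAZ_FILE"

# Write the profile configuration. In a real setup, copy the shipped
# rex-factory-config.yaml first and edit that copy; this stand-in contains
# only the two settings shown in the example. The backslash keeps
# ${rex-root} literal in the output file.
mkdir -p "$(dirname "$PROFILE_CONFIG")"
cat > "$PROFILE_CONFIG" <<EOF
# rootDirectory is the location of the rex root
rootDirectory: \${rex-root}
dataOverlayDirectory: "$OVERLAY_DIR"
EOF

echo "Gazetteer written to $GAZ_FILE"
echo "Profile config written to $PROFILE_CONFIG"
```

To target the profile from the example, run the script with PROFILE_ROOT=/Users/rosette-users/group1, then call the entities endpoint as shown above.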