The Rosette Server endpoints are configured by the files found in the /launcher/config/rosapi
directory. Be careful when editing any of these files, as the endpoints will not work if they are not configured properly.
The morphology-specific parameters and settings are in the file rbl-factory-config.yaml. The sentences endpoint uses the same configuration file.
Fragment boundary detection
In cases where a document or part of a document contains tables and lists instead of sentences, the /sentences endpoint can detect fragment boundaries as sentence boundaries. One way fragment boundaries are identified is by fragment delimiters. A delimiter is restricted to a single character; the default delimiters are U+0009 (tab), U+000B (vertical tab), and U+000C (form feed).
You can modify the set of recognized delimiters:
- Edit the file /launcher/config/rosapi/rbl-factory-config.yaml
- Uncomment the fragmentBoundaryDelimiters parameter
- Edit the parameter's value string to contain all characters to be recognized as fragment boundaries, including any of the default values you want to keep
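Because the value is one string that must repeat every default you want to keep, it can help to assemble it programmatically. The following is a minimal sketch only. The helper names are hypothetical, and the printed line assumes the parameter takes a double-quoted YAML string with \uXXXX escapes, so check the commented-out entry in your own configuration file for the exact format it expects.

```python
# Default delimiters named in the text: tab, vertical tab, form feed.
DEFAULT_DELIMITERS = ["\u0009", "\u000B", "\u000C"]


def build_delimiter_string(extra, keep_defaults=True):
    # Collect every single-character delimiter into one string, skipping duplicates.
    chars = list(DEFAULT_DELIMITERS) if keep_defaults else []
    for ch in extra:
        if len(ch) != 1:
            raise ValueError(f"a delimiter must be exactly one character: {ch!r}")
        if ch not in chars:
            chars.append(ch)
    return "".join(chars)


def as_escaped(value):
    # Render each character as a 4-digit Unicode escape (assumes code points up to U+FFFF).
    return "".join(f"\\u{ord(ch):04X}" for ch in value)


# Keep the three defaults and add the vertical bar (U+007C) as an extra delimiter.
delimiters = build_delimiter_string(["|"])
print(f'fragmentBoundaryDelimiters: "{as_escaped(delimiters)}"')
```

Passing keep_defaults=False produces a string that contains only the characters you list, which matches the note above that defaults are dropped unless you include them.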
In addition to the fragment delimiters, the fragment boundary detector automatically inserts a break:
- After 3 or more consecutive spaces
- After 2 consecutive newlines
- At the end of a line, when the line has fewer than 7 tokens
- At the end of a line that contains a previous fragment boundary
- After every newline in a list. A list is defined as 3 or more lines containing the same punctuation mark within the first 5 characters of the line.
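To make the interaction of these rules easier to follow, here is a rough Python sketch that applies them to plain text. It is an illustration, not the detector used by the server. It assumes that tokens are whitespace-separated words, that a list is a run of consecutive lines, and that lines not ended by a break are joined with a space. The function names are hypothetical.

```python
import re
import string

DEFAULT_DELIMITERS = "\u0009\u000B\u000C"  # tab, vertical tab, form feed


def list_line_indexes(lines):
    # Indexes of lines in a run of 3+ consecutive lines that share a
    # punctuation mark within their first 5 characters.
    marks = [{ch for ch in line[:5] if ch in string.punctuation} for line in lines]
    in_list = set()
    start = 0
    while start < len(lines):
        common = set(marks[start])
        end = start
        while common and end + 1 < len(lines) and common & marks[end + 1]:
            common &= marks[end + 1]
            end += 1
        if end - start + 1 >= 3:
            in_list.update(range(start, end + 1))
        start = end + 1
    return in_list


def split_fragments(text, delimiters=DEFAULT_DELIMITERS):
    lines = text.split("\n")
    in_list = list_line_indexes(lines)
    # Breaks inside a line: 3+ consecutive spaces, or any delimiter character.
    parts = [" {3,}"]
    if delimiters:
        parts.append("[" + re.escape(delimiters) + "]")
    inline_break = re.compile("|".join(parts))

    fragments, pending = [], []

    def flush():
        joined = " ".join(pending).strip()
        if joined:
            fragments.append(joined)
        pending.clear()

    for i, line in enumerate(lines):
        pieces = inline_break.split(line)
        for piece in pieces[:-1]:
            pending.append(piece)
            flush()  # break after each delimiter or run of spaces
        pending.append(pieces[-1])
        ends_fragment = (
            not line.strip()              # blank line, i.e. 2 consecutive newlines
            or len(pieces) > 1            # line already contains a boundary
            or len(line.split()) < 7      # fewer than 7 tokens
            or i in in_list               # line belongs to a list
        )
        if ends_fragment:
            flush()
    flush()
    return fragments


sample = (
    "Name\tQty\n"
    "- apples\n- pears\n- plums\n"
    "This sentence has more than seven tokens and it wraps\n"
    "onto the next line."
)
for fragment in split_fragments(sample):
    print(repr(fragment))
```

In the sample, the tab splits the header row into two fragments, each bulleted line becomes its own fragment, and the long wrapped line is joined with the line that follows it because it has 7 or more tokens and no other rule applies.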
By default, fragment boundary detection is turned on. To turn off fragment boundary detection:
- Edit the file /launcher/config/rosapi/rbl-factory-config.yaml
- Uncomment the fragmentBoundaryDetection parameter
- Set fragmentBoundaryDetection: false