The Rosette Server endpoints are configured by the files in the launcher/config/rosapi directory. Edit these files with care: an endpoint that is not configured properly will not work.
The morphology-specific parameters and settings are in the file rbl-factory-config.yaml. The sentences endpoint uses the same configuration file.
Fragment Boundary Detection
When a document, or part of a document, contains tables and lists instead of sentences, the /sentences endpoint can treat fragment boundaries as sentence boundaries. One way it identifies fragment boundaries is by fragment delimiters. Each delimiter is a single character; the default delimiters are U+0009 (tab), U+000B (vertical tab), and U+000C (form feed).
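To make the role of delimiters concrete, here is a minimal Python sketch of delimiter-based splitting. It is an illustration only, not Rosette's implementation; the function name split_on_delimiters and the choice to drop empty fragments are assumptions made for this example.

```python
# Default fragment delimiters named in the text: tab, vertical tab, form feed.
DEFAULT_DELIMITERS = "\u0009\u000b\u000c"


def split_on_delimiters(text, delimiters=DEFAULT_DELIMITERS):
    """Split text at every single-character delimiter (hypothetical helper)."""
    fragments, current = [], []
    for ch in text:
        if ch in delimiters:
            # A delimiter closes the fragment collected so far.
            fragments.append("".join(current))
            current = []
        else:
            current.append(ch)
    fragments.append("".join(current))
    # Drop empty or whitespace-only fragments (an assumption of this sketch).
    return [f for f in fragments if f.strip()]


print(split_on_delimiters("Item\tPrice\tStock"))
print(split_on_delimiters("a|b\tc", delimiters="\t|"))
```

The second call shows the effect of a custom delimiter set: the set passed in replaces the defaults, so any default you still want (here, the tab) has to be listed again.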
You can modify the set of recognized delimiters:
Edit the file launcher/config/rosapi/rbl-factory-config.yaml
Uncomment the fragmentBoundaryDelimiters parameter
Edit the parameter's value string to contain every character to be recognized as a fragment boundary, including any of the default values you want to keep
In addition to the fragment delimiters, the fragment boundary detector automatically inserts a break:
After 3 or more consecutive spaces
After 2 consecutive newlines
At the end of a line that has fewer than 7 tokens
At the end of a line that contains a previous fragment boundary
After every newline in a list. A list is defined as 3 or more lines containing the same punctuation mark within the first 5 characters of the line.
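The rules above can be approximated in a short Python sketch. This is a rough illustration under stated assumptions, not the detector Rosette ships: tokens are taken to be whitespace-separated words, a list is taken to be 3 or more consecutive lines, and the names split_fragments and list_line_indexes are hypothetical.

```python
import re
import string

# Default delimiters from the text above, plus runs of 3 or more spaces.
INLINE_BREAK = re.compile(r"[\t\x0b\x0c]| {3,}")


def list_line_indexes(lines):
    """Indexes of lines in a run of 3+ consecutive lines that share a
    punctuation mark within their first 5 characters."""
    listed = set()
    start = 0
    while start < len(lines):
        marks = {c for c in lines[start][:5] if c in string.punctuation}
        end = start
        while marks and end + 1 < len(lines):
            shared = marks & set(lines[end + 1][:5])
            if not shared:
                break
            marks, end = shared, end + 1
        if end - start + 1 >= 3:
            listed.update(range(start, end + 1))
            start = end + 1
        else:
            start += 1
    return listed


def split_fragments(text):
    # str.splitlines() also splits on U+000B and U+000C, so split on "\n" only.
    lines = text.split("\n")
    listed = list_line_indexes(lines)
    fragments, pending = [], []
    for i, line in enumerate(lines):
        if not line.strip():
            # Blank line: two newlines in a row end the current fragment.
            if pending:
                fragments.append(" ".join(pending))
                pending = []
            continue
        parts = [p.strip() for p in INLINE_BREAK.split(line) if p.strip()]
        pending.append(parts[0])
        ends_here = (
            len(parts) > 1            # line already contains a boundary
            or len(line.split()) < 7  # fewer than 7 whitespace-separated tokens
            or i in listed            # line belongs to a list
        )
        if ends_here:
            fragments.append(" ".join(pending))
            fragments.extend(parts[1:])
            pending = []
    if pending:
        fragments.append(" ".join(pending))
    return fragments
```

With this sketch, a long line of 7 or more tokens with no internal boundary is joined to the line that follows it, while table rows, short lines, and list items each end a fragment.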
By default, fragment boundary detection is turned on. To turn off fragment boundary detection:
Edit the file launcher/config/rosapi/rbl-factory-config.yaml
Uncomment the fragmentBoundaryDetection parameter
Set fragmentBoundaryDetection: false
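If you script your deployments, the uncomment-and-set steps above could be automated along the following lines. This is a sketch only: the helper name set_yaml_parameter is hypothetical, and it assumes the shipped file comments the parameter out as a single line beginning with #, which you should confirm against your own copy before relying on it.

```python
import re


def set_yaml_parameter(yaml_text, key, value):
    """Uncomment `key` (if commented) and set it to `value`.

    Assumes the parameter sits on one line, optionally prefixed with '#'.
    Works on the file's text; reading and writing the file is left to the caller.
    """
    pattern = re.compile(
        r"^([ \t]*)(?:#[ \t]*)?" + re.escape(key) + r"[ \t]*:.*$",
        re.MULTILINE,
    )
    # Keep the original indentation (group 1) and rewrite the rest of the line.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{key}: {value}", yaml_text, count=1
    )
    if count == 0:
        raise KeyError(f"parameter {key!r} not found")
    return new_text


# Sample content is illustrative, not copied from the shipped file.
sample = "# sample settings\n#fragmentBoundaryDetection: true\nother: 1\n"
print(set_yaml_parameter(sample, "fragmentBoundaryDetection", "false"))
```

Back up the configuration file before writing the result to it, since a malformed file can stop the endpoints from working.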