Interface | Description |
---|---|
Annotator | An Annotator annotates text with attributes. |

Class | Description |
---|---|
AnnotatedText | The root of the data model. |
AnnotatedText.Builder | Builder class for AnnotatedText objects. |
Attribute | Base class for attributes that span a range of text. |
Attribute.Builder<T extends Attribute,B extends Attribute.Builder<T,B>> | Base class for builders for attributes that inherit from Attribute. |
BaseAttribute | Base class for attributes that annotate text. |
BaseAttribute.Builder<T extends BaseAttribute,B extends BaseAttribute.Builder<T,B>> | Base class for builders for the subclasses of BaseAttribute. |
LanguageDetection | The results of running language detection on a region of text. |
LanguageDetection.Builder | A builder for language detection results. |
LanguageDetection.DetectionResult | A single result from language detection. |
LanguageDetection.DetectionResult.Builder | Builder for detection results. |
ListAttribute<Item extends BaseAttribute> | A container for an ordered collection of attributes of a type. |
ListAttribute.Builder<Item extends BaseAttribute> | A builder for lists. |
RawData | A container for incoming raw data (bytes). |
ScriptRegion | A script region. |
ScriptRegion.Builder | Builder for script regions. |
The root of the model is the AnnotatedText class.
The annotations are represented as objects that inherit from BaseAttribute. The base attribute is the simplest attribute; all this class provides is a map of extended properties that are used, as described below, as an extensibility mechanism.
Most attribute classes inherit from Attribute. This class adds a start offset and an end offset. Thus, attributes that refer to the AnnotatedText as a whole inherit from BaseAttribute, while attributes that refer to subsequences of text inherit from Attribute.
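The relationship between the two base classes can be sketched as follows. This is a simplified illustration, not the package's actual source: the class names SketchBaseAttribute and SketchAttribute, the coveredText helper, and the treatment of the end offset as exclusive are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for BaseAttribute: it carries only extended properties.
class SketchBaseAttribute {
    private final Map<String, Object> extendedProperties;

    SketchBaseAttribute(Map<String, Object> extendedProperties) {
        // Defensive copy, so later changes to the caller's map cannot leak in.
        this.extendedProperties =
            Collections.unmodifiableMap(new HashMap<>(extendedProperties));
    }

    Map<String, Object> getExtendedProperties() {
        return extendedProperties;
    }
}

// Hypothetical stand-in for Attribute: it adds a start and an end offset.
class SketchAttribute extends SketchBaseAttribute {
    private final int startOffset;
    private final int endOffset;

    SketchAttribute(int startOffset, int endOffset, Map<String, Object> extendedProperties) {
        super(extendedProperties);
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException("invalid offsets");
        }
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    int getStartOffset() { return startOffset; }
    int getEndOffset() { return endOffset; }

    // Assumption for this sketch: the end offset is exclusive.
    String coveredText(CharSequence text) {
        return text.subSequence(startOffset, endOffset).toString();
    }
}

class HierarchySketchDemo {
    public static void main(String[] args) {
        Map<String, Object> none = Collections.emptyMap();
        SketchAttribute span = new SketchAttribute(6, 11, none);
        System.out.println(span.coveredText("Hello world")); // prints "world"
    }
}
```

An attribute describing the whole text would extend the first class directly; anything tied to a character range would extend the second.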
In some cases, applications of this data model may also need to represent initial raw data. The RawData class supports that usage. RawData stores a ByteBuffer and a Map<String, List<String>> of metadata. There is no connection in the code between AnnotatedText and RawData.
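A container of that shape might look like the sketch below. The class name RawDataSketch, the decodeUtf8 helper, and the "Content-Type" metadata key are hypothetical; only the two stored types (a ByteBuffer and a Map<String, List<String>>) come from the description above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable holder for raw bytes plus multi-valued metadata.
final class RawDataSketch {
    private final ByteBuffer data;
    private final Map<String, List<String>> metadata;

    RawDataSketch(ByteBuffer data, Map<String, List<String>> metadata) {
        // Copy the bytes so later writes to the caller's buffer are not visible here.
        ByteBuffer copy = ByteBuffer.allocate(data.remaining());
        copy.put(data.duplicate()); // duplicate() leaves the caller's position untouched
        copy.flip();
        this.data = copy.asReadOnlyBuffer();

        Map<String, List<String>> metaCopy = new HashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            metaCopy.put(e.getKey(),
                Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        this.metadata = Collections.unmodifiableMap(metaCopy);
    }

    // Each call returns a fresh read-only view with its own position and limit.
    ByteBuffer getData() { return data.asReadOnlyBuffer(); }

    Map<String, List<String>> getMetadata() { return metadata; }

    // Convenience for the demo only: interpret the bytes as UTF-8 text.
    String decodeUtf8() {
        return StandardCharsets.UTF_8.decode(getData()).toString();
    }
}

class RawDataSketchDemo {
    public static void main(String[] args) {
        Map<String, List<String>> meta = new HashMap<>();
        meta.put("Content-Type", Collections.singletonList("text/plain; charset=utf-8"));
        ByteBuffer bytes = ByteBuffer.wrap("raw text".getBytes(StandardCharsets.UTF_8));
        RawDataSketch raw = new RawDataSketch(bytes, meta);
        System.out.println(raw.decodeUtf8() + " " + raw.getMetadata());
    }
}
```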
All of the classes in this package are immutable. If a program needs to modify an object, it must construct a new one. This 'functional' approach avoids any possibility of concurrent access problems. Creating a new AnnotatedText over all the attributes of an old AnnotatedText plus a new set is not particularly costly compared to whatever actual NLP task is producing the annotations.
Because these classes are immutable, they have many arguments to their constructors. Each class has a nested Builder class to avoid this inconvenience; the constructors are thus not public.
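The pattern of an immutable class with a non-public constructor and a nested Builder can be sketched as follows. ImmutableTextSketch, its fields, and the copy-style Builder constructor are hypothetical names chosen for the illustration; whether the real builders offer the same methods is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable class: all state is fixed at construction time.
final class ImmutableTextSketch {
    private final String data;
    private final Map<String, Object> attributes;

    // Private: instances can only be created through the nested Builder.
    private ImmutableTextSketch(String data, Map<String, Object> attributes) {
        this.data = data;
        this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }

    String getData() { return data; }
    Map<String, Object> getAttributes() { return attributes; }

    static final class Builder {
        private String data = "";
        private final Map<String, Object> attributes = new LinkedHashMap<>();

        Builder() { }

        // "Modifying" an object means starting a new builder from the old object.
        Builder(ImmutableTextSketch startingPoint) {
            this.data = startingPoint.data;
            this.attributes.putAll(startingPoint.attributes);
        }

        Builder data(String data) { this.data = data; return this; }

        Builder attribute(String name, Object value) {
            attributes.put(name, value);
            return this;
        }

        ImmutableTextSketch build() {
            return new ImmutableTextSketch(data, attributes);
        }
    }
}

class BuilderSketchDemo {
    public static void main(String[] args) {
        ImmutableTextSketch original = new ImmutableTextSketch.Builder()
            .data("Hello world")
            .attribute("token", "placeholder token list")
            .build();
        // The original is left untouched; the enriched copy carries both attributes.
        ImmutableTextSketch enriched = new ImmutableTextSketch.Builder(original)
            .attribute("languageDetection", "placeholder detection result")
            .build();
        System.out.println(original.getAttributes().keySet());
        System.out.println(enriched.getAttributes().keySet());
    }
}
```

Copying the attribute map on every build is the cost the text above describes as small relative to the NLP work that produces the annotations.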
We could have designed this data model to defer all binding until runtime: essentially, a giant collection of maps and arrays. This would have allowed any program to define a new annotation at any time, and would have made it very difficult to run into version skew among libraries compiled against different versions of the model. Programming to that sort of data model is painful, so we chose to write specific classes for specific annotations.
To mitigate the possible unpleasant consequences of version skew, this model includes an extensibility mechanism. BaseAttribute contains a Map<String, Object>. This allows programs that have differing sets of annotations to communicate via JSON. The JsonAnySetter and JsonAnyGetter annotations cause any unrecognized items in the JSON object to be mapped to entries in the map. Entries in the map are serialized as keys in the object. Thus, a program can read in a serialized AnnotatedText that contains attributes with fields that it does not know about.
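The effect of this mechanism can be emulated by hand, without Jackson, to show what happens to unknown fields. In the sketch below an already-parsed JSON object is modeled as a Map<String, Object>; the class name ExtensibleAttributeSketch, the key names "startOffset" and "endOffset", and the "confidence" field are assumptions for the example. The real classes rely on Jackson's annotations rather than code like this.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical attribute that keeps fields it does not recognize.
final class ExtensibleAttributeSketch {
    private final int startOffset;
    private final int endOffset;
    private final Map<String, Object> extendedProperties;

    private ExtensibleAttributeSketch(int startOffset, int endOffset,
                                      Map<String, Object> extendedProperties) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.extendedProperties =
            Collections.unmodifiableMap(new LinkedHashMap<>(extendedProperties));
    }

    Map<String, Object> getExtendedProperties() { return extendedProperties; }

    // Plays the role of an "any setter": known keys fill fields,
    // every other key lands in the extended-properties map.
    static ExtensibleAttributeSketch fromJsonObject(Map<String, Object> json) {
        int start = 0;
        int end = 0;
        Map<String, Object> extras = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : json.entrySet()) {
            switch (e.getKey()) {
                case "startOffset":
                    start = ((Number) e.getValue()).intValue();
                    break;
                case "endOffset":
                    end = ((Number) e.getValue()).intValue();
                    break;
                default:
                    extras.put(e.getKey(), e.getValue());
            }
        }
        return new ExtensibleAttributeSketch(start, end, extras);
    }

    // Plays the role of an "any getter": map entries are written back
    // as ordinary keys next to the known fields.
    Map<String, Object> toJsonObject() {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("startOffset", startOffset);
        out.put("endOffset", endOffset);
        out.putAll(extendedProperties);
        return out;
    }
}

class ExtensibilitySketchDemo {
    public static void main(String[] args) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("startOffset", 0);
        incoming.put("endOffset", 5);
        incoming.put("confidence", 0.9); // field written by a newer producer
        ExtensibleAttributeSketch attr = ExtensibleAttributeSketch.fromJsonObject(incoming);
        System.out.println(attr.getExtendedProperties());
        System.out.println(attr.toJsonObject());
    }
}
```

A consumer compiled against an older model therefore passes the unknown field through unchanged instead of failing or dropping it.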
All of the classes in this package support JSON serialization and deserialization via Jackson 2.4.x. However, they require some customization to produce a correct and efficient representation. This customization is provided in a separate module: adm-json.
Copyright © 2016 Basis Technology Corporation. All Rights Reserved.