Both the schema template and project schema use the same dialog to define and configure the schema.
Once the objects in a section are defined completely, the indicator in the right corner will change to a green check mark.
Each schema is identified by a unique name.
Schema template: If you import an existing template, the name will default to the name of the imported schema.
If you are modifying an existing schema template, you can use the same name.
If you are creating a new schema template by importing an existing template, you must change the name.
Project schema: The name will default to the name of the template selected when creating the project. You may want to change the name to reflect the project.
Each schema template has a profile id attached to it. The profile id specifies a custom entity extraction model, allowing entity extractors to use custom entity types. The drop-down list box lists all custom profiles available on Rosette Server.
Any models, gazetteers, and regular expressions used when training a model must also be used when performing event extraction. Use the same custom profile to configure REX for model training and event extraction. The custom profile is set in the schema definition for event model training.
A schema is for one and only one language. Select the language from the drop-down list box.
The first task in defining your event schema is defining the set of event types you want to recognize. When extracting events, you don't extract all possible event types; you only extract the event types of interest. It's important to recognize which sorts of events are significant and will be mentioned frequently in your domain. Consider the set of entities and events that are going to be mentioned in the documents you will be analyzing. The goal is to train a model to extract only the event types that are meaningful to your operation.
The defined schema can support multiple event types. For example, if you're analyzing travel blogs, you may want to identify airline and hotel events.
Each event type has one or more key phrases. A key phrase is a word in the text that evokes the given event type. Rosette uses key phrases to identify candidate event mentions from the text.
Key phrases are case-insensitive and related words are matched. For example, if the key phrase is fly, it will match fly, flying, and flies. The extractor for a key phrase is defined as a morphological extractor. Words of any case, with the same lemma as the key phrase, are considered a match.
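The lemma-based matching described above can be sketched as follows. This is a minimal illustration, not Rosette's implementation: the `LEMMAS` table and the `lemma` and `find_key_phrase_candidates` functions are hypothetical, and a real system would use a morphological analyzer rather than a hand-written lookup.

```python
# Toy lemma table (an assumption for illustration; not Rosette data).
LEMMAS = {
    "fly": "fly",
    "flying": "fly",
    "flies": "fly",
    "flew": "fly",
}

def lemma(word: str) -> str:
    """Return the lemma of a word, ignoring case; unknown words map to themselves."""
    w = word.lower()
    return LEMMAS.get(w, w)

def find_key_phrase_candidates(text: str, key_phrase: str) -> list[str]:
    """Return the tokens in text whose lemma equals the key phrase's lemma."""
    target = lemma(key_phrase)
    # Naive whitespace tokenization with surrounding punctuation stripped.
    tokens = [t.strip(".,;:!?\"'") for t in text.split()]
    return [t for t in tokens if t and lemma(t) == target]

matches = find_key_phrase_candidates("Bob is Flying to Boston; he flies often.", "fly")
print(matches)  # ['Flying', 'flies']
```

Both the capitalized "Flying" and the inflected "flies" match because the comparison is made on lowercased lemmas, not on surface forms.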
Event mentions also include roles, that is, the people, places, times, and other mentions which add detail to the event mention. For a flying event with a key phrase of flew, you may want to know who flew and where they went. The flyer and destination are roles for the event type. For each role, the role type must be defined.
Select Add Event Type.
Add an event type name. This is the label displayed when annotating data.
Add key phrases.
Add the roles with the corresponding role type.
Enter a name for the role. This is the label displayed when annotating data.
Select a role type from the drop-down menu or add a new role type.
Check the Required box if the role type must exist in an event mention for that event type.
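As a way to picture what these steps produce, the sketch below models an event type as a small data structure. The class and field names (`EventType`, `Role`, `role_type`, `required`) and the role type names `person` and `location` are hypothetical and chosen only for illustration; they are not Rosette's schema format.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    name: str               # label displayed when annotating data
    role_type: str          # name of a reusable role type
    required: bool = False  # mirrors the Required check box

@dataclass
class EventType:
    name: str                                             # label displayed when annotating data
    key_phrases: list[str] = field(default_factory=list)  # words that evoke the event
    roles: list[Role] = field(default_factory=list)

# A flight event type with one optional and one required role.
flight = EventType(
    name="flight",
    key_phrases=["fly"],
    roles=[
        Role("flyer", "person"),
        Role("destination", "location", required=True),
    ],
)

required_roles = [r.name for r in flight.roles if r.required]
print(required_roles)  # ['destination']
```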
A role can be required or optional. If required, an event mention will not be extracted without the role. You should only mark a role as required if it must always be in the event mention. Let's look at some examples for a flight scenario.
Bob flew from Boston to Los Angeles on Wednesday.
The key phrase and roles are:
Let's assume the destination is marked as required in the schema definition. In this case, only one of the following event mentions will be extracted.
Bob flew to Los Angeles.
Bob flew from Boston on Wednesday.
The second event mention will not be extracted, since it does not contain the required role, even if it is annotated.
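The effect of a required role on these two mentions can be sketched as a simple filter. The function `is_extractable` and the role names `origin` and `date` are assumptions made for this example; only `flyer` and `destination` are named in the text above.

```python
def is_extractable(mention_roles: dict[str, str], required: set[str]) -> bool:
    """A mention is kept only if every required role is present in it."""
    return all(role in mention_roles for role in required)

required = {"destination"}

# "Bob flew to Los Angeles."
first = {"flyer": "Bob", "destination": "Los Angeles"}
# "Bob flew from Boston on Wednesday." ("origin" and "date" are hypothetical role names)
second = {"flyer": "Bob", "origin": "Boston", "date": "Wednesday"}

print(is_extractable(first, required))   # True
print(is_extractable(second, required))  # False
```

If no roles are marked as required, the required set is empty and every candidate mention passes the check.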
Role types are generic categories, while role mentions are specific instances of those categories. Extractors define the specific rules to extract the role candidates. Extractors are combined into role types.
Role types define the rules that are used to identify a piece of text as a candidate for a specific role or key phrase. A role type is made up of one or more extractors and is reusable.
Multiple extractors can be included in a role type definition. They are combined as a union: every candidate extracted by any of the extractors is included.
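The union behavior can be sketched as follows, under the assumption that each extractor is a function from text to a set of candidate strings. The helpers `word_list_extractor` and `role_type` are hypothetical names, and the word lists are made up for the example.

```python
from typing import Callable

# An extractor maps a text to a set of candidate strings (an assumption of this sketch).
Extractor = Callable[[str], set[str]]

def word_list_extractor(words: set[str]) -> Extractor:
    """Build an extractor that returns the listed words found in the text."""
    def extract(text: str) -> set[str]:
        tokens = {t.strip(".,").lower() for t in text.split()}
        return tokens & words
    return extract

def role_type(*extractors: Extractor) -> Extractor:
    """Combine extractors as a union of all their candidates."""
    def extract(text: str) -> set[str]:
        candidates: set[str] = set()
        for extractor in extractors:
            candidates |= extractor(text)
        return candidates
    return extract

cities = word_list_extractor({"boston", "paris"})
vehicles = word_list_extractor({"tank", "truck"})
city_or_vehicle = role_type(cities, vehicles)

result = city_or_vehicle("The truck left Boston.")
print(sorted(result))  # ['boston', 'truck']
```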
To define a role type and assign extractors:
Each role type must have a unique name.
Select one or more extractors for this role type.
The green check mark indicates all extractors are configured.
Rosette has multiple techniques to identify candidate key phrases and roles in text. For example, it can match a list of words, or it can match all the lemmas for a given word. Using Rosette Entity Extractor (REX), it can identify entity mentions of specific entity types. Extractors define the rules and techniques used to identify role and key phrase candidates in the text.
Once defined, extractors are reusable in multiple schemas. An extractor named location may be defined as the standard REX entity type Location. It could be used in troop_movement events as well as travel events, as each of them has roles involving locations.
The currently supported extractor types are:
Entity: A list of REX entity types. You can use the standard, pre-defined REX entity types or train a custom model to extract other entity types.
Semantic: A list of words or phrases. Any word whose meaning is similar to one of these words will match. For example, an extractor of meeting will match assembly, get-together, gathering, conclave. Rosette uses word vector similarity to identify similar words.
Morphological: A list of words. When a word is added to this list, it is immediately converted to and stored as its lemma. Words with the same lemmatization will match. For example, a morphological extractor for go will match going, went, goes, gone.
Exact: A list of words or phrases. Exact will match any words on the list, whether they are identified as entity types or not. For example, you could have a list of common modes of transportation, including armored personnel carrier and specific types of tanks.
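To illustrate the idea behind the semantic extractor, the sketch below matches words by cosine similarity between word vectors. It is only a toy: the three-dimensional vectors, the `VECTORS` table, the `semantic_matches` function, and the 0.9 threshold are all assumptions for this example and do not reflect Rosette's actual embeddings or similarity cutoff.

```python
import math

# Toy word vectors (made up for illustration; real embeddings have hundreds of dimensions).
VECTORS = {
    "meeting": (0.9, 0.1, 0.0),
    "gathering": (0.8, 0.2, 0.1),
    "assembly": (0.85, 0.15, 0.05),
    "tank": (0.0, 0.1, 0.9),
}

def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_matches(seed: str, threshold: float = 0.9) -> list[str]:
    """Return the other known words whose vectors are close to the seed word's vector."""
    seed_vec = VECTORS[seed]
    return sorted(
        word
        for word, vec in VECTORS.items()
        if word != seed and cosine(seed_vec, vec) >= threshold
    )

print(semantic_matches("meeting"))  # ['assembly', 'gathering']
```

In this toy data, "tank" points in a very different direction from "meeting", so it falls below the threshold and is not returned.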