Important
The event schema is created before starting annotation and should be carefully designed by a model architect. Refer to the Developing Models with Rosette guide for a detailed explanation of events and how to design your schema for event extraction.
An event model is trained to extract specific types of event mentions. Before starting any type of event recognition project, you must identify the types of event mentions you want to extract and then define the structure or schema of each event type. Plan on spending a good amount of time and effort defining your event types and the schema for each type before beginning a project.
The first task in defining your event schema is defining the set of event types you want to recognize. When extracting events, you don't extract all possible event types; you only extract the event types of interest. It's important to recognize which sorts of events are significant and will be mentioned frequently in your domain. Consider the set of entities and events that are going to be mentioned in the documents you will be analyzing. The goal is to train a model to extract only the event types that are meaningful to your operation.
The defined schema can support multiple event types. For example, if you're analyzing travel blogs, you may want to identify airline and hotel events.
Each event type has one or more key phrases: words in the text that evoke the given event type. Rosette uses key phrases to identify candidate event mentions from the text.
Key phrases are case-insensitive and related words are matched. For example, if the key phrase is fly, it will match fly, flying, and flies. The extractor for a key phrase is defined as a morphological extractor. Words of any case, with the same lemma as the key phrase, are considered a match.
Event mentions also include roles, that is, the people, places, times, and other mentions which add detail to the event mention. For a flying event, with a key phrase of flew, you may want to know who flew. Where did they go? The flyer and destination are roles for the event type. For each role, the role type must be defined.
-
Select Add Event Type.
-
Add an event type name. This is the label displayed when annotating data.
-
Add key phrases.
-
Add the roles with the corresponding role type.
-
Enter a name for the role. This is the label displayed when annotating data.
-
Select a role type from the drop-down menu or add a new role type.
-
Check the Required box if the role type must exist in an event mention for that event type.
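As a rough illustration of what these steps produce, the schema for one event type can be pictured as a small data structure. This is a sketch only: the class names, field names, and the `flight` example are hypothetical and do not reflect how Rosette stores a schema.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str               # label displayed when annotating data
    role_type: str          # name of a reusable role type (hypothetical values below)
    required: bool = False  # mirrors the Required check box


@dataclass
class EventType:
    name: str                                   # label displayed when annotating data
    key_phrases: list[str]                      # words that evoke this event type
    roles: list[Role] = field(default_factory=list)

    def required_roles(self) -> list[str]:
        """Return the names of the roles marked as required."""
        return [role.name for role in self.roles if role.required]


# Hypothetical event type for a flight scenario.
flight = EventType(
    name="flight",
    key_phrases=["fly"],
    roles=[
        Role("traveler", "person"),
        Role("origin", "location"),
        Role("destination", "location", required=True),
    ],
)

print(flight.required_roles())
```

Keeping the role type separate from the role name reflects the idea that a role type can be reused across several roles and event types.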
Let's consider the troop_movement example. If you were reading a document, what words would you look for to indicate that it was discussing troop movements? Drove, flew, took off, landed, arrived, moved are all potential key phrases.
Looking at the key phrase flew in more detail, what about other forms of the word, such as flying, fly, and flies? You don't want to have to list every possible form of the word.
Rosette identifies candidate key phrases in your documents by using an extractor. For key phrases, the extractor will look for the exact words or the lemmas of the words you define.
Rosette uses the candidate key phrase to identify event types in the text.
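The idea of lemma-based, case-insensitive matching can be sketched in a few lines. This is an illustration under stated assumptions, not Rosette's implementation: the `LEMMAS` table is a toy stand-in for a real morphological analyzer, and the function names are hypothetical.

```python
import re

# Toy lemma table (assumption): a real system would use a morphological analyzer.
LEMMAS = {
    "flew": "fly", "flying": "fly", "flies": "fly", "flown": "fly",
    "landed": "land", "landing": "land", "lands": "land",
}


def lemma(word):
    """Lowercase the word and map it to its lemma, if the toy table knows it."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def find_key_phrase_candidates(text, key_phrases):
    """Return (offset, word) pairs whose lemma matches the lemma of a key phrase."""
    targets = {lemma(phrase) for phrase in key_phrases}
    return [
        (match.start(), match.group())
        for match in re.finditer(r"[A-Za-z]+", text)
        if lemma(match.group()) in targets
    ]


text = "The unit FLEW north on Monday and is flying back after it lands."
candidates = find_key_phrase_candidates(text, ["flew", "land"])
print(candidates)
```

Because both the key phrases and the words in the text are reduced to lemmas before comparison, defining flew is enough to match FLEW and flying without listing each form.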
Event mentions include more objects than the key phrase. These other objects are usually entity mentions, that is, people, places, times, and other mentions which add detail to the key phrase. For a flying event, with a key phrase of flew, you may want to know: Who flew? Where did they go? When did they go? What kind of aircraft did they fly on? The people, locations, times, and aircraft are all entity mentions that have roles.
Roles detail how the entity mention relates to the event. They answer the questions: What does this entity do in the event? What role does it play?
Let's look at a troop_movement event. What types of entities might we expect to find? What types of roles? Some possible roles include:
-
Mover: the people or organization moving
-
Origin: where the trip originates
-
Destination: where the trip ends
-
Mode of transportation: the vehicle used in the movement
-
Date: the date of the movement
There may be more roles in an event mention than you are interested in capturing. For example, let's assume you want to know who flew, but you don't care about when they flew. You would define the role of traveler, but would not define a role for date or time. Part of defining the schema for an event model is determining which roles are important to your organization and task.
Roles are generic categories, such as traveler, origin, and destination. When annotating event mentions, you tag extracted entities with the role they perform in the event mention. Extractors define the rules used to extract role candidates from text.
A role can be required or optional. If required, an event mention will not be extracted without the role. You should only mark a role as required if it must always be in the event mention. Let's look at some examples for a flight scenario.
Bob flew from Boston to Los Angeles on Wednesday.
The key phrase and roles are:
-
Key phrase: flew
-
Traveler: Bob
-
Origin: Boston
-
Destination: Los Angeles
-
Date: Wednesday
Let's assume the destination is marked as required in the schema definition. In this case, only one of the following event mentions will be extracted.
Bob flew to Los Angeles.
Bob flew from Boston on Wednesday.
The second event mention will not be extracted, since it does not contain the required role, even if it is annotated.
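The effect of a required role can be sketched as a simple filter. The dictionaries below are hand-written stand-ins for annotated event mentions, and `is_extractable` is a hypothetical helper, not part of Rosette.

```python
# Roles marked as required in the (hypothetical) flight schema.
REQUIRED_ROLES = {"destination"}

# Hand-written role assignments for the two example sentences.
mentions = {
    "Bob flew to Los Angeles.": {
        "traveler": "Bob",
        "destination": "Los Angeles",
    },
    "Bob flew from Boston on Wednesday.": {
        "traveler": "Bob",
        "origin": "Boston",
        "date": "Wednesday",
    },
}


def is_extractable(roles, required=REQUIRED_ROLES):
    """A mention is kept only if every required role is present."""
    return required.issubset(roles)


extracted = [sentence for sentence, roles in mentions.items() if is_extractable(roles)]
print(extracted)
```

Only the first sentence survives the filter, because the second has no destination.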
Determining Events and Key Phrases
It can be difficult to decide when to separate the events you are trying to extract into different event types. Some events might be very similar, but the roles in the event relate differently to the key phrase. In this case, you will want to create separate event types. Otherwise, the model may have difficulty determining the correct roles.
For example, let's consider a commerce event for buying and selling show tickets. One way to model this would be to create a single commerce event that includes both buying and selling.
-
Event: commerce event
-
Key Phrases: buy, obtain, sell, distribute
-
Roles: buyer, seller, show
Let's consider a couple of events:
In these examples, the model will have difficulty correctly identifying the buyer and the seller if they are the same event type. The event model cannot distinguish the different perspectives the roles may have based on the key phrase; all key phrases in a single event type are expected to have the same relationship to the roles.
Therefore, we strongly recommend that when key phrases have different relationships to the roles, they be split into separate event types.
-
Event: selling event
-
Key Phrases: sell, distribute
-
Roles: buyer, seller, show
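One way to picture the split is as two event types that share the same roles but own disjoint key phrases. In the sketch below, the selling entry follows the definition above; the buying entry is an assumption, inferred from the remaining key phrases of the combined commerce event. The dictionary layout and names are hypothetical.

```python
# Hypothetical layout. "selling" follows the definition above;
# "buying" is an assumption built from the remaining commerce key phrases.
EVENT_TYPES = {
    "buying": {
        "key_phrases": ["buy", "obtain"],
        "roles": ["buyer", "seller", "show"],
    },
    "selling": {
        "key_phrases": ["sell", "distribute"],
        "roles": ["buyer", "seller", "show"],
    },
}


def build_key_phrase_index(event_types):
    """Map each key phrase to the event type it evokes."""
    index = {}
    for name, spec in event_types.items():
        for phrase in spec["key_phrases"]:
            index[phrase] = name
    return index


KEY_PHRASE_INDEX = build_key_phrase_index(EVENT_TYPES)
print(KEY_PHRASE_INDEX["obtain"])
print(KEY_PHRASE_INDEX["distribute"])
```

With this split, every key phrase in an event type has the same relationship to the buyer and seller roles, which is the property the recommendation is after.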
A similar example would be the events entering and exiting. While they may have the same roles (person, from location, to location, time), the perspective of the person to the locations is different for each key phrase.
Rosette has multiple techniques to identify candidate key phrases and roles in text. For example, it can match a list of words, or it can match all the lemmas for a given word. Using Rosette Entity Extractor (REX), it can identify entity mentions of specific entity types. Extractors define the rules and techniques used to identify role and key phrase candidates in the text. While any extractor type can be used to define roles, only morphological extractors can be used to identify key phrase candidates.
Once defined, extractors are reusable in multiple schemas. An extractor named location may be defined as the standard REX entity type Location. It could be used in troop_movement events as well as travel events, as each of them has roles involving locations.
The currently supported extractor types are:
-
Entity: A list of REX entity types. You can use the standard, pre-defined REX entity types or train a custom model to extract other entity types. The custom model must be loaded in Rosette Server to define an entity extractor with custom entity types.
-
Semantic: A list of words or phrases. Any word whose meaning is similar to one of these words will match. For example, an extractor of meeting will match assembly, gathering, conclave. Rosette uses word vector similarity to identify similar words. While a semantic extractor can be defined by a phrase, it will only identify single words as candidate roles.
-
Morphological: A list of words. When a word is added to this list, it is immediately converted to and stored as its lemma. Words with the same lemmatization will match. For example, a morphological extractor for go will match going, went, goes, gone. This is the only extractor type valid for key phrases.
-
Exact: A list of words or phrases. Exact will match any words on the list, whether they are identified as entity types or not. For example, you could have a list of common modes of transportation, including armored personnel carrier and specific types of tanks.
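To make the semantic extractor more concrete, the sketch below matches single words by cosine similarity between word vectors. The three-dimensional vectors and the 0.9 threshold are invented for illustration; the vectors and cutoff Rosette actually uses are not described here.

```python
import math

# Toy word vectors (assumption): real word vectors have hundreds of dimensions.
VECTORS = {
    "meeting": (0.90, 0.10, 0.00),
    "assembly": (0.85, 0.20, 0.05),
    "gathering": (0.80, 0.15, 0.10),
    "tank": (0.00, 0.10, 0.95),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_matches(seed, words, threshold=0.9):
    """Return the words whose vector is close to the seed word's vector."""
    seed_vector = VECTORS[seed]
    return [
        word
        for word in words
        if word in VECTORS and cosine(seed_vector, VECTORS[word]) >= threshold
    ]


matches = semantic_matches("meeting", ["assembly", "gathering", "tank"])
print(matches)
```

The comparison is made word by word, which is consistent with the note that a semantic extractor only identifies single words as candidate roles.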
Role types are generic categories, while role mentions are specific instances of those categories. Extractors define the specific rules to extract the role candidates. Extractors are combined into role types.
-
Role types define the rules that are used to identify a piece of text as a candidate for a specific role or key phrase. A role type is made up of one or more extractors and is reusable.
-
Multiple extractors can be included in a role type definition. They are combined as a union: all possible candidates extracted are included.
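The union behavior can be sketched by treating each extractor as a function that returns a set of candidates. Everything here is hypothetical: the helper names, the pre-tagged entity list standing in for entity extraction output, and the custom VEHICLE entity type.

```python
import re


def exact_extractor(phrases):
    """Build an extractor that returns the listed phrases found in the text."""
    def extract(text):
        return {
            phrase
            for phrase in phrases
            if re.search(r"\b" + re.escape(phrase) + r"\b", text)
        }
    return extract


def entity_extractor(tagged_entities, entity_types):
    """Build an extractor over pre-tagged (mention, type) pairs (a stand-in for REX output)."""
    def extract(text):
        return {
            mention
            for mention, entity_type in tagged_entities
            if entity_type in entity_types and mention in text
        }
    return extract


def role_type_candidates(extractors, text):
    """Combine extractors as a union: every candidate from every extractor is kept."""
    candidates = set()
    for extract in extractors:
        candidates |= extract(text)
    return candidates


text = "The battalion moved from Boston by helicopter and armored personnel carrier."
tagged = [("Boston", "LOCATION"), ("helicopter", "VEHICLE")]  # VEHICLE is a hypothetical custom type

mode_of_transportation = [
    entity_extractor(tagged, {"VEHICLE"}),
    exact_extractor(["armored personnel carrier", "tank"]),
]
print(role_type_candidates(mode_of_transportation, text))
```

Neither extractor alone finds both vehicles; the role type does, because it keeps the candidates from each.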
To define a role type and assign extractors:
-
Select Configure.
-
Each role type must have a unique name.
-
Select one or more extractors for this role type.
The green check mark indicates all extractors are configured.