There are two types of events schema. Both contain the same objects and definitions; the difference is whether the schema is attached to a project.
-
Schema templates
A schema template is a reusable set of event types, roles, role types, and extractors and is not associated with a project.
Accessed from the Events Schema Template option on the global navigation bar at the top of the Studio window.
-
Project schema
The project schema is associated with one and only one project. This schema may be modified as documents are annotated. The changes will be for that project only.
Accessed from the Project Schema option on the project dashboard.
Important
Changes made to a project schema do not modify the template the schema was created from. Changes are made to the project only.
To use the schema editor to create a template schema, select Events Schema Template from the global navigation bar.
Note
You must be a manager or superuser.
You can create a schema from scratch or use an existing schema as a starting point. The schema created in this dialog should be considered a template for an events project. When you create a project, the project schema will use the template. Any updates to the project schema will modify the project version only; the template will not be changed. The same template can be used by multiple projects, each of which will have its own version.
Once complete, Save the schema definition. You can then use the schema template in a project. To modify a project schema, select Project Schema from the project dashboard options.
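The relationship between a template and its project schemas can be pictured as a deep copy. The following is a minimal sketch, not Adaptation Studio code: the dictionary layout and the `create_project_schema` helper are hypothetical and only illustrate that edits to a project schema leave the template, and other projects, untouched.

```python
import copy

# Hypothetical, simplified template; the real schema format may differ.
template = {
    "name": "travel",
    "event_types": {
        "flight": {"key_phrases": ["fly"], "roles": ["flyer", "destination"]},
    },
}


def create_project_schema(template: dict) -> dict:
    """Return an independent copy of the template for one project."""
    return copy.deepcopy(template)


project_a = create_project_schema(template)
project_b = create_project_schema(template)

# Editing one project's schema changes that project's version only.
project_a["event_types"]["flight"]["key_phrases"].append("land")
```

After the edit, only `project_a` contains the new key phrase; `template` and `project_b` still hold the original definition.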
Import from an Existing Schema
Schemas already defined may contain event types, roles, role types, and/or extractors which you'd like to use in a new schema template. If you want to use an existing schema template completely as defined, then there is no need to import or modify the schema. If you'd like to modify an existing schema template, then you can import it and modify it for your use. You can also import a JSON file definition of a schema.
-
Select Events Schema Template in the global navigation bar.
-
Select Import.
-
Select the schema to import from the drop-down menu, or select a JSON file definition of a schema.
-
Select Everything to import the entire schema or Let me select to choose a subset of schema objects to import.
-
Select Next.
-
Select the objects you want to import. If you selected Everything, all objects will be selected. You can modify your choices at this time.
-
Select Import.
-
Complete defining the schema.
Tip
When you select a schema from the drop-down list, you'll see a preview of the schema definition.
When you create a new project, you must select a template schema for the project. This template could contain general extractors and role types that your organization is using for event extraction. It may also contain event types and roles for your project.
Once the schema template has been associated with the project, you can modify it by selecting Project Schema from the project dashboard menu.
Once annotation has begun, you cannot change or delete any event types, key phrases, or role types which have been used in annotation. You can add new event types, key phrases, and role types. You can delete objects which have not been used. But once used in annotation, most changes are not permitted. The one permitted change is to add a new extractor to a key phrase or role type.
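The rules above can be sketched as a small permission check. This is an illustration under stated assumptions, not the Studio's implementation: the `SchemaObject` class, the `kind` values, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SchemaObject:
    name: str
    kind: str  # hypothetical values: "event_type", "key_phrase", "role_type"
    used_in_annotation: bool = False


def is_change_allowed(obj: SchemaObject, action: str) -> bool:
    """Return True if the action is permitted under the rules described above.

    Hypothetical actions: "modify", "delete", "add_extractor".
    """
    if not obj.used_in_annotation:
        # Objects not yet used in annotation can be changed or deleted.
        return True
    # Once used, the only permitted change is adding an extractor
    # to a key phrase or role type.
    return action == "add_extractor" and obj.kind in {"key_phrase", "role_type"}
```

For example, deleting a role type that already appears in annotations is rejected, while adding an extractor to it is accepted.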
Important
Changes made to a project schema do not modify the template the schema was created from. Changes are made to the project only.
Both the schema template and project schema use the same dialog to define and configure the schema.
The schema definition is broken up into 3 sections:
-
General information
-
Event Types
-
Role Types
Once the objects in a section are defined completely, the indicator in the right corner will change to a green check box.
Each schema is identified by a unique name. When creating schemas, you can assign them to a category. Categories organize schemas and can make it easier to locate a particular schema.
-
Schema template: If you import an existing template, the name will default to the name of the imported schema.
-
If you are modifying an existing schema template, you can use the same name.
-
If you are creating a new schema template by importing an existing template, you must change the name.
-
Project schema: The name will default to the name of the template selected when creating the project. You may want to change the name to reflect the project.
Each schema template has a profile id attached to it. The profile id specifies a custom entity extraction model, allowing entity extractors to use custom entity types. The drop-down list box lists all custom profiles available on Rosette Server. If you do not have custom entities and are not using a custom profile, select default.
Note
Any models, gazetteers, and regular expressions used when training a model must also be used when performing event extraction. Use the same custom profile to configure REX for model training and event extraction. The custom profile is set in the schema definition for event model training.
A schema is for one and only one language. Select the language from the drop-down list box.
The first task in defining your event schema is defining the set of event types you want to recognize. When extracting events, you don't extract all possible event types; you only extract the event types of interest. It's important to recognize which sorts of events are significant and will be mentioned frequently in your domain. Consider the set of entities and events that are going to be mentioned in the documents you will be analyzing. The goal is to train a model to extract only the event types that are meaningful to your operation.
The defined schema can support multiple event types. For example, if you're analyzing travel blogs, you may want to identify airline and hotel events.
Each event type has one or more key phrases. A key phrase is a word in the text that evokes the given event type. Rosette uses key phrases to identify candidate event mentions from the text.
Key phrases are case-insensitive and related words are matched. For example, if the key phrase is fly, it will match fly, flying, and flies. The extractor for a key phrase is defined as a morphological extractor. Words of any case, with the same lemma as the key phrase, are considered a match.
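To make lemma-based matching concrete, here is a minimal sketch. It does not use Rosette's morphological analysis; the `TOY_LEMMAS` table and both helper functions are hypothetical stand-ins that cover only the forms of "fly".

```python
# Toy lemma table standing in for a real morphological analyzer (assumption).
TOY_LEMMAS = {
    "fly": "fly",
    "flies": "fly",
    "flying": "fly",
    "flew": "fly",
    "flown": "fly",
}


def lemma(word: str) -> str:
    """Lowercase the word and look up its lemma; unknown words map to themselves."""
    lowered = word.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def matches_key_phrase(token: str, key_phrase: str) -> bool:
    """A token matches when it shares a lemma with the key phrase, ignoring case."""
    return lemma(token) == lemma(key_phrase)


sentence = "Bob flies to Boston and Alice is Flying home"
hits = [t for t in sentence.split() if matches_key_phrase(t, "fly")]
print(hits)  # ['flies', 'Flying']
```

Comparing lemmas rather than surface forms is what lets a single key phrase cover every inflection, regardless of case.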
Event mentions also include roles, that is, the people, places, times, and other mentions which add detail to the event mention. For a flying event, with a key phrase of flew, you may want to know who flew. Where did they go? The flyer and destination are roles for the event type. For each role, the role type must be defined.
-
Select Add Event Type.
-
Add an event type name. This is the label displayed when annotating data.
-
Add key phrases.
-
Add the roles with the corresponding role type.
-
Enter a name for the role. This is the label displayed when annotating data.
-
Select a role type from the drop-down menu or add a new role type.
-
Check the Required box if the role type must exist in an event mention for that event type.
Determining Events and Key Phrases
It can be difficult to decide when to separate the events you are trying to extract into different event types. Some events might be very similar, but their roles relate differently to the key phrase. In this case, you will want to create separate event types. Otherwise, the model may have difficulty determining the correct roles.
For example, let's consider a Commerce event for buying and selling show tickets. One way to model this would be to create a single commerce event that includes both buying and selling.
-
Event: commerce event
-
Key Phrases: buy, obtain, sell, distribute
-
Roles: buyer, seller, show
Let's consider a couple of events:
In these examples, the model will have difficulty correctly identifying the buyer and the seller if they are the same event type. The event model cannot distinguish the different perspectives the roles may have based on the key phrase; all key phrases in a single event type are expected to have the same relationship to the roles.
Therefore, we strongly recommend that when key phrases have different relationships to the roles, they should be separated into separate event types.
-
Event: selling event
-
Key Phrases: sell, distribute
-
Roles: buyer, seller, show
A similar example would be the events entering and exiting. While they may have the same roles (person, from location, to location, time), the perspective of the person to the locations is different for each key phrase.
A role can be required or optional. If required, an event mention will not be extracted without the role. You should only mark a role as required if it must always be in the event mention. Let's look at some examples for a flight scenario.
Bob flew from Boston to Los Angeles on Wednesday.
The key phrase and roles are:
Let's assume the destination is marked as required in the schema definition. In this case, only one of the following event mentions will be extracted.
Bob flew to Los Angeles.
Bob flew from Boston on Wednesday.
The second event mention will not be extracted, since it does not contain the required role, even if it is annotated.
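The effect of a required role can be sketched as a simple filter. This is an illustration only: the dictionary representation of a mention and the role names `origin` and `date` are hypothetical, while `flyer` and `destination` follow the flight example.

```python
def is_extractable(mention_roles: dict, required_roles: set) -> bool:
    """A mention is extracted only if every required role is present."""
    return required_roles.issubset(mention_roles)


required = {"destination"}  # destination is marked as required in the schema

# "Bob flew to Los Angeles."
first = {"flyer": "Bob", "destination": "Los Angeles"}
# "Bob flew from Boston on Wednesday."
second = {"flyer": "Bob", "origin": "Boston", "date": "Wednesday"}

print(is_extractable(first, required))   # True
print(is_extractable(second, required))  # False
```

Because a missing required role suppresses the whole mention, marking a role as required trades recall for stricter, more complete mentions.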
Role types are generic categories, while role mentions are specific instances of those categories. Extractors define the specific rules to extract the role candidates. Extractors are combined into role types.
-
Role types define the rules that are used to identify a piece of text as a candidate for a specific role or key phrase. A role type is made up of one or more extractors and is reusable.
-
Multiple extractors can be included in a role type definition. They are combined as a union - all possible candidates extracted are included.
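The union behavior can be sketched as follows. The extractor functions here are hypothetical word-list matchers, not Rosette extractors; the point is only that a role type's candidates are everything any of its extractors returns.

```python
def make_word_list_extractor(terms):
    """Build a toy extractor that returns tokens found in a fixed word list."""
    wanted = {t.lower() for t in terms}

    def extract(tokens):
        return {t for t in tokens if t.lower() in wanted}

    return extract


def role_type_candidates(extractors, tokens):
    """Combine extractor results as a union: keep every candidate found."""
    candidates = set()
    for extract in extractors:
        candidates |= extract(tokens)
    return candidates


cities = make_word_list_extractor(["Boston", "Denver"])
states = make_word_list_extractor(["California", "Texas"])

tokens = "Bob flew from Boston to California".split()
candidates = role_type_candidates([cities, states], tokens)
print(sorted(candidates))  # ['Boston', 'California']
```

Adding an extractor to a role type can therefore only widen the set of candidates; it never removes candidates produced by the others.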
To define a role type and assign extractors:
-
Select Configure.
-
Each role type must have a unique name.
-
Select one or more extractors for this role type.
The green check mark indicates all extractors are configured.
Rosette has multiple techniques to identify candidate key phrases and roles in text. For example, it can match a list of words, or it can match all the lemmas for a given word. Using Rosette Entity Extractor (REX), it can identify entity mentions of specific entity types. Extractors define the rules and techniques used to identify role and key phrase candidates in the text. While any extractor type can be used to define roles, only morphological extractors can be used to identify key phrase candidates.
Once defined, extractors are reusable in multiple schemas. An extractor named location may be defined as the standard REX entity type Location. It could be used in troop_movement events as well as travel events, as each of them has roles involving locations.
The currently supported extractor types are:
-
Entity: A list of REX entity types. You can use the standard, pre-defined REX entity types or train a custom model to extract other entity types. The custom model must be loaded in Rosette Server to define an entity extractor with custom entity types.
-
Semantic: A list of words or phrases. Any word whose meaning is similar to one of these words will match. For example, an extractor of meeting will match assembly, gathering, conclave. Rosette uses word vector similarity to identify similar words. While a semantic extractor can be defined by a phrase, it will only identify single words as candidate roles.
-
Morphological: A list of words. When a word is added to this list, it is immediately converted to and stored as its lemma. Words with the same lemmatization will match. For example, a morphological extractor for go will match going, went, goes, gone. This is the only extractor type valid for key phrases.
-
Exact: A list of words or phrases. Exact will match any words on the list, whether they are identified as entity types or not. For example, you could have a list of common modes of transportation, including armored personnel carrier and specific types of tanks.
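As a rough illustration of how an exact extractor differs from the others, the sketch below matches listed words and multi-word phrases literally. It is not Rosette's implementation: the `find_exact` function is hypothetical, and the `ignore_case` switch is an assumption, since the description above does not state how exact extractors treat case.

```python
import re


def find_exact(text, terms, ignore_case=False):
    """Return (start, end, matched_text) for each literal occurrence of a term."""
    flags = re.IGNORECASE if ignore_case else 0
    spans = []
    for term in terms:
        # Word boundaries keep "tank" from matching inside "tanker".
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=flags):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


terms = ["armored personnel carrier", "tank"]
text = "An armored personnel carrier followed the tank past a tanker."
matched = [span[2] for span in find_exact(text, terms)]
print(matched)  # ['armored personnel carrier', 'tank']
```

Unlike a morphological or semantic extractor, nothing is inferred here: a term either appears on the list as written or it is not a candidate.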
Once complete, Save the schema definition. Each section must have a green check mark indicating that it is complete.
-
Events Schema Template: if you've imported an existing schema and have not changed the name, you will see a message that the schema will be overwritten.
-
If you are modifying an existing schema template, select Continue to save the template.
-
If you are creating a new schema, you must change the name before saving. Select Cancel, modify the name, then Save.
-
Project Schema: You can keep the original (template) name or modify it. The project schema is not linked back to the original template. Each project has one and only one schema associated with it.
Important
Changes made to a project schema do not modify the template the schema was created from. Changes are made to the project only.
Review Tentative Role Types
Note
You must be registered as a manager.
During annotation, an annotator may identify key phrases or role mentions that Adaptation Studio has not identified as candidates. The annotator can label a key phrase or a role that was not suggested. The Studio then creates a tentative role type enabling the annotator to continue annotating documents.
At some point, the tentative role types must be reviewed and approved by a manager. If the manager Accepts the new role type, it is added to the schema. If it shouldn't have been defined, the manager can Ignore the tentative role type.
To review tentative role types:
-
Select Project Schema from the project menu.
-
The Role Types section will show if there are any tentative role types.
-
Expand the Role Types section. All role types with tentative extractors will be labelled. Select Configure next to a role type with tentative extractors.
You will see which extractors are currently being used, as well as the samples using the tentative extractors. You can choose to:
Tentative extractors are useful for supplementing what is being extracted by the model. They can be used to fill in the "blind spots" of a model that is identifying some, but not all, of the desired phrases as candidate roles. However, if you find yourself making new exact entity extractors for almost every role, you likely need to train a new entity model.
Warning
Tentative extractors are ignored in the exported models and do not contribute to event model accuracy.
Tentative extractors should be resolved before exporting the model.
Samples with tentative extractors are not used for training. If you try to export a model that contains tentative extractors, a message will appear warning you that the model contains tentative extractors.