Note
You must be registered as a manager for the project.
Select New Project in the Global Navigation bar.
Enter a name for the project. Choose a name that will be meaningful to you and your annotation team.
Select a Template type from the drop-down menu. The templates are predefined and customized for each type of project.
Add annotators (and adjudicators) to the project. Click on Add/remove annotators to select from the users in the system. Users must be added to the system before you can add them to a project.
Configure the project. Each template type has its own set of configuration options.
Select Create.
Your new project will appear in the Project List.
Each template type has its own set of configuration options. In many cases the defaults will be acceptable. Here we list the options most likely to require adjustment.
All Templates:
Model Language: The language of the samples.
Hide Eval Predictions: By default, the Active Learning server offers suggested annotations for each sample. Suppress these on the samples in the evaluation set to avoid biasing the human annotators. Setting this option tells the Studio to only display suggestions for samples in the training set, and not for those in the evaluation set.
Prioritize Partially Annotated Docs: Orders the samples presented for annotation so that documents that already have some annotated samples are finished before samples from new documents are offered.
Show Token Boundaries: This option only affects the presentation of the text samples. When turned on, each unannotated span is underlined to make it easier to see the token spans.
Mouse Full Token Selection: When you click on or select part of the text, the entire token is selected.
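The effect of Hide Eval Predictions can be pictured with a small sketch. This is an illustration only, not the Studio's implementation: the `sample` dictionary, its `set` and `suggestion` keys, and the function name are all hypothetical.

```python
def visible_suggestion(sample, hide_eval_predictions):
    """Return the suggestion an annotator would see, or None if it is suppressed.

    `sample` is a hypothetical dict with a "set" key ("training" or
    "evaluation") and a "suggestion" key holding the suggested annotation.
    """
    if hide_eval_predictions and sample["set"] == "evaluation":
        # Suggestions are hidden on evaluation samples to avoid biasing annotators.
        return None
    return sample["suggestion"]


train_sample = {"set": "training", "suggestion": "PER"}
eval_sample = {"set": "evaluation", "suggestion": "LOC"}

print(visible_suggestion(train_sample, hide_eval_predictions=True))
print(visible_suggestion(eval_sample, hide_eval_predictions=True))
print(visible_suggestion(eval_sample, hide_eval_predictions=False))
```

With the option on, only the training sample keeps its suggestion; with it off, both sets show suggestions.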
NER-Rosette:
Use Basis Training Data: When training the model, include the training data used to train the statistical model shipped with Rosette Entity Extractor. When this option is selected, new labels cannot be added to the project. If you are creating a project to train a model to extract new entity types (defining new labels), do not select this option.
Note
The time to train the model when Use Basis Training Data is selected may be a few minutes longer than without the extra training data. The time is determined by the number of annotated documents as well as the language.
Train Case Sensitive Model: Determines whether the trained model will be case-sensitive or case-insensitive. The default is case-sensitive (checked).
Most project configuration options can be changed after the project is created. Click on the project menu in the upper right corner of the project and select Configure. Some options, such as Model Language and Use Basis Training Data, cannot be changed once the project has been created.
Note
You must be registered as a manager for the project.
Annotation is a process of assigning labels to parts of documents. The set of labels, as well as the words or parts of words assigned to the label, depends on the type of annotation task.
For named entity recognition (NER), the labels are the entity types the model is being trained to extract. The predefined entity types are automatically determined by language.
To define a label provide the following information:
Code: This is a brief name for the label. By convention, it is usually uppercase and about three letters. This will be used internally and in places where the UI shows a label in a brief form.
Caption: A longer user-friendly name for this label, usually one or two words long. It should be descriptive, but reasonably short.
Color: The color used for annotations of this label. You can accept the default color or assign a color.
Include in Model Training: Select this box if the model being trained should use the annotations that have this label.
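As a way to think about these four fields, a label can be modeled as a small record. The sketch below is illustrative only: the class, the field names, the default color, and the default for the training flag are assumptions, not the Studio's data model.

```python
from dataclasses import dataclass


@dataclass
class Label:
    code: str                          # brief name, by convention uppercase, e.g. "PER"
    caption: str                       # longer user-friendly name
    color: str = "#cccccc"             # hypothetical default color
    include_in_training: bool = True   # hypothetical default


def training_labels(labels):
    """Return the codes of the labels whose annotations would feed model training."""
    return [label.code for label in labels if label.include_in_training]


labels = [
    Label(code="PER", caption="Person"),
    Label(code="LOC", caption="Location", color="#3366cc"),
    Label(code="TMP", caption="Scratch", include_in_training=False),
]

print(training_labels(labels))
```

Only `PER` and `LOC` are returned, because `TMP` has Include in Model Training turned off.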
You can add a label or edit existing labels after the project has been created.
To add a new label, select Add Label from the list of labels, fill in the form, and select Add. If the annotations with the label should be used in training, check the box for Include in Model Training.
Note
If Use Basis Training Data was selected when the project was created, you cannot add new labels to the project. The Add Label option will not be displayed.
To edit an existing label, click on the label in the list of labels. Change any of the fields and select Save.
To delete a label, click on the label in the list of labels and select Delete. You can only delete a label if there are no annotations for it. Once the label has been assigned during annotation, it cannot be deleted from that project.
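The rules for adding and deleting labels can be summarized in a short sketch. The `Project` class and both helper functions are hypothetical and exist only to restate the rules above; they are not part of any Studio API.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    use_basis_training_data: bool = False
    labels: set = field(default_factory=set)
    # Label codes of the annotations made so far (one entry per annotation).
    annotation_labels: list = field(default_factory=list)


def can_add_label(project):
    """New labels are not allowed when Use Basis Training Data is selected."""
    return not project.use_basis_training_data


def can_delete_label(project, code):
    """A label can be deleted only while no annotation uses it."""
    return code in project.labels and code not in project.annotation_labels


project = Project(labels={"PER", "LOC"}, annotation_labels=["PER", "PER"])

print(can_add_label(project))
print(can_delete_label(project, "LOC"))
print(can_delete_label(project, "PER"))
```

Here `LOC` can still be deleted, while `PER` cannot because it has already been assigned.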
Warning
The Studio will not prompt you to re-annotate samples when a new label is added.
Note
You must be registered as a manager for the project.
Select Add Document in the action bar at the bottom of the project’s dashboard.
(Optional) Give the document a name. Some documents are named automatically; for example, each file in an ingested zip file will create a document named with the filename.
Add one or more documents.
(Optional) Assign annotators. By default, each added document will be given to all the annotators currently assigned to the project. If you add annotators to the project later, they will not automatically be given documents that were added earlier.
Add the documents into the project and select whether the document(s) will be used for training or validation.
Add for training: documents used to train the model
Add for validation: gold data used to test the model's accuracy
Add (auto-split): automatically assign documents to the training and validation sets
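This guide does not state how the auto-split option divides documents. Purely to illustrate the idea of an automatic split, the sketch below assumes a seeded random shuffle and a hypothetical 20% validation share; the function name, the ratio, and the seed are all assumptions.

```python
import random


def auto_split(doc_names, validation_fraction=0.2, seed=0):
    """Randomly divide documents into training and validation sets.

    The 20% validation share and the seeded shuffle are assumptions made
    for this illustration, not documented Studio behavior.
    """
    rng = random.Random(seed)
    shuffled = list(doc_names)
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return {
        "validation": shuffled[:n_validation],
        "training": shuffled[n_validation:],
    }


doc_names = [f"doc{i}.txt" for i in range(10)]
split = auto_split(doc_names)
print(len(split["training"]), len(split["validation"]))
```

Every document lands in exactly one of the two sets, so no document is used for both training and validation.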
Note
Identical documents are merged. Pure-text documents are deduplicated automatically. For .adm documents, which may include prior annotations or metadata, the additional data is merged into a single document.
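A minimal sketch of deduplicating pure-text documents follows. It assumes that "identical" means exactly the same text and compares content hashes; how the Studio actually detects duplicates is not described here, and the function and variable names are hypothetical. Merging of .adm annotations and metadata is not modeled.

```python
import hashlib


def deduplicate(documents):
    """Keep the first document seen for each distinct text.

    `documents` is a list of (name, text) pairs. Exact-text matching via
    SHA-256 is an assumption made for this illustration.
    """
    seen = {}
    for name, text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # setdefault keeps the earlier document when the text repeats.
        seen.setdefault(digest, (name, text))
    return list(seen.values())


docs = [
    ("a.txt", "Paris is in France."),
    ("b.txt", "Paris is in France."),
    ("c.txt", "Berlin is in Germany."),
]
unique = deduplicate(docs)
print([name for name, _ in unique])
```

The second copy of the Paris sentence is dropped, leaving `a.txt` and `c.txt`.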