Franziska Lorenz

Annotation Editor

Concept for a general purpose web-based annotation tool for a wide range of linguistic annotations

Project Type

Rework / New feature

This project was done during my time at ExB.

The project

The Annotation Editor is one of ExB's internal tools. Data Curators use it to create annotation data for training named entity, relation and text classification recognizers. Annotations can also serve as feedback during training, so that the recognizers improve over time.
Previously, the annotator was an external Java tool. Since the goal was to integrate it permanently into the Cognitive Workbench (CWB), I was asked to document the current state of the Java tool and to draft a proposal for its revision. In the summer of 2018, the annotation editor was moved into the CWB platform.

Screenshot of the previously used Java Annotation Editor for annotating training data for named entity-, relation- and text classification recognizers.

The team

In Phase 1, this project and all its outputs served only as a first collection of ideas, so I was the only team member.

As the primary goal of Phase 2 was the integration into the web-based platform CWB, the team consisted mainly of developers from different teams (kernel, platform, backend, frontend, research).

Research

Competitor analysis

To find my way into the topic, I searched the web for alternative and similar solutions that address the same or similar use cases.
Tools such as the web annotation editor "WebAnno" gave me a rough idea of how others had solved annotation. By testing these tools, I was able to determine which features we could approach in a similar way.

Various other solutions in the field of web-based text annotation

Research

Interviews

To empathize with the work of a Data Curator, I had Data Curators show me how they work and asked them about their experiences. Targeted tasks and the ways they solved them gave me further insight into working methods and workarounds. Many of my questions confronted the users with edge cases, which helped us uncover dead ends and identify productivity restrictions.
The feedback from the interviews and my observations formed the basis for a rough user journey.

User journey of a Data Curator while annotating a text with defined annotation types

Pain points

the selection of documents to annotate must be managed separately
downloading from and uploading to a web application is necessary
upload often fails due to data incompatibilities
common usability conventions are not supported (shortcuts like CTRL+C)
visualization of multiple annotations is confusing
difficult to install the Java tool
not open for customer projects

Key things to redesign

document management
status management for documents
suggestions that help the user decide what makes sense to annotate
improved annotation workflows
fewer distractions
Annotation Editor should be usable with the keyboard only as well as with the mouse

Overall concept

Overview diagrams helped me to keep track of the processes of the data curators.

Incorporation of annotation operations in ExB's web-based tool Cognitive Workbench and how it relates to existing features.

Annotation occurrences and connections within a text

Sketches helped me to work out a general structure and to plan my next steps. I continued with the actual core of my task: visual idea creation for improvements and basic concepts for using the editor in the browser.

Sketches

In the initial phase of a project, I am faster if I put my thoughts on paper. The sketches served as a basis for brief feedback from the users, so I could build the following screen prototypes in Sketch.
With the integration into the CWB, the design guidelines were set, so the primary focus was on improved workflows.

Selection of documents

In the configuration menu, the project manager can assign texts to Data Curators and see their current progress. The menu is opened when the user invokes the Annotation Editor for the first time or when new texts have been added.
The Data Curators cannot see the assignment option. Instead, they only see what has been assigned to them. The "Add Document" button can be used to add more documents.
The screen is divided into the following sections:

Search filter option
Column header bar with sorting option
Text collection in the form of a list of documents
Bottom action bar

Default Annotation View

Text View
By default, the text is split into sentences that are shown in their original context. The document title is shown as a sticky header, so this context information is always present.
Each sentence shows a confidence rating that was created when the text was pre-annotated by a machine. The confidence score tells the Data Curators how sure the machine is about its own annotations. The rightmost column of the text view shows check marks for completed sentences. A sentence is checked automatically once it is fully annotated. The Data Curators can also set the check marks manually, which gives them an overview of what they have already done. In addition, this setting indicates what the machine will later use to train the individual recognizers that correspond to the annotation types.

The Annotation List
The annotation list features a search filter and sorting options.
Below that is a nested list of annotation types, which can be collapsed and expanded at different levels. The deepest level contains the annotation instances, grouped together.
Each level indicates how many child elements it contains through a number on the right side. Furthermore, all annotations are automatically color-coded, so that annotations of different types can be better distinguished from each other in the text.
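
The nested list with child counts and automatic colors could be modeled roughly as follows. This is only a sketch: the type and function names, the sample instances and the evenly spaced hue scheme are assumptions made for illustration, not the editor's actual implementation.

```typescript
// Hypothetical data model; none of these names come from the real editor.
interface AnnotationNode {
  label: string;
  children: AnnotationNode[]; // empty for a single annotation instance
}

// The number shown on the right side of a level: its direct child elements.
function childCount(node: AnnotationNode): number {
  return node.children.length;
}

// Assumed color scheme: spread the hues of the top-level types evenly
// around the color wheel so that neighbouring types stay distinguishable.
function colorFor(index: number, total: number): string {
  const hue = Math.round((360 * index) / total);
  return `hsl(${hue}, 70%, 50%)`;
}

const makeLeaf = (label: string): AnnotationNode => ({ label, children: [] });

// Made-up sample data for illustration only.
const annotationTree: AnnotationNode[] = [
  { label: "Hormone", children: [makeLeaf("insulin"), makeLeaf("cortisol")] },
  { label: "Person", children: [makeLeaf("Marie Curie")] },
];

const typeColors = annotationTree.map((_, i) => colorFor(i, annotationTree.length));
```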

Annotating Text

Selecting text to annotate can work in various ways:

Single click on a word: The whole word is selected
Click + drag the mouse cursor: The letters are selected individually for the distance the user chooses
Ctrl + click on individual words: Selects multiple words which may also be separated by other (not selected) words
Shift + click on two words: Extends the selection between the two words, including everything in between
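
The word-level click behaviors above can be sketched as a small state function. The sketch assumes that a selection is a set of token indices and that an "anchor" token remembers where a Shift range starts. All names are hypothetical, and click + drag on individual letters is left out.

```typescript
// Assumption: a selection is the set of selected token (word) indices.
type TokenSelection = Set<number>;

interface WordClick {
  token: number; // index of the clicked word
  ctrl: boolean; // Ctrl held during the click
  shift: boolean; // Shift held during the click
}

function applyClick(
  current: TokenSelection,
  anchor: number | null,
  click: WordClick
): TokenSelection {
  const next: TokenSelection = new Set<number>();
  if (click.shift && anchor !== null) {
    // Shift + click: select both words and everything in between.
    const from = Math.min(anchor, click.token);
    const to = Math.max(anchor, click.token);
    for (let i = from; i <= to; i++) next.add(i);
    return next;
  }
  if (click.ctrl) {
    // Ctrl + click: keep the old selection and add the clicked word,
    // even if other, unselected words lie in between.
    current.forEach((t) => next.add(t));
    next.add(click.token);
    return next;
  }
  // Single click: select only the clicked word.
  next.add(click.token);
  return next;
}
```

Returning a new set on every click keeps the previous selection untouched, which would make undo or a selection preview straightforward to add.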

To annotate the selected text, the user presses the shortcut of the annotation type or starts typing its name. For "Hormone", the user would start with "h ... o ..." and with each additional letter, the system proposes an automatic completion that is highlighted.
As long as the menu is open, the user can navigate through the list with "TAB". "Return" confirms the selection and assigns the annotation type.
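
A minimal sketch of this type-ahead menu, assuming case-insensitive prefix matching and a highlight that wraps around when "TAB" reaches the end of the list. Both rules, as well as the function names and the sample type names other than "Hormone", are assumptions rather than documented behavior.

```typescript
// Assumed matching rule: case-insensitive prefix match on the type name.
function suggestTypes(typeNames: string[], typed: string): string[] {
  const prefix = typed.toLowerCase();
  return typeNames.filter((n) => n.toLowerCase().indexOf(prefix) === 0);
}

// TAB moves the highlight to the next proposal; wrapping around at the
// end of the list is an assumption. Returns -1 if there are no proposals.
function nextHighlight(current: number, count: number): number {
  return count === 0 ? -1 : (current + 1) % count;
}

// Return confirms whichever proposal is currently highlighted.
function confirmType(proposals: string[], highlight: number): string | null {
  return highlight >= 0 && highlight < proposals.length ? proposals[highlight] : null;
}

const annotationTypeNames = ["Hormone", "Hospital", "Person"];
const typeProposals = suggestTypes(annotationTypeNames, "ho");
const confirmedType = confirmType(typeProposals, 0);
```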

Annotations can also be created using the keyboard only:
Use the arrow keys to navigate from token to token through the text. The tokens are highlighted one at a time to indicate the current position. When the highlight is on a token and the user enters the name of the annotation type, following the same workflow as described above, that word becomes an annotation.
This method can also be used to annotate ranges of words. The user can hold down the Shift key while extending a range of tokens with the arrow keys.
If "Alt" is held down, the annotation mode switches to a "character by character" selection mode, in which the same mechanism is applied to each character. Unlike "Ctrl + Click", this mode does not allow selections that skip words in between.
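
The keyboard-only flow can be sketched with a cursor that has an anchor and a focus position. This is an illustrative model with hypothetical names. In the character mode, the same functions could be run over character indices instead of token indices.

```typescript
interface KeyboardCursor {
  anchor: number; // position where the current range started
  focus: number; // position that is currently highlighted
}

// step is -1 for the left arrow key and 1 for the right arrow key.
function moveCursor(
  cursor: KeyboardCursor,
  step: -1 | 1,
  shift: boolean,
  unitCount: number
): KeyboardCursor {
  // Keep the highlight inside the text.
  const focus = Math.min(unitCount - 1, Math.max(0, cursor.focus + step));
  // With Shift the anchor stays put and the range grows or shrinks;
  // without Shift the range collapses to the newly highlighted unit.
  return { anchor: shift ? cursor.anchor : focus, focus };
}

// Inclusive range [start, end] that would be annotated.
function selectedRange(cursor: KeyboardCursor): [number, number] {
  return [Math.min(cursor.anchor, cursor.focus), Math.max(cursor.anchor, cursor.focus)];
}
```

Because the range is always contiguous between anchor and focus, this model cannot skip words in between.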

Create Relations

Relation annotations are created by switching to the relations tab in the Annotation List and dragging from one annotation label to another. When the mouse button is released, a menu pops up in which the user can type the name of the relation type. Created relations are collected in their own list and displayed as brackets that enclose the related annotations.

After this phase, the project was postponed indefinitely, as the company decided to continue with other topics. They also wanted to wait until their base system, the Cognitive Workbench, was mature enough to integrate the annotation editor effectively.

Phase 2

Project setup & the team

Since the CWB had evolved towards a new component-based architecture (material design), my ideas and rough concepts from 2016 could not be integrated. For the time being, the annotation editor was to be integrated on a purely technical level.
My contribution consisted of a few sketches for a general structure, without reworking the workflows of the previous Java Annotation Editor. Some additional features, such as displaying images and PDF files and using OCR technology to extract text, were added and brought completely new requirements.

Image and text annotation tool

Basic structure of the annotation editor, divided into annotation list, image area for visual annotations and text area for textual annotations

Control and completion of the text

Proofreading function

Comparing the work of different Data Curators

Result

Based on the existing Java Annotation Editor, my ideas from the first phase and the sketches for the integration into the current platform, we defined a basic set of features to implement.
The current annotation editor is opened in the Cognitive Workbench from an existing document list. The user interface consists of a header, a list of annotation types, a visual area, and a text area.

Header

Contains general information about the document as well as options like save and close.

Annotation types list

Lists all named entity, relation, text classification and visual types to be annotated by the Data Curators. It also shows the respective color code and the shortcut used to annotate. After creation, the annotations are listed beneath their types.

Visual area

Shows the image or PDF so that visual information such as tables, logos and handwriting can be annotated.

Textual area

Contains the text of the document whose values can be annotated - for example names, locations, addresses, dates.
