Franziska Lorenz

Automated Text Classification

Extending the core functions of the existing CWB platform with a new one: text classification

Project Type

New feature

This project was done during my time at ExB.

The project

ExB's Cognitive Workbench (CWB) had mainly been used for extracting entities and relations from documents. Text classification was a new field for us, so we decided to approach it in two phases.
In the first phase, we explored text classification and estimated what would be needed to achieve good results. We created a simplified model based on folder-based learning, which lets users teach the system what a pattern looks like by using the information patterns contained in folders.
In the second phase, we switched to a more promising approach: classifying text passages based on information extracted from the text. This required so much refinement that the Text Classification app became the main focus of the second phase.



The team & my task

Two frontend developers and one backend developer worked on the application. I worked on the project as UX designer and consulted the Director of Design to discuss ideas and solutions.
Behind the scenes, numerous people from research, software development and quality assurance were also involved.

My first task was to talk to the developers to understand the technical implications, possibilities and consequences.
Discussions with the project stakeholders gave me an idea of what customers would expect. I was responsible for creating the concepts for the analysis tools and the visualization app. I supported the implementation phase and then worked closely with the QA team to test functionality and appearance.

Process

Phase 1

kick-off meeting
discussions with research and engineering
creating first concept drafts for the folder-based analysis tool
first concepts for the visualization application
accompanying the implementation of phase 1

Phase 2

collection of feedback
creation of user stories
reworking the concept (to include extraction-based text classification)
accompanying the implementation and QA testing

Phase 1

Discovery

The outcome of the first phase was a concept and design for the folder-based analysis tool and for the visualization app. Since the design of the analysis tool was driven largely by technical requirements, the following sections describe only the visualization app.

In the first step, the task was to classify only whole documents.
Discussions with potential users showed that both a general overview of the classification results and the detailed classifications are of interest.
General overview: how many documents in which class
Detailed classifications: which document has been assigned to which class

Example: The personal inbox
A personal inbox could be divided into the following classes: advertising, bills, postcards, letters, important documents, etc.
Each resulting stack can then be used further, e.g. by opening the letters stack and reading individual letters.

Phase 1

Sketches

We divided the app into two parts: an overview page showing the number of available documents, classes and documents per class, and
a detail page showing, for each document, its title, other information (sender, author, date, etc.) and its classification.

The overview shows the user what the classification has produced.
To show the distribution of classes at a glance, I decided to also present the results in a diagram.
The table shows the detailed numbers. Its columns contain the class names, the number of classified documents and an indication of how confident the system is in each classification.
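As a rough illustration of what such an overview table summarizes, the sketch below aggregates document-level results into one row per class: class name, number of documents and an average confidence. The record type, its field names and the use of the mean as the confidence column are all hypothetical assumptions for illustration. This is not how CWB computes these values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Classification:
    # Hypothetical record: one document-level result from a classifier
    document_title: str
    class_name: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def overview_rows(results: list[Classification]) -> list[tuple[str, int, float]]:
    """Return (class name, document count, mean confidence) per class."""
    confidences: dict[str, list[float]] = defaultdict(list)
    for r in results:
        confidences[r.class_name].append(r.confidence)
    rows = [
        (name, len(values), sum(values) / len(values))
        for name, values in confidences.items()
    ]
    # Largest class first; ties broken by class name for a stable order
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    results = [
        Classification("Electricity bill", "bills", 0.9),
        Classification("Phone bill", "bills", 0.7),
        Classification("Card from Rome", "postcards", 0.8),
    ]
    for name, count, mean_conf in overview_rows(results):
        print(f"{name:<10} {count:>3} {mean_conf:.2f}")
```

Averaging is only one possible way to condense per-document confidence into a single column. A real product might show a range or a distribution instead.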

Selection of sketches for the part of the visualization app that aims to give an overview of the classifications.

The second part is a detailed list of all individual classifications. Initially, it is not filtered by classifier or class. This view is intended for examining the results at the document level.

Phase 1

Screens

Together with the other project participants, I agreed on the UI frameworks to use: Highcharts for diagrams and AG Grid for tables. Creating the visual designs of the concepts in Sketch had a significant influence on how they developed further.

Selection of screens: first screen prototypes of the previously sketched ideas.

Excursion: an attempt to give an overview of several classifiers.

Since the application was built mainly from these frameworks and the largely existing component library, the first phase was implemented relatively quickly.

Implementation of the classifications overview

Implementation of the detailed classification list

With both parts implemented, we were able to use the time until the second phase of the project to gather feedback.

Phase 2

Over a period of three months, we analysed the results of phase 1, identified new use cases and extended the underlying classification system.

Phase 2

Interviews & Insights

After some discussions with users of the text classification feature, we gained the following insights:

we have to distinguish between document classification and page classification
the classification overview does not allow comparing different classification models
the app doesn't show what could not be classified
no indication of how confident the system is about a particular classification

Phase 2

Use Cases

In addition, a number of new use cases emerged. Some of them had already been addressed, while others still had to be considered. Here are some examples:
As a user of the visualization application,

I want to see a confidence value for each class of my classifier, so that I can judge the accuracy of the results.
I want to know how many documents a class has been trained on, so that I can gain a deeper understanding of the confidence value.
I want to see which and how many pages have been classified for a class, so that I can see the distribution.
I want to see how many of my documents contain which page classifications, so that I can get an overview of the contents of the documents.
Example: How many of my documents contain a bill to pay?
I want to see how many documents / pages have been processed but could not be classified.
I want to see how sure the system is with each classification.

Phase 2

Key things to redesign

Extension of the summary page
It no longer shows only one classifier but all classifiers selected in the current project.

Revision of the summary page
Extending the overview page called for a new format. The pie chart was replaced by bar charts embedded in the table, which makes it possible to view different classifiers and compare them side by side.

Document and now also page classification
The new method of extraction-based text classification also required the ability to classify individual pages of a document.

Extension of the detail table with more columns
The main focus remains on the individual classifications. This view was only extended to include the system's confidence for each classification.

Revision of column grouping
We revised the grouping options to make the details of the individual classifications more accessible and to provide an overview that matches the new requirements.
This feature was particularly interesting because grouping by document title can reveal that a document has been classified into several different classes.
In addition, the individually classified pages of a document are again presented as a coherent group, giving detailed insight into the document's content.
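The grouping idea can be illustrated with a small sketch. Page-level results are grouped by document title, which keeps each document's pages together and makes it easy to spot documents whose pages were assigned to more than one class. The tuple layout (document title, page number, class) and both function names are hypothetical.

```python
from collections import defaultdict


def group_by_document(
    page_results: list[tuple[str, int, str]],
) -> dict[str, list[tuple[int, str]]]:
    """Group (title, page number, class) results into {title: [(page, class), ...]}."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for title, page, class_name in page_results:
        groups[title].append((page, class_name))
    # Keep the pages of each document in reading order
    return {title: sorted(pages) for title, pages in groups.items()}


def mixed_documents(groups: dict[str, list[tuple[int, str]]]) -> dict[str, set[str]]:
    """Return the documents whose pages were assigned to more than one class."""
    result = {}
    for title, pages in groups.items():
        classes = {class_name for _, class_name in pages}
        if len(classes) > 1:
            result[title] = classes
    return result


if __name__ == "__main__":
    pages = [
        ("Insurance letter", 2, "bills"),
        ("Insurance letter", 1, "letters"),
        ("Flyer", 1, "advertising"),
    ]
    groups = group_by_document(pages)
    print(groups["Insurance letter"])
    print(mixed_documents(groups))
```

Here the "Insurance letter" group lists page 1 as a letter and page 2 as a bill, so it is the only document flagged as mixed.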

Challenges & problems

A major challenge was that text classification had to extend across the entire existing platform. Besides building new applications, we had to adapt existing ones, which was not part of the project scope.
In addition, we had settled on the AG Grid framework for tables. This framework offers many possibilities but also has many limitations, and it cannot always be adapted without great effort.

Outcome

In addition to a tool for defining text classification analyses, the main result was an app for displaying the classification results of the system.
With tables and diagrams, the app provides a clear overview of how documents and pages have been classified and how confident the system is regarding the classifications.

Another feature is the overview of the total number of classifications and the distribution of classes. Example: 50% of the pages are invoices, 30% are ads, and 20% could not be assigned to either of the two given classes.
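The distribution in this example is each class's share of all processed pages, with pages that matched no class counted separately. The following is a minimal sketch. It assumes each page result is either a class name or `None` for "could not be assigned"; this representation and the `UNASSIGNED` label are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical label for pages that matched none of the classes
UNASSIGNED = "could not be assigned"


def class_distribution(page_classes: list[Optional[str]]) -> dict[str, float]:
    """Return each class's share of all processed pages, in percent."""
    total = len(page_classes)
    if total == 0:
        return {}
    counts = Counter(c if c is not None else UNASSIGNED for c in page_classes)
    return {name: 100 * count / total for name, count in counts.items()}


if __name__ == "__main__":
    # 10 pages: 5 invoices, 3 ads, 2 that matched neither class
    pages = ["invoices"] * 5 + ["ads"] * 3 + [None] * 2
    for name, share in class_distribution(pages).items():
        print(f"{name}: {share:.0f}%")
```

With these ten sample pages, the sketch reproduces the 50% / 30% / 20% split from the example. Counting unassigned pages as their own bucket keeps the shares summing to 100%.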
