Franziska Lorenz
UX Designer

Automated Text Classification

Extending the existing CWB platform with another core function: classifying text

Project Type

New feature

This project was done during my time at ExB.

The project

ExB's Cognitive Workbench (CWB) was mainly used for entity and relation extraction in documents. Text classification was still a very new field, so we decided to approach it in two phases.
In the first phase, we explored text classification and estimated what would be necessary to achieve good results. We created a simplified model based on folder-based learning. It lets the user teach the system what the patterns of a class look like by providing examples sorted into folders.
In the second phase, we switched to the more promising approach of classifying text passages by extracting information from the text. This required so much refinement that the Text Classification app became the main task of the second phase.
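As a rough illustration of the folder-based idea from the first phase, the sketch below treats each subfolder name as a class label and each text file inside it as a training example. It is not ExB's implementation: the function names (`load_folder_examples`, `train`, `classify`), the `.txt` file layout, and the use of a small multinomial Naive Bayes scorer are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_folder_examples(root: Path) -> list[tuple[str, str]]:
    """Assumed layout: root/<class name>/<example>.txt, one folder per class."""
    examples = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for file in sorted(folder.glob("*.txt")):
            examples.append((folder.name, file.read_text(encoding="utf-8")))
    return examples


def train(examples: list[tuple[str, str]]):
    """Count words per class and documents per class."""
    word_counts: dict[str, Counter] = defaultdict(Counter)
    doc_counts: Counter = Counter()
    for label, text in examples:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return word_counts, doc_counts


def classify(text: str, word_counts, doc_counts) -> str:
    """Return the class with the highest Naive Bayes log score (add-one smoothing)."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    tokens = tokenize(text)
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denominator = sum(counts.values()) + len(vocabulary)
        for token in tokens:
            score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With one folder named `invoices` and one named `ads`, calling `train(load_folder_examples(root))` and then `classify(...)` on a new page returns the name of the folder whose examples it resembles most.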


Outcome

In addition to a tool for defining text classification analyses, the main result was an app for displaying the classification results of the system.
With tables and diagrams, the app provides a clear overview of how documents and pages have been classified and how confident the system is regarding the classifications.

Another feature is the overview of the total number of classifications and the distribution of classes. For example: 50% of the pages are invoices, 30% are ads, and 20% could not be assigned to either of the two given classes.
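The class distribution in such an overview boils down to simple counting. The sketch below shows one way it could be computed from per-page results. The input format (a list of label and confidence pairs), the `class_distribution` function, and the rule that pages below a `min_confidence` threshold count as unassigned are assumptions for illustration, not a description of how the CWB actually decides this.

```python
from collections import Counter

UNASSIGNED = "unassigned"  # hypothetical label for pages without a confident class


def class_distribution(
    predictions: list[tuple[str, float]], min_confidence: float = 0.5
) -> dict[str, float]:
    """Return the share of pages per class in percent.

    predictions holds one (label, confidence) pair per page. Pages whose
    confidence is below min_confidence are counted as unassigned.
    """
    if not predictions:
        return {}
    counts = Counter(
        label if confidence >= min_confidence else UNASSIGNED
        for label, confidence in predictions
    )
    total = len(predictions)
    return {label: 100 * count / total for label, count in counts.items()}


# Ten pages: five confident invoices, three confident ads, two uncertain pages.
pages = [("invoice", 0.9)] * 5 + [("ad", 0.8)] * 3 + [("invoice", 0.3), ("ad", 0.2)]
distribution = class_distribution(pages)
```

For the ten example pages, `distribution` contains 50.0 for `invoice`, 30.0 for `ad`, and 20.0 for `unassigned`, which matches the example above.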