New feature
This project was done during my time at ExB.
ExB's Cognitive Workbench (CWB) was mainly used for entity and relation extraction in documents.
Text classification was still a very new field, so we decided to approach it in two phases.
In the first phase, we explored text classification and estimated what would be needed to achieve good results. We built a simplified model based on folder-based learning, which lets the user teach the system what patterns look like by providing examples organized in folders.
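To make the folder-based idea concrete, here is a minimal Python sketch of how training examples might be gathered from a folder layout. It is not the CWB implementation: the function name collect_labeled_examples, the one-subfolder-per-class layout, and the restriction to .txt files are all assumptions made for illustration.

```python
from pathlib import Path


def collect_labeled_examples(root):
    """Return (text, label) pairs, using each subfolder name as the label.

    Assumed layout (hypothetical): root/<class name>/<document>.txt
    """
    examples = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # ignore loose files next to the class folders
        for file in sorted(folder.glob("*.txt")):
            text = file.read_text(encoding="utf-8")
            examples.append((text, folder.name))
    return examples
```

The resulting pairs could then be handed to any text classifier. The point of the layout is that the user labels documents simply by sorting them into folders.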
In the second phase, we switched to the more promising approach of classifying text passages based on information extracted from the text. This required so much refinement that the Text Classification app became the main task of the second phase.
In addition to a tool for defining text classification analyses, the main result was an app for displaying the classification results of the system.
Using tables and diagrams, the app gives a clear overview of how documents and pages have been classified and how confident the system is in each classification.
Another feature is the overview of the total number of classifications and the distribution of classes.
Example
50% of the pages are invoices
30% are ads
20% could not be assigned to either of the two classes given.
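A distribution like the one in the example above could be computed along these lines. This is a minimal Python sketch under stated assumptions, not the app's actual logic: the function name class_distribution, the (label, confidence) input format, and the rule that pages below a confidence threshold count as "unassigned" are all hypothetical.

```python
from collections import Counter


def class_distribution(predictions, min_confidence=0.5):
    """Return the percentage share of each class among the pages.

    predictions: list of (label, confidence) pairs, one per page.
    Assumption: pages with confidence below min_confidence are
    counted as "unassigned" instead of under their predicted label.
    """
    if not predictions:
        return {}
    counts = Counter(
        label if confidence >= min_confidence else "unassigned"
        for label, confidence in predictions
    )
    total = len(predictions)
    return {label: 100 * count / total for label, count in counts.items()}


# Ten made-up pages: 5 confident invoices, 3 confident ads, 2 low-confidence pages.
pages = (
    [("invoice", 0.9)] * 5
    + [("ad", 0.8)] * 3
    + [("invoice", 0.3)] * 2
)
distribution = class_distribution(pages)
for label, share in distribution.items():
    print(f"{label}: {share:.0f}%")
# invoice: 50%
# ad: 30%
# unassigned: 20%
```

Keeping the "unassigned" bucket explicit means the shares always add up to 100%, so the overview also shows how many pages the system could not classify.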