
Text & OCR Annotation

Text annotation is the process of labelling or tagging information in text sources such as documents, records, files, cheques and order receipts. Highly accurate text annotation is required to train large NLP models.

Given below is an overview of various types of text annotations:

  • Entity Annotation - The process of locating, extracting and labelling named entities (named entity recognition, NER), key phrases (key phrase tagging) and parts of speech (POS tagging).
  • Entity Linking - Entity linking relates named entities to one another based on the connections between them, which gives additional information about each entity.
  • Text Classification - Text classification assigns a single label to the entire text based on content. It helps to know what the entire text is about.
  • Sentiment Annotation - Sentiment annotation labels the emotion or opinion expressed in a text. Helping computers understand emotions is challenging, and sarcastic content makes it even harder for a model to learn them during training, so careful annotation plays a significant role.
  • Intent Analysis - It is a way of analyzing the reason or intention behind the text, which aids in understanding customer desires.

The image above illustrates a text annotated with the entities measurement, city, year, metal capital and country.
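
Entity annotations like these are commonly stored as character-offset spans over the original text. The following is a minimal sketch of that idea, not a description of any particular annotation tool: the EntitySpan class, the annotate helper and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """One labelled entity, stored as character offsets into the text."""
    start: int  # index of the first character (inclusive)
    end: int    # index one past the last character (exclusive)
    label: str  # entity type, e.g. "city" or "country"


def annotate(text: str, phrase: str, label: str) -> EntitySpan:
    """Label the first occurrence of `phrase` in `text` (hypothetical helper)."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return EntitySpan(start, start + len(phrase), label)


# Hypothetical sentence and labels, for illustration only.
text = "Paris is the capital of France."
spans = [
    annotate(text, "Paris", "city"),
    annotate(text, "France", "country"),
]

for span in spans:
    # Slicing with the stored offsets recovers the annotated phrase.
    print(text[span.start:span.end], "->", span.label)
```

Storing offsets rather than copies of the phrase keeps each label tied to an exact position, which matters when the same word appears more than once in a text.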

What is OCR?

OCR stands for Optical Character Recognition. It is the process of converting the text in images to a machine-readable format. This method of text recognition is used on images that cannot be edited, such as scanned copies of bills, orders, handwritten scripts, bank accounts, etc.

OCR annotation is used across a wide range of sectors. Here, the image illustrates OCR annotation being used in number plate recognition.
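
An OCR annotation typically pairs a region of the image with the text that appears inside it. The sketch below shows one possible record for a number plate, assuming a simple pixel bounding box; the OcrAnnotation class, the validate helper, the image size and the plate text are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrAnnotation:
    """A text region in an image: pixel bounding box plus its transcription."""
    x: int       # left edge of the box, in pixels
    y: int       # top edge of the box, in pixels
    width: int   # box width, in pixels
    height: int  # box height, in pixels
    text: str    # characters the annotator read inside the box


def validate(ann: OcrAnnotation, image_width: int, image_height: int) -> bool:
    """Check that the box lies inside the image and the transcription is non-empty."""
    inside = (
        ann.x >= 0
        and ann.y >= 0
        and ann.width > 0
        and ann.height > 0
        and ann.x + ann.width <= image_width
        and ann.y + ann.height <= image_height
    )
    return inside and bool(ann.text.strip())


# Hypothetical number plate annotation on a 640x480 image.
plate = OcrAnnotation(x=120, y=310, width=200, height=60, text="AB12 CDE")
print(validate(plate, image_width=640, image_height=480))
```

A basic check like this can catch boxes that fall outside the image or regions left without a transcription before the annotations are used for training.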

At HaiData, we offer a variety of text annotation services, at scale, that suit your NLP model requirements. Contact Us today for a free sample annotation!