Cost Effective Annotation Framework Using Zero-Shot Text Classification

Conference Publication
Kasthuriarachchy, Buddhika; Chetty, Madhu; Shatte, Adrian; Walls, Darren
Abstract

Manual, high-quality annotation of social media data has enabled companies and researchers to develop improved natural language processing applications. However, human text annotation is expensive and time-consuming. Crowd-sourcing platforms such as Amazon's Mechanical Turk (MTurk) can be leveraged to create large training corpora for text classification tasks on social media data. Nevertheless, the quality of annotations can vary significantly, depending on the interpretations and motivations of the annotators completing the tasks. Further, the cost of labelling data through MTurk increases when target messages are scarce and the data contains a significant amount of noise (e.g. promotional messages on Twitter). In this work, we propose a new annotation framework for creating high-quality human-annotated text classification datasets from social media data. We present a pre-annotation technique based on zero-shot text classification that reduces the adverse effects arising from the highly skewed distribution of data across target classes. The proposed framework significantly reduces cost and time while maintaining annotation quality. Being generic, it can be applied to annotating text data from any discipline. Our experiment annotating Twitter data with the proposed framework shows a cost reduction of 80% with no compromise in quality.
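
The abstract describes pre-annotating social media messages with a zero-shot classifier so that noisy or off-target messages are filtered before they reach paid human annotators. The sketch below is not the authors' code; it is a minimal illustration of that idea, assuming the Hugging Face `transformers` zero-shot-classification pipeline with the `facebook/bart-large-mnli` model, and the candidate labels and confidence threshold are hypothetical choices, not taken from the paper.

```python
# Minimal sketch of zero-shot pre-annotation for a crowd-sourced labelling task.
# Assumptions (not from the paper): the transformers library, the
# facebook/bart-large-mnli model, the candidate labels, and the 0.5 threshold.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical target and noise classes for a Twitter annotation task.
candidate_labels = ["personal experience", "promotional message", "news report"]

tweets = [
    "Just got my second dose, feeling a bit tired but glad it's done.",
    "50% OFF all supplements this weekend only! Click the link below.",
]

to_annotate, discarded = [], []
for tweet in tweets:
    result = classifier(tweet, candidate_labels=candidate_labels)
    top_label, top_score = result["labels"][0], result["scores"][0]
    # Route likely-relevant, non-promotional messages to human annotators
    # (e.g. MTurk) and drop obvious noise so it is never paid for.
    if top_label != "promotional message" and top_score >= 0.5:
        to_annotate.append((tweet, top_label, top_score))
    else:
        discarded.append((tweet, top_label, top_score))

print(f"Sent to annotators: {len(to_annotate)}, filtered out: {len(discarded)}")
```

Filtering in this way also rebalances the pool sent for manual labelling, which is how a pre-annotation step of this kind can counteract a highly skewed class distribution and lower overall annotation cost.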

Journal

N/A

Publication Name

2021 International Joint Conference on Neural Networks (IJCNN)

Volume

N/A

ISBN/ISSN

978-1-6654-3900-8

Edition

N/A

Issue

N/A

Page Count

8

Location

Shenzhen, China

Publisher

Institute of Electrical and Electronics Engineers

Publisher Url

N/A

Publisher Location

Piscataway, NJ, USA

Publish Date

N/A

Url

N/A

Date

N/A

EISSN

N/A

DOI

10.1109/IJCNN52387.2021.9534335