Document-level multi-topic sentiment classification of email data with BiLSTM and data augmentation

Journal Publication ResearchOnline@JCU
Liu, Sisi; Lee, Kyungmi; Lee, Ickjai
Abstract

Email data has unique characteristics that distinguish it from other social media data: multiple topics, lengthy replies, formal language, high variance in length, high duplication, anomalies, and indirect relationships. To better model email documents and capture the complex sentiment structures in their content, we develop a framework for document-level multi-topic sentiment classification of email data. Because large volumes of labeled email data are rarely publicly available, we introduce an optional data augmentation process that enlarges datasets with synthetically labeled data, reducing the risk of overfitting and underfitting during training. To generate segments with topic embeddings and topic weighting vectors as inputs for our proposed model, we apply both latent Dirichlet allocation topic modeling and semantic text segmentation to post-process email documents. We analyze empirical results from multiple sets of experiments, including performance comparisons against various state-of-the-art algorithms, with and without data augmentation and under diverse parameter settings, to demonstrate the effectiveness of the proposed framework.
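
To make the idea of per-segment topic weighting vectors concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a naive blank-line paragraph split in place of semantic text segmentation, and a small hand-written topic-word table (`TOPIC_WORDS`) in place of a trained latent Dirichlet allocation model. All names, topics, and probabilities in it are hypothetical.

```python
import re
from collections import Counter

# Hypothetical topic-word probabilities, standing in for a trained LDA model.
TOPIC_WORDS = {
    "billing": {"invoice": 0.4, "payment": 0.4, "refund": 0.2},
    "meeting": {"schedule": 0.5, "agenda": 0.3, "room": 0.2},
}


def split_segments(email_text):
    """Naive segmentation (assumption): one segment per blank-line-separated paragraph."""
    return [p.strip() for p in email_text.split("\n\n") if p.strip()]


def topic_weights(segment, topic_words=TOPIC_WORDS, smoothing=0.01):
    """Return a topic weighting vector (values sum to 1) for one segment."""
    counts = Counter(re.findall(r"[a-z]+", segment.lower()))
    scores = {
        topic: smoothing + sum(counts[w] * p for w, p in word_probs.items())
        for topic, word_probs in topic_words.items()
    }
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


email = (
    "Please send the invoice and confirm payment.\n\n"
    "Can we schedule the agenda review?"
)
segments = split_segments(email)
weights = [topic_weights(s) for s in segments]
```

In a pipeline of the kind the abstract describes, each segment's weighting vector would accompany its topic embedding as input to the classifier; how the two are combined is not specified here and is left open in this sketch.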

Journal

Knowledge-Based Systems

Volume

197

ISBN/ISSN

1872-7409

Pages Count

11

Publisher

Elsevier BV

DOI

10.1016/j.knosys.2020.105918