Deep Learning Models for Topic Classification and Disentanglement

Abstract

Modeling the topics discussed in texts is a fundamental problem in natural language processing, encompassing the specific tasks of topic classification, clustering, and disentanglement. The ability of machines to understand and model topics is a key component of many natural language understanding and generation applications. Despite the significant progress brought by deep learning, modeling texts that contain multiple topics or subtopics remains challenging. In this thesis, we explore and develop better deep learning methods for modeling, classifying, and disentangling the topics discussed in documents. We carry out our studies in two typical setups. In the first setup, topic classification, the target topics and subtopics are organized in a hierarchy and are closely related to each other. We explore how to use the hierarchical structure of topics to strengthen the interconnections among them, and we design algorithms that enhance the inference stage. In the second setup, multiple topics are discussed and interwoven along a timeline, and we aim to model and disentangle them. Our main contributions are threefold. First, we develop a reasoning framework based on reinforcement learning that incorporates the hierarchical structure of topics for multi-label classification. The framework uses the semantic relations between topics to enhance the performance of pretrained language models. Second, we design an end-to-end algorithm that discovers the different topics interwoven in temporal order in human conversations. We also release a dataset to support future research on this problem. Third, we design an end-to-end algorithm that separates human dialogue messages related to different topics in an unsupervised manner, thereby alleviating the shortage of large annotated datasets for training deep learning models.

Keywords

Natural language processing, Topic modeling

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 4.0 International