Automatic Detection of Metastatic Diseases from Radiology Reports Using Pre-trained Large Language Models
| dc.contributor.author | Ashofteh Barabadi, Maede | |
| dc.contributor.department | Electrical and Computer Engineering | |
| dc.contributor.supervisor | Chan, Wai Yip | |
| dc.contributor.supervisor | Zhu, Xiaodan | |
| dc.creator.stunr | 20370322 | |
| dc.date.accessioned | 2024-10-02T19:34:11Z | |
| dc.date.available | 2024-10-02T19:34:11Z | |
| dc.date.issued | 2024-10-02 | |
| dc.degree.grantor | Queen's University at Kingston | en |
| dc.description.abstract | Artificial Intelligence (AI) has been instrumental in automating processes across various domains, resulting in increased productivity, fewer human errors, and reduced labour costs. In healthcare specifically, AI-driven automation holds particular promise, helping with staff shortages and improving patient outcomes. To bring these benefits to practice, AI systems should be designed in an effective and reliable manner and tailored to the specific challenges of healthcare applications. This dissertation focuses on automating the identification of metastatic disease in cancer patients by applying advances in Natural Language Processing (NLP) and carefully identifying and addressing the practical challenges and limitations in clinical settings. Our proposed solution leverages a pre-trained Language Model (LM), fine-tuned on radiology reports annotated by human experts for the presence of metastatic disease in specific organs. However, the effectiveness of this approach is initially constrained by the limited availability of labelled data. We address the data scarcity challenge in three ways. First, a large unlabelled dataset is automatically annotated to expand the training corpus. Despite the inherent noise introduced by automated labelling, our experimental results demonstrate the substantial benefits of this expanded dataset. Second, Parameter-Efficient Fine-Tuning (PEFT) techniques are applied, which enhance the LM’s performance in low-data scenarios compared to the traditional fine-tuning approach while also being more computationally efficient. Finally, synthetic data generation is utilized for data augmentation, where an instruction-tuned Large Language Model (LLM) is prompted to generate high-quality clinical text similar to the existing samples without any task- or domain-specific training.
Additionally, we explore the crucial role of patient history in accurately detecting metastatic disease, as radiology reports often emphasize changes relative to previous findings rather than listing all observations explicitly. The model architecture is modified to incorporate historical radiology reports, enabling a more context-aware prediction process. Experimental findings underscore the importance of integrating historical information, demonstrating its positive impact on annotation accuracy. Overall, this research presents a cost-effective, high-performance solution for identifying metastatic sites in cancer patients through the analysis of their radiology reports, which enables large-scale, spatiotemporal analyses of cancer progression. Our methods have the potential to extend to other clinical tasks with similar settings. | |
| dc.description.degree | M.A.Sc. | |
| dc.embargo.liftdate | 2029-09-30 | |
| dc.embargo.terms | The last paper used in my thesis is still under preparation for journal submission, and I would like to restrict my thesis for two years to publish the paper through the journal venue before releasing my thesis. | |
| dc.identifier.uri | https://hdl.handle.net/1974/33526 | |
| dc.language.iso | eng | |
| dc.relation.ispartofseries | Canadian theses | en |
| dc.subject | Natural Language Processing | |
| dc.subject | Large Language Models | |
| dc.subject | Generative Data Augmentation | |
| dc.title | Automatic Detection of Metastatic Diseases from Radiology Reports Using Pre-trained Large Language Models | |
| dc.type | thesis | en |
