Automatic Detection of Metastatic Diseases from Radiology Reports Using Pre-trained Large Language Models

dc.contributor.author: Ashofteh Barabadi, Maede
dc.contributor.department: Electrical and Computer Engineering
dc.contributor.supervisor: Chan, Wai Yip
dc.contributor.supervisor: Zhu, Xiaodan
dc.creator.stunr: 20370322
dc.date.accessioned: 2024-10-02T19:34:11Z
dc.date.available: 2024-10-02T19:34:11Z
dc.date.issued: 2024-10-02
dc.degree.grantor: Queen's University at Kingston (en)
dc.description.abstract: Artificial Intelligence (AI) has been instrumental in automating processes across various domains, resulting in increased productivity, fewer human errors, and reduced labour costs. In healthcare, AI-driven automation holds particular promise, helping to ease staff shortages and improve patient outcomes. To bring these benefits into practice, AI systems should be designed to be effective and reliable, and tailored to the specific challenges of healthcare applications. This dissertation focuses on automating the identification of metastatic disease in cancer patients by applying advances in Natural Language Processing (NLP) and by carefully identifying and addressing the practical challenges and limitations of clinical settings. Our proposed solution leverages a pre-trained Language Model (LM), fine-tuned on radiology reports annotated by human experts for the presence of metastatic disease in specific organs. However, the effectiveness of this approach is initially constrained by the limited availability of labelled data. We address this data scarcity in three ways. First, a large unlabelled dataset is automatically annotated to expand the training corpus. Despite the inherent noise introduced by automated labelling, our experimental results demonstrate the substantial benefits of this expanded dataset. Second, Parameter-Efficient Fine-Tuning (PEFT) techniques are applied, which improve the LM’s performance in low-data scenarios compared to traditional fine-tuning while also being more computationally efficient. Finally, synthetic data generation is used for data augmentation: an instruction-tuned Large Language Model (LLM) is prompted to generate high-quality clinical text similar to the existing samples, without any task- or domain-specific training.
Additionally, we explore the crucial role of patient history in accurately detecting metastatic disease, as radiology reports often emphasize changes relative to previous findings rather than listing all observations explicitly. The model architecture is modified to incorporate historical radiology reports, enabling a more context-aware prediction process. Experimental findings underscore the importance of integrating historical information, demonstrating its positive impact on annotation accuracy. Overall, this research presents a cost-effective, high-performance solution for identifying metastatic sites in cancer patients through the analysis of their radiology reports, which enables large-scale, spatiotemporal analyses of cancer progression. Our methods have the potential to extend to other clinical tasks with similar settings.
dc.description.degree: M.A.Sc.
dc.embargo.liftdate: 2029-09-30
dc.embargo.terms: The last paper used in my thesis is still under preparation for journal submission, and I would like to restrict my thesis for two years to publish the paper through the journal venue before releasing my thesis.
dc.identifier.uri: https://hdl.handle.net/1974/33526
dc.language.iso: eng
dc.relation.ispartofseries: Canadian theses (en)
dc.subject: Natural Language Processing
dc.subject: Large Language Models
dc.subject: Generative Data Augmentation
dc.title: Automatic Detection of Metastatic Diseases from Radiology Reports Using Pre-trained Large Language Models
dc.type: thesis (en)

Files

Original bundle

Name: AshoftehBarabadi_Maede_MASC_202409.pdf
Size: 2.34 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.67 KB
Format: Item-specific license agreed upon to submission