Document Type
Thesis
Date of Award
2023
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
KC Santosh
Abstract
Chronic respiratory diseases, the third leading cause of death worldwide according to the 2017 World Health Organization (WHO) report, affect 544.9 million individuals. Compounding this public health challenge, over 80% of health systems face shortages in their radiology departments, which highlights an urgent need for accessible and efficient diagnostic solutions. Although various image classification models for analyzing thorax abnormalities have been developed, relying on a single modality (image data alone, for example) is insufficient for thorax abnormality analysis. Integrating text with image data could improve accuracy and support richer analysis. In response to this challenge, we propose a multimodal approach that generates detailed radiology reports from chest X-ray images and their corresponding radiological reports (Impression and Findings). Our framework integrates a pre-trained Convolutional Neural Network (CNN) for robust image feature extraction, a Recurrent Neural Network (RNN), and a visual attention mechanism to ensure coherent sentence generation. The image encoder employs the ResNet152 architecture to extract nuanced visual features from chest X-ray images, while the sentence generation model uses a Long Short-Term Memory (LSTM) layer to process textual data and generate contextually relevant reports. On an IU dataset of 7470 X-ray images and 3995 reports, our model outperformed other benchmark models on language generation metrics (BLEU-1 = 0.4424, BLEU-2 = 0.2923, BLEU-3 = 0.207, BLEU-4 = 0.1464, ROUGE = 0.3396, and CIDEr = 0.2268), producing accurate and coherent impressions and findings.
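To make the BLEU-1 through BLEU-4 figures in the abstract easier to interpret, the following is a minimal sketch of cumulative sentence-level BLEU against a single reference, using clipped n-gram precision, uniform weights, and a brevity penalty. It is an illustration only: the abstract does not state which evaluation toolkit, tokenization, or smoothing the thesis used, the reported scores are presumably corpus-level, and the function names (`ngrams`, `modified_precision`, `bleu`) and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def bleu(candidate, reference, max_n):
    """Cumulative BLEU-max_n: uniform weights, single reference, no smoothing (assumed setup)."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, any zero precision drives the geometric mean to zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c >= r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical generated and reference sentences, whitespace-tokenized.
generated = "the cat is on the mat".split()
reference = "the cat sat on the mat".split()
for n in (1, 2):
    print(f"BLEU-{n} = {bleu(generated, reference, n):.4f}")
```

In this toy pair, five of six unigrams and three of five bigrams match, so BLEU-1 is 5/6 and BLEU-2 is the geometric mean of 5/6 and 3/5. Scores drop as n grows because longer n-grams are harder to match, which is the same pattern seen in the reported BLEU-1 to BLEU-4 values.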
Subject Categories
Computer Sciences
Keywords
Multimodal learning, chest X-ray, thorax abnormality, pre-trained Convolutional Neural Network (CNN), robust image feature extraction, Recurrent Neural Network (RNN), visual attention mechanism, coherent sentence generation
Number of Pages
57
Publisher
University of South Dakota
Recommended Citation
Subedi, Gaurab, "MULTIMODAL LEARNING: GENERATING PRECISE CHEST X-RAY REPORT ON THORAX ABNORMALITY" (2023). Dissertations and Theses. 194.
https://red.library.usd.edu/diss-thesis/194