Author ORCID Identifier

https://orcid.org/0009-0004-2967-0355

Document Type

Thesis

Date of Award

2023

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

KC Santosh

Abstract

Chronic respiratory diseases, the third leading cause of death worldwide according to the 2017 World Health Organization (WHO) report, affect 544.9 million people. Compounding this public health challenge, over 80% of health systems face shortages in their radiology departments, which highlights an urgent need for accessible and efficient diagnostic solutions. Various image classification models have been developed for analyzing thorax abnormalities, but relying on a single modality (image data alone, for example) is insufficient. Integrating text with image data could improve accuracy and enrich the analysis. In response, we propose a multimodal approach that learns from chest X-ray images and their corresponding radiological reports (Impression and Findings) to generate detailed radiology reports. Our framework combines a pre-trained Convolutional Neural Network (CNN) for image feature extraction with a Recurrent Neural Network (RNN) and a visual attention mechanism for coherent sentence generation. The image encoder uses the ResNet152 architecture to extract visual features from chest X-ray images. The sentence generation model uses a Long Short-Term Memory (LSTM) layer to process textual data and generate contextually relevant reports. On the IU dataset of 7470 X-ray images and 3995 reports, our model outperformed other benchmark models on language generation metrics (BLEU1 = 0.4424, BLEU2 = 0.2923, BLEU3 = 0.207, BLEU4 = 0.1464, ROUGE = 0.3396, and CIDEr = 0.2268), producing accurate and coherent impressions and findings.
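
For readers unfamiliar with the BLEU scores quoted in the abstract, the following is a minimal sketch of how a cumulative BLEU-n score can be computed for one generated sentence against one reference. It assumes whitespace tokenization, uniform n-gram weights, a single reference, and no smoothing. The abstract does not state which BLEU variant or tooling the thesis used, so this is an illustration of the metric rather than the author's evaluation code. The function names and the two example sentences are hypothetical and are not drawn from the IU dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a single reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n):
    """Cumulative sentence-level BLEU up to max_n (uniform weights, no smoothing)."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates that are not longer than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example sentences (assumption: not taken from the IU dataset).
reference = "the lungs are clear without focal consolidation".split()
candidate = "the lungs are clear without acute disease".split()

scores = {n: bleu(candidate, reference, n) for n in range(1, 5)}
for n, score in scores.items():
    print(f"BLEU{n} = {score:.4f}")
```

Because higher-order n-grams are harder to match, the cumulative score typically falls from BLEU1 to BLEU4, which is the same pattern seen in the values reported above.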

Subject Categories

Computer Sciences

Keywords

Multimodal learning, chest X-ray, thorax abnormality, pre-trained Convolutional Neural Network (CNN), robust image feature extraction, Recurrent Neural Network (RNN), visual attention mechanism, coherent sentence generation

Number of Pages

Publisher

University of South Dakota

Recommended Citation

Subedi, Gaurab, "MULTIMODAL LEARNING: GENERATING PRECISE CHEST X-RAY REPORT ON THORAX ABNORMALITY" (2023). Dissertations and Theses. 194.
https://red.library.usd.edu/diss-thesis/194

Included in

Computer Sciences Commons
