Author ORCID Identifier

https://orcid.org/0009-0008-7001-4658

Document Type

Thesis

Date of Award

2023

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

KC Santosh

Abstract

Emotion analysis, a subset of sentiment analysis, involves the study of a wide array of emotional indicators. In contrast to sentiment analysis, which restricts its focus to positive and negative sentiments, emotion analysis extends to a diverse spectrum of emotional cues. Contemporary trends in emotion analysis lean toward multimodal approaches that leverage audiovisual and text modalities. However, multimodal strategies introduce their own challenges: model complexity and parameter counts grow, which in turn creates a need for a larger volume of data. This thesis responds to this challenge by proposing a robust model for emotion recognition that uses only audio and text data. Our approach uses audio spectrogram transformers (AST) and the BERT language model to extract distinctive features from the auditory and textual modalities, followed by feature fusion. Despite omitting the visual component employed by state-of-the-art (SOTA) methods, our model demonstrates comparable performance, achieving an F1 score of 0.67 on the IEMOCAP dataset [1]. IEMOCAP consists of 12 hours of audio recordings broken down into 5255 scripted and 4784 spontaneous turns, with each turn labeled with an emotion such as anger, neutral, frustration, happy, or sad. In essence, we propose a fully attention-focused multimodal approach for effective emotion analysis on relatively small datasets, leveraging lightweight data sources like audio and text. For reproducibility, the code is available at 2AI Lab’s GitHub repository: https://github.com/2ai-lab/multimodal-emotion.
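The feature-fusion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not name the fusion operator, so simple concatenation is assumed here, and the random vectors, their sizes, and the helper names `fuse`, `linear`, `softmax`, and `classify` are all hypothetical stand-ins for the AST and BERT encoder outputs and the classification head.

```python
import math
import random

# Emotion labels named in the abstract.
EMOTIONS = ["anger", "neutral", "frustration", "happy", "sad"]


def fuse(audio_vec, text_vec):
    """Fuse two feature vectors by concatenation.

    Assumption: the abstract only says "feature fusion"; concatenation
    is one common choice and is used here purely for illustration.
    """
    return list(audio_vec) + list(text_vec)


def linear(x, weights, bias):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(audio_vec, text_vec, weights, bias):
    """Return the predicted emotion and the class probabilities."""
    probs = softmax(linear(fuse(audio_vec, text_vec), weights, bias))
    return EMOTIONS[probs.index(max(probs))], probs


# Placeholder embeddings standing in for AST (audio) and BERT (text)
# outputs. Sizes and values are hypothetical, not taken from the thesis.
rng = random.Random(0)
audio_vec = [rng.gauss(0, 1) for _ in range(8)]
text_vec = [rng.gauss(0, 1) for _ in range(8)]

# Untrained, randomly initialised classification head over the fused vector.
weights = [[rng.gauss(0, 0.1) for _ in range(16)] for _ in EMOTIONS]
bias = [0.0] * len(EMOTIONS)

label, probs = classify(audio_vec, text_vec, weights, bias)
print(label, [round(p, 3) for p in probs])
```

Because the head is untrained, the printed label is arbitrary; the sketch only shows how per-modality features could be combined into a single vector before classification. The linked repository is the authoritative source for the actual architecture.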

Subject Categories

Computer Sciences

Keywords

Emotion analysis, Multimodal learning, Transformer

Number of Pages

Publisher

University of South Dakota

Recommended Citation

Bajracharya, Siddhi Kiran, "MULTIMODAL EMOTION ANALYSIS WITH FOCUSED ATTENTION" (2023). Dissertations and Theses. 190.
https://red.library.usd.edu/diss-thesis/190

Included in

Computer Sciences Commons
