Abstract: Multimodal fusion models are widely applied in practical tasks such as image classification and sentiment analysis. However, real-world data often contain noisy or imbalanced modality inputs ...