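1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
2. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
3. China Aero-Polytechnology Establishment, Beijing 100028, China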
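Dezhi LIU (1996- ), male, is a Ph.D. candidate at the School of Computer Science and Engineering, Beihang University. His main research interests are knowledge disambiguation and information extraction.
Liu HE (1988- ), male, is a senior engineer at the China Aero-Polytechnology Establishment. His main research interests are artificial intelligence, computer vision, and multimodal machine learning.
Youfeng LIU (1996- ), male, is a master's student at the School of Computer Science and Engineering, Beihang University. His main research interests are multimodal fusion and knowledge graphs.
Dechun HAN (1980- ), male, is chief architect at the Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University. His main research interests are big data applications and network security systems.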
Published online: 2024-03
Published in print: 2024-03-15
Dezhi LIU, Liu HE, Youfeng LIU, et al. A text classification method based on multimodal fusion enhancement[J]. Big Data Research, 2024, 10(2): 80-93. DOI: 10.11959/j.issn.2096-0271.2023067.
Although multimodal text classification shows promise in specific application scenarios, it still has limitations. Existing multimodal fusion models require modality-aligned input, so large amounts of incomplete multimodal data are discarded outright, which limits the scale and flexibility of the data available at inference time. To address this problem, we propose a text classification model based on multimodal fusion enhancement, together with a training method for insufficient multimodal resources. Compared with traditional methods, the proposed model improves performance by about 4.25% on average on a standard dataset. Furthermore, when the missing rate of the non-text modalities is 50%, the insufficient multimodal resource training method outperforms the traditional multi-route strategy by about 4%. These results indicate that the proposed model and training method are effective.
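To make the data-waste problem in the abstract concrete, here is a minimal Python sketch. It is not the authors' model or training method. It assumes a hypothetical `Sample` record whose image feature vector may be missing, and it compares strict alignment filtering with a simple zero-fill-plus-presence-flag fallback. The names `Sample`, `keep_aligned_only`, and `keep_all_with_mask` are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    text: str
    image: Optional[List[float]]  # None means the image modality is missing
    label: int


def keep_aligned_only(samples: List[Sample]) -> List[Sample]:
    """Strict alignment: drop every sample that lacks the image modality."""
    return [s for s in samples if s.image is not None]


def keep_all_with_mask(
    samples: List[Sample], image_dim: int
) -> List[Tuple[Sample, List[float], int]]:
    """Keep every sample (assumed fallback, not the paper's method).

    A missing image is replaced by a zero vector plus a presence flag of 0,
    so a downstream model could learn to ignore the placeholder.
    """
    out = []
    for s in samples:
        if s.image is None:
            out.append((s, [0.0] * image_dim, 0))
        else:
            out.append((s, s.image, 1))
    return out


# Toy data: 4 samples, half of them without an image (50% missing rate).
samples = [
    Sample("news a", [0.1, 0.2], 1),
    Sample("news b", None, 0),
    Sample("news c", [0.3, 0.4], 0),
    Sample("news d", None, 1),
]

aligned = keep_aligned_only(samples)
masked = keep_all_with_mask(samples, image_dim=2)
print(len(aligned), len(masked))
```

In this toy setting, strict alignment keeps only half of the samples, while the fallback keeps all of them and marks which ones carry a real image vector.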