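1. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876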
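Rongxin MI (1984- ), female, assistant researcher at the National Computer Network Emergency Response Technical Team/Coordination Center of China. Her main research interests are artificial intelligence, network security, and information content security.
Wenwen YAO (1984- ), female, assistant researcher at the National Computer Network Emergency Response Technical Team/Coordination Center of China. Her main research interests are network security strategy and trends in information technology development.
Hongkun RUAN (2001- ), male, master's student at the School of Computer Science, Beijing University of Posts and Telecommunications. His main research interests are data preprocessing and artificial intelligence.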
Published online: 2024-07
Published in print: 2024-07-15
Rongxin MI, Wenwen YAO, Hongkun RUAN. Construction of a paper-assistant reading system based on machine reading comprehension[J]. Big Data Research, 2024, 10(4): 121-129. DOI: 10.11959/j.issn.2096-0271.2024039.
In the era of informatization and digitization, the rapid growth in the number of scientific papers has created a series of problems, such as lengthy papers, difficulty in extracting information, and persistently high reading-time costs, so researchers face increasingly tedious and time-consuming literature reading. To address these challenges, an assisted reading system for scientific papers was designed by putting language models into practical use. With machine reading comprehension as its core technology, the system parses the text of a paper and applies a set of preset questions, which it then answers automatically. The system makes full use of the pre-trained language model PERT to strengthen its semantic understanding and information extraction, addressing the problems that arise when reading scientific papers and helping readers read scientific literature more efficiently.
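The abstract does not say how the system turns model outputs into answers. As a rough illustration only, extractive machine reading comprehension models commonly score every token as a possible answer start and answer end, then pick the best-scoring valid span. The sketch below assumes that setup. The function names `best_span` and `extract_answer`, the `max_len` parameter, and the toy scores are all hypothetical and are not taken from the paper or from PERT.

```python
from typing import List, Tuple


def best_span(start_scores: List[float],
              end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices that maximise
    start_scores[start] + end_scores[end], subject to
    start <= end and a span of at most max_len tokens.

    Hypothetical helper: the scores stand in for the per-token
    start/end logits an extractive reader would produce.
    """
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and equal in length")
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    best = (0, 0)
    best_score = float("-inf")
    for start, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within max_len tokens.
        last_end = min(start + max_len, len(end_scores))
        for end in range(start, last_end):
            score = s_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


def extract_answer(tokens: List[str],
                   start_scores: List[float],
                   end_scores: List[float],
                   max_len: int = 30) -> str:
    """Join the tokens of the best-scoring span into an answer string."""
    start, end = best_span(start_scores, end_scores, max_len)
    return " ".join(tokens[start:end + 1])


if __name__ == "__main__":
    # Toy passage and made-up scores for a preset question such as
    # "Which model does the system use?"
    tokens = ["The", "system", "uses", "the", "PERT", "model", "."]
    start_scores = [0.1, 0.2, 0.1, 1.5, 2.0, 0.3, 0.0]
    end_scores = [0.0, 0.1, 0.2, 0.1, 1.0, 2.5, 0.1]
    print(extract_answer(tokens, start_scores, end_scores))
```

In a real system the two score lists would come from a fine-tuned reader applied to each preset question and paper passage. The exhaustive search here is quadratic in passage length in the worst case, which is acceptable for short passages with a small `max_len`.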