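1. Information Center, Logistic Support Department of the Central Military Commission of the Communist Party of China, Beijing 100190, China
2. Changsha Military-Civilian Advanced Technology Research Co., Ltd., Changsha, Hunan 410205, China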
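Cong LIU (1985- ), male, PhD, engineer at the Information Center, Logistic Support Department of the Central Military Commission of the Communist Party of China. His main research interests are medical and health big data and medical and health informatization.
Xuefeng LYU (1979- ), male, senior engineer at the Information Center, Logistic Support Department of the Central Military Commission of the Communist Party of China. His main research interests are medical and health big data and medical and health informatization.
Honglin WANG (1988- ), male, engineer at the Information Center, Logistic Support Department of the Central Military Commission of the Communist Party of China. His main research interest is logistics informatization.
Xiaowei WANG (1980- ), male, PhD, senior engineer at Changsha Military-Civilian Advanced Technology Research Co., Ltd. His main research interests are natural language processing and big data.
Jin LU (1993- ), male, engineer at Changsha Military-Civilian Advanced Technology Research Co., Ltd. His main research interests are natural language processing and artificial intelligence.
Shun SUN (1980- ), male, engineer at the Information Center, Logistic Support Department of the Central Military Commission of the Communist Party of China. His main research interest is health informatization.
Songqi HU (1988- ), male, engineer at the Information Center, Logistic Support Department of the Central Military Commission of the Communist Party of China. His main research interest is health informatization.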
Published online: 2023-07
Published in print: 2023-07-15
Cong LIU, Xuefeng LYU, Honglin WANG, et al. Medical named entity recognition algorithm based on probability distribution difference[J]. Big data research, 2023, 9(4): 159-171. DOI: 10.11959/j.issn.2096-0271.2023008.
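Medical named entity recognition extracts medical entities that refer to specific concepts from medical text, and it is a foundational task in medical information extraction. Current mainstream medical named entity recognition algorithms are generally based on deep learning and need large numbers of high-quality annotated samples for model training. However, annotating samples in the medical domain is expensive, which severely limits improvements in model performance. An important way to reduce a model's demand for annotated samples is to apply active learning: design a sound sampling strategy that automatically selects high-value samples for annotation first, so that the model converges earlier. Existing algorithms generally design their sampling strategies around features such as sample length and the probability of the recognized result, and they overlook a deeper feature, the class distribution of the samples, which leads to low recall in named entity recognition. An active learning algorithm based on probability distribution difference was proposed. It evaluates the annotation value of a sample by computing the difference between the probability distributions of samples, and it dynamically optimizes the model whenever the set of annotated samples is updated. Experiments on real medical examination texts show that, compared with existing algorithms, the proposed algorithm needs over 10% less annotated data to reach the same model performance, and that with the same number of annotated samples its F1 score is more than 5% higher.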
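The abstract does not give the exact scoring formula, so the following is only a minimal sketch of how a sampling strategy driven by probability distribution difference might look. It assumes that each unlabeled sample comes with per-token label probabilities from the current model, that a sample is summarized by averaging those probabilities, and that the difference is measured with the Jensen-Shannon divergence against the label distribution of the already annotated pool. The function names, the averaging step, and the choice of divergence are all illustrative assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def mean_distribution(token_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token label probabilities into one sample-level distribution.

    Assumption: a sample is summarized by the mean of its token distributions.
    """
    n_labels = len(token_probs[0])
    totals = [0.0] * n_labels
    for probs in token_probs:
        for i, p in enumerate(probs):
            totals[i] += p
    return [t / len(token_probs) for t in totals]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_for_annotation(
    unlabeled: Dict[str, Sequence[Sequence[float]]],
    labeled_dist: Sequence[float],
    k: int,
) -> List[str]:
    """Return the ids of the k samples whose predicted label distribution
    differs most from the label distribution of the annotated pool."""
    scores = {
        sample_id: js_divergence(mean_distribution(token_probs), labeled_dist)
        for sample_id, token_probs in unlabeled.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical two-label example: sample "b" looks unlike the annotated pool.
pool = {
    "a": [[0.9, 0.1], [0.8, 0.2]],
    "b": [[0.1, 0.9], [0.2, 0.8]],
}
chosen = select_for_annotation(pool, labeled_dist=[0.85, 0.15], k=1)
```

In an active learning loop, the selected samples would be sent for annotation, added to the labeled pool, and the model retrained before the next round of scoring, which corresponds to the dynamic model update mentioned in the abstract.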