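1. School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, Hunan, China
2. School of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
3. College of Chemistry, Xiangtan University, Xiangtan 411105, Hunan, China
4. Hunan Zeta Technology Co., Ltd. (湖南泽塔科技有限公司), Changsha 410012, Hunan, China
5. College of Engineering and Technology, Northeast Forestry University, Harbin 150040, Heilongjiang, China
6. Beijing Ruidi Hongxin Science and Trade Co., Ltd. (北京瑞迪弘欣科贸有限公司), Beijing 100071, China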
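Xiaoxia XIAO (肖晓霞, 1981- ), female, PhD, is an associate professor at the School of Information Science and Engineering, Hunan University of Chinese Medicine, and deputy secretary-general of the Information Education Branch of the China Medical Informatics Association (中国医药信息学会). Her main research interests are intelligent assisted diagnosis in traditional Chinese medicine (TCM), intelligent data analysis, and embedded systems.
Mingting LIU (刘明婷, 1999- ), female, is a master's student at the School of Information Science and Engineering, Hunan University. She won second prize in the 2nd National Artificial Intelligence Innovation and Entrepreneurship Competition for TCM Colleges and Universities. Her main research interests are artificial intelligence and bioinformatics.
Fengtianci YANG (杨冯天赐, 1999- ), male, is a master's student at the College of Chemistry, Xiangtan University. He won a silver award in the 3rd National Programming Contest for TCM University Students, third prizes in the 15th and 16th Hunan Collegiate Computer Programming Contests, and the Hunan provincial second prize and national third prize in the 4th Group Programming Ladder Tournament (团体程序设计天梯赛). His main research interest is machine learning.
刘鉴建县 (1998- ), male, is a Python development engineer at Hunan Zeta Technology Co., Ltd. His main research interests are artificial intelligence and machine learning.
Yang YANG (杨阳, 2000- ), female, is a master's student at the College of Engineering and Technology, Northeast Forestry University. Her main research interests are artificial intelligence and machine learning.
Yue SHI (石月, 1998- ), female, is an assistant to the business manager at Beijing Ruidi Hongxin Science and Trade Co., Ltd.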
Published online: 2022-05
Published in print: 2022-05-15
Xiaoxia XIAO, Mingting LIU, Fengtianci YANG, et al. A fast text structuring methodology of TCM medical records based on NLP[J]. Big Data Research, 2022, 8(3): 128-139. DOI: 10.11959/j.issn.2096-0271.2022025.
Traditional Chinese medicine (TCM) medical records are important documents from which TCM practitioners learn clinical experience. Structuring these records makes it possible to summarize clinical experience with machine learning and other methods, which can accelerate the inheritance of TCM. To structure TCM medical records quickly, a fast text structuring method based on natural language processing (NLP) is proposed. The book Essence of Chinese Modern Famous Chinese Medical Records (《中国现代名中医医案精粹》) was selected as the structuring target. Optical character recognition (OCR) was used to recognize the text in screenshots of the medical records, and the text was given an initial structure at the same time. A simple symptom dictionary was constructed, and an improved N-gram model combined with the dictionary was used to extract symptoms, signs, and other terms from the record text; the dictionary was updated during the structuring process. In total, 4,754 text medical records were structured. The final model was tested on 666 medical records randomly selected from the corpus, and its F1 value reached 82.99%.
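The abstract does not spell out the improved N-gram model, so the following is only a minimal sketch of the general idea, under stated assumptions: plain forward maximum matching stands in for the dictionary lookup, frequent character bigrams that are not yet in the dictionary stand in for the dictionary-update step, and F1 is computed over sets of extracted terms. The function names (`extract_terms`, `candidate_ngrams`, `f1_score`), the thresholds, and the sample sentences are all hypothetical and are not taken from the paper or its corpus.

```python
import re
from collections import Counter


def extract_terms(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest dictionary entry."""
    found, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in dictionary:
                found.append(text[i:i + n])
                i += n
                break
        else:
            i += 1  # no entry starts here; advance one character
    return found


def candidate_ngrams(texts, dictionary, n=2, min_count=2):
    """Return character n-grams outside the dictionary seen at least min_count times."""
    counts = Counter()
    for text in texts:
        # Split on common punctuation so n-grams never span clause boundaries.
        for clause in re.split(r"[,。;、,.;]", text):
            for i in range(len(clause) - n + 1):
                gram = clause[i:i + n]
                if gram not in dictionary:
                    counts[gram] += 1
    return sorted(g for g, c in counts.items() if c >= min_count)


def f1_score(predicted, gold):
    """F1 over sets of terms (harmonic mean of precision and recall)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only (assumed, not from the corpus).
dictionary = {"头痛", "发热"}
records = ["患者头痛,咳嗽", "咳嗽,发热"]

new_terms = candidate_ngrams(records, dictionary)  # frequent unknown bigrams
dictionary.update(new_terms)  # in practice these would be reviewed before being added
terms = extract_terms("患者头痛,发热,咳嗽", dictionary)
score = f1_score(terms, ["头痛", "发热", "咳嗽"])
```

On this toy input, the bigram 咳嗽 occurs twice outside the dictionary, so it is promoted into the dictionary and is then matched in the next extraction pass. A real pipeline would need a stricter filter than raw frequency, since frequent non-symptom bigrams would otherwise pollute the dictionary.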