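CUI Yumeng (1998- ), male, master's student at the School of Information and Cyber Security, People's Public Security University of China. His main research interest is named entity recognition.
WANG Jingya (1966- ), female, professor at the School of Information and Cyber Security, People's Public Security University of China. Her main research interests are natural language processing and adversarial examples.
YAN Shangyi (1998- ), male, master's student at the School of Information and Cyber Security, People's Public Security University of China. His main research interests are natural language processing and text classification.
TAO Zhizhong (1997- ), male, master's student at the School of Information and Cyber Security, People's Public Security University of China. His main research interests are artificial intelligence and image style transfer.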
Published online: 2022-11
Print publication: 2022-11-15
Yumeng CUI, Jingya WANG, Shangyi YAN, et al. Automatic key information extraction of police records based on deep learning[J]. Big Data Research, 2022, 8(6): 127-142. DOI: 10.11959/j.issn.2096-0271.2022052.
With the rise of intelligent policing, the channels through which the public can report incidents have widened, the volume of unstructured police records has surged, and police entity recognition has become more difficult. To address this pain point, the BERT model is introduced to generate word vectors, a self-attention mechanism is integrated to capture long-distance dependencies between characters, and a BERT-BiGRU-SelfAtt-CRF police entity recognition model is constructed. To verify the performance and generalization ability of the model, experiments were carried out on public datasets. To verify its feasibility and efficiency in the police domain, experiments were carried out on a purpose-built, annotated police dataset. The results show that the proposed model outperforms the other models on the police dataset, with a precision of 82.45%, a recall of 79.03%, and an F1 score of 80.72%. The proposed model can therefore meet the needs of practical police work and is feasible and effective for police entity recognition.
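The role of the self-attention layer can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention with no learned projections (queries, keys, and values are all the input vectors themselves), which may differ from the attention variant used in the paper. The function names and the toy vectors are hypothetical and stand in for per-character BiGRU outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with Q = K = V = states (an assumption).

    states: list of n equal-length vectors, e.g. one hidden state per character.
    Returns (contexts, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in states:
        # Similarity of this position to every position in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted sum of all positions.
        contexts.append(
            [sum(w * value[d] for w, value in zip(row, states)) for d in range(dim)]
        )
    return contexts, weights


# Hypothetical toy sequence: positions 0 and 3 carry the same vector
# but are separated by two unrelated positions.
toy_states = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
contexts, weights = self_attention(toy_states)
```

In this toy input, position 0 attends to the distant position 3 as strongly as to itself, and more strongly than to its immediate neighbour. Attention weights depend on content similarity and not on distance, which is why such a layer can complement a recurrent encoder on long records.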