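1. School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
2. Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi'an 710049, Shaanxi, China
3. School of Continuing Education, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
4. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China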
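Haishan GUAN (1996- ), male, master's student at the School of Software Engineering, Xi'an Jiaotong University. His main research interests are natural language processing and text question generation.
Yulong ZHENG (1996- ), male, master's student at the School of Software Engineering, Xi'an Jiaotong University. His main research interests are natural language processing and text question generation.
Bifan WEI (1977- ), male, Ph.D., research fellow at the School of Continuing Education, Xi'an Jiaotong University. His main research interests are Web information extraction and the construction and application of educational knowledge graphs.
Zemin ZHANG (1999- ), male, master's student at the School of Software Engineering, Xi'an Jiaotong University. His main research interests include natural language processing, deep learning, and question generation.
Hao YUE (1999- ), male, master's student at the School of Software Engineering, Xi'an Jiaotong University. His main research interests include big data and deep learning.
Bin SHI (1992- ), male, Ph.D., lecturer at the School of Computer Science and Technology, Xi'an Jiaotong University. His main research interests are financial data mining, cloud computing, and virtualization technology.
Bo DONG (1983- ), male, Ph.D., senior engineer at the School of Computer Science and Technology, Xi'an Jiaotong University. His main research interests are financial data mining and intelligent education.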
Published online: 2022-09
Print publication: 2022-09-15
Haishan GUAN, Yulong ZHENG, Bifan WEI, et al. Extraction and visualization analysis of key elements of tax preferential policies[J]. Big Data Research, 2022, 8(5): 106-123. DOI: 10.11959/j.issn.2096-0271.2022035.
With the rapid growth in the number of tax preferential policies, taxpayers facing this mass of policies find it difficult to quickly locate the preferential content relevant to them, and as a result many do not receive the benefits they are entitled to. By combining the pre-trained language model BERT with rule-based processing, this work implements the representation of tax preferential policies and regulations, the extraction of their key elements, and a visual query of tax preferences, so that taxpayers can quickly and accurately locate the tax preference information relevant to them and view the results in visual form. Experimental results show that key element extraction achieves strong performance and that querying tax preferential policies is fast and intuitive, which can effectively alleviate the overload of massive tax preference information.