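1. East China Normal University Library, Shanghai 200062
2. School of Data Science and Engineering, East China Normal University, Shanghai 200062
Yaxiu YU (1985- ), female, associate research librarian at East China Normal University Library. Her main research interests are digital humanities, knowledge organization and management, and smart library development.
Xin LI (1961- ), female, research librarian at the School of Data Science and Engineering, East China Normal University. Her main research interests are Semantic Web knowledge organization and management, digital humanities, and recommender systems.
Online first: 2022-11
Print publication: 2022-11-15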
Yaxiu YU, Xin LI. Research on text annotation method of ancient works from the perspective of digital humanities: a case study on MARKUS[J]. Big Data Research, 2022, 8(6): 15-25. DOI: 10.11959/j.issn.2096-0271.2022046.
Text annotation is an important step in text analysis and mining. For large-scale collections of ancient works, manual annotation can no longer meet the needs of humanities research, and because ancient texts have distinctive grammatical structures and linguistic features, annotation techniques developed for modern corpora are difficult to apply to them directly. Based on an analysis of the difficulties and pain points that humanities researchers face when annotating ancient texts, this paper proposes a general standard workflow for annotating ancient works, presents a text annotation model based on MARKUS, and explores an annotation method built on that model through a concrete case. The aim is to promote the use of digital humanities tools to change how humanities research on ancient works is conducted and to broaden the scale and depth of that research.