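1. School of Chinese Language and Culture, Nanjing Normal University, Nanjing, Jiangsu 210097
2. College of Information Management, Nanjing Agricultural University, Nanjing, Jiangsu 210095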
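ZHENG Tongzheheng (1998- ), female, master's student at the School of Chinese Language and Culture, Nanjing Normal University. Main research interests: computational linguistics and digital humanities.
LI Bin (1981- ), male, associate professor at the School of Chinese Language and Culture, Nanjing Normal University. Main research interests: computational linguistics and digital humanities.
FENG Minxuan (1978- ), female, associate professor at the School of Chinese Language and Culture, Nanjing Normal University. Main research interests: language information processing, corpus linguistics and digital humanities.
CHANG Bolin (1999- ), male, undergraduate student at the School of Chinese Language and Culture, Nanjing Normal University. Main research interests: digital humanities, computational linguistics and corpus linguistics.
WANG Dongbo (1981- ), male, professor and doctoral supervisor at the College of Information Management, Nanjing Agricultural University. Main research interests: intelligent information processing and natural language processing.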
Published online: 2022-11
Published in print: 2022-11-15
Tongzheheng ZHENG, Bin LI, Minxuan FENG, et al. Explore the structuration of historical books: the construction and quantitative analysis of digital humanities database of the Biographies of the Shiji[J]. Big Data Research, 2022, 8(6): 40-55. DOI: 10.11959/j.issn.2096-0271.2022067.
China's ancient classical texts are vast and hold a wealth of historical and humanistic knowledge. Digitization of ancient books based on electronic text and full-text retrieval has become an important basic resource and tool for language and literature, history, philosophy and other disciplines. With the development of artificial intelligence and big data technology, the research paradigm of digital humanities keeps evolving. Converting the text of traditional classics into a highly structured, new type of digital humanities database is a new exploration: organizing elements such as words, persons and geographical entities in the text into a coherent whole is of great significance for visualizing historical phenomena and quantifying historical patterns. Taking the Biographies (Liezhuan) of the Shiji as the object, automatic word segmentation and part-of-speech tagging of ancient Chinese, manual proofreading, and manual annotation of entity information were carried out to build a multi-level, high-quality digital humanities knowledge base. The knowledge base supports quantitative analysis and visual retrieval of words, persons, places and other elements, and was used to mine the distribution of persons and places, the relationships between persons, and the relationships between persons and places. The results show that 1,787 persons and 1,173 places appear in the Biographies; compared with the Basic Annals (Benji) and the Hereditary Houses (Shijia) of the Shiji, 1,092 persons and 556 places are unique to the Biographies. The study provides new ideas and a framework for constructing digital humanities knowledge bases of ancient books.
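The entity statistics reported in the abstract (persons and places that appear only in the Biographies, and person-place relationships) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the toy sentences are hand-made and not taken from the knowledge base, the tags `nr` (person) and `ns` (place) are an assumed tag set, and the function names are hypothetical. It is not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Assumed tags for person and place entities (hypothetical tag set).
PERSON, PLACE = "nr", "ns"

# Toy data: each section is a list of sentences, and each sentence is a
# list of (word, tag) pairs, as might come out of segmentation, POS
# tagging and entity annotation. Hand-made for illustration only.
LIEZHUAN = [
    [("廉颇", "nr"), ("伐", "v"), ("齐", "ns")],
    [("蔺相如", "nr"), ("入", "v"), ("秦", "ns")],
    [("蔺相如", "nr"), ("归", "v"), ("赵", "ns")],
]
BENJI = [[("项羽", "nr"), ("入", "v"), ("秦", "ns")]]
SHIJIA = [[("孔子", "nr"), ("适", "v"), ("齐", "ns")]]


def count_entities(sentences, tag):
    """Count how often each word carrying `tag` occurs in a section."""
    return Counter(w for s in sentences for w, t in s if t == tag)


def unique_entities(target, others, tag):
    """Entities with `tag` that occur in `target` but in none of `others`."""
    seen_elsewhere = set()
    for section in others:
        seen_elsewhere |= set(count_entities(section, tag))
    return set(count_entities(target, tag)) - seen_elsewhere


def person_place_pairs(sentences):
    """Count person-place pairs that co-occur within the same sentence."""
    pairs = Counter()
    for s in sentences:
        people = {w for w, t in s if t == PERSON}
        places = {w for w, t in s if t == PLACE}
        pairs.update(product(people, places))
    return pairs


if __name__ == "__main__":
    print(unique_entities(LIEZHUAN, [BENJI, SHIJIA], PERSON))
    print(unique_entities(LIEZHUAN, [BENJI, SHIJIA], PLACE))
    print(person_place_pairs(LIEZHUAN).most_common())
```

Sentence-level co-occurrence is only one possible proxy for a person-place relationship; a manually annotated knowledge base could record such relations explicitly instead.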