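1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
2. Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China
3. Microsoft Research Asia, Beijing 100080, China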
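ZOU Yanzhen (1976- ), female, Ph.D., is an associate professor at the School of Electronics Engineering and Computer Science, Peking University. Her main research interests include software engineering, software reuse, knowledge graphs, and intelligent software development.
WANG Min (1994- ), male, is a Ph.D. candidate at the School of Electronics Engineering and Computer Science, Peking University. His main research interests include software engineering, software reuse, code review, and intelligent software development.
XIE Bing (1970- ), male, Ph.D., is a professor at Peking University, executive vice dean of the School of Electronics Engineering and Computer Science, and director of the Institute of Software. He is a recipient of the National Science Fund for Distinguished Young Scholars, a council member of the China Software Industry Association, a senior member of the China Computer Federation, and an editorial board member of the Chinese Journal of Electronics. He was selected for the Ministry of Education's New Century Excellent Talents Support Program and the Beijing Nova Program, and received the "Zhongchuang Software Talent Award" (中创软件人才奖). His main research interests include software engineering, theoretical computer science, and distributed systems.
LIN Zeqi (1992- ), male, Ph.D., is a researcher at Microsoft Research Asia. His main research interests include machine learning, intelligent data analysis, and intelligent development environments.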
Published online: 2021-01
Published in print: 2021-01-15
Yanzhen ZOU, Min WANG, Bing XIE, et al. Software knowledge graph construction and Q&A technology based on big data[J]. Big Data Research, 2021, 7(1): 2021002-1. DOI: 10.11959/j.issn.2096-0271.2021002.
As software systems grow in scale and their evolution cycles lengthen, constructing software project knowledge graphs is becoming increasingly important for software maintenance and development. A key open problem in software engineering is how to quickly and automatically build a software knowledge graph with rich semantic relations from the multi-source, heterogeneous big data produced during software project development, such as source code, mailing lists, issue reports, and Q&A documents. This paper proposes a software knowledge graph model centered on code structure and a two-layer "knowledge extraction, knowledge fusion" plugin framework for constructing it. The framework supports automatic construction of software project knowledge graphs and knowledge-graph-based intelligent question answering over software projects, which improves the efficiency of software project comprehension and software reuse. Software project knowledge graphs have so far been applied in practice in the Apache open source community and at well-known software companies in China.