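Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China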
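Zhicheng Dou, male, is a researcher and master's supervisor at the School of Information, Renmin University of China. He is a corresponding member of the Big Data Expert Committee of the China Computer Federation, a corresponding member of the Information Retrieval Technical Committee of the Chinese Information Processing Society of China, a member of the Youth Working Committee of the Chinese Information Processing Society of China, a member of the Steering Committee of the Asia Information Retrieval Societies, a member of ACM and IEEE, and a member of the China Computer Federation. His main research interests include information retrieval, Web search, data mining, and big data. In recent years he has published more than 20 papers in well-known international conferences and journals such as SIGIR, WWW, CIKM, WSDM, EMNLP, and IEEE TKDE.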
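Jirong Wen, male, Ph.D., is a professor and doctoral supervisor at the School of Information, Renmin University of China, and a distinguished expert of the national "Thousand Talents Program". From 1999 to 2013 he worked at Microsoft Research Asia, where from 2008 he served as senior researcher and head of the Web Search and Data Mining group. During his 14 years at Microsoft Research Asia he was granted more than 50 U.S. patents, and some of the results have been used in major Microsoft products such as the Bing search engine. The research team he led developed influential Web applications including Microsoft Academic Search (http://academic.research.microsoft.com), Renlifang (http://renlifang.msra.cn/), and product search. He has published more than 100 papers in leading international conferences and journals and has served as program committee member and chair for many international conferences and workshops. He is currently an associate editor of ACM Transactions on Information Systems (TOIS), a major journal in the field of information retrieval.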
Published online: 2015-06
Published in print: 2015-06-20
Zhicheng Dou, Jirong Wen. Web Analytical Engine in the Big Data Era[J]. Big Data Research, 2015, 1(3): 29-40. DOI: 10.11959/j.issn.2096-0271.2015027.
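With the rapid development of the Internet, and of the mobile Internet in particular, the number of Web documents and the richness and complexity of their content have grown greatly. The Internet is entering the big data era, and users' information needs are becoming more complex. Beyond basic information retrieval, there is growing demand for in-depth understanding and aggregate analysis of large numbers of related documents, a demand that traditional Web search engines can no longer meet. To address this problem, this paper proposes the concept of a "Web analytical engine", explains how it differs from search engines and OLAP systems, presents an architecture for such an engine, and discusses in detail the core problems in implementing it.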
Web search engines can only return a list of Web documents (the so-called ten blue links), whereas users may need high-order knowledge that is contained within the Web documents. The demand for analytical services atop the Web is becoming stronger with the rapid development of the Internet and the growth of big Web data. The concept of a "Web Analytical Engine", which aims to provide analytical services atop the huge number of Web documents, is introduced. A simple infrastructure is described and the key research problems for building such an engine are discussed.