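Weimin Zheng (郑纬民), male, is a professor and doctoral supervisor at Tsinghua University and president of the China Computer Federation. His current research focuses on parallel and distributed computing and storage systems. He has led or participated in multiple projects under the National “973” Program, the “863” Program, and the National Natural Science Foundation of China. In recent years he has published more than 40 papers in top journals and international conferences in the field, including IEEE TC, IEEE TPDS, ACM TOS, and FAST.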
Published online: 2015-05
Published in print: 2015-05-20
Weiming Zheng. Reviewing Big Data Computation from a System Perspective[J]. BIG DATA RESEARCH, 2015, 1(1): 10-19. DOI: 10.11959/j.issn.2096-0271.2015.01.002.
Big data computing is a necessary means of realizing the “great value” of big data, and the computing system is its effective vehicle. This paper reviews big data computing from a system perspective. Starting from the macro characteristics of big data (huge volume, high velocity, diverse modalities, and uncertain veracity), it discusses the typical features of big data computing for three forms of computation: batch computing, stream computing, and large-scale graph computing. It then describes the technical challenges that these features pose for the design and implementation of big data computing systems, categorizes the research results achieved in response to those challenges, and finally points out some possible future research directions for big data computing from the system perspective.