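Yihua Huang, male, Ph.D., is a professor and doctoral supervisor in the Department of Computer Science at Nanjing University. He is a member and deputy secretary-general of the CCF (China Computer Federation) Task Force on Big Data, director of the Big Data Expert Committee of the Jiangsu Computer Society, a CCF senior member, and the academic lead of the PASA Big Data Technology Laboratory at Nanjing University. His main research interests are big data parallel processing, big data machine learning, and Web information extraction and mining. He has published more than 60 papers in Chinese and international journals and conferences and has written two books on big data processing, including the recently published Understanding Big Data: Big Data Processing and Programming Practice. He is currently conducting in-depth research on parallel algorithms, system platforms, and applications for big data, leads or participates in several national and provincial/ministerial research projects, and has carried out collaborative big data research with well-known companies and research institutions in China and abroad, including Intel, UC Berkeley AMP Lab, Microsoft Research Asia, Baidu, and ZTE.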
Published online: 2015-05
Published in print: 2015-05-03
Yihua Huang. Research Progress on Big Data Machine Learning System[J]. Big Data Research, 2015, 1(1): 26-45. DOI: 10.11959/j.issn.2096-0271.2015.01.004.
Efficient big data machine learning requires a unified system that supports both machine learning algorithm design and large-scale data processing. Designing such a system to be efficient, scalable, and easy to use still poses many technical challenges. In recent years, the rise of big data has driven rapid development in big data machine learning and made big data machine learning systems a hot research topic in the big data field. This paper reviews the basic concepts, basic research issues, technical characteristics, categories, and typical systems of big data machine learning systems in China and abroad. It then presents Octopus, a unified cross-platform big data machine learning system designed by the author's laboratory.