1. Transwarp Technology (Shanghai) Co., Ltd., Shanghai 200233
2. Software Institute, Nanjing University, Nanjing, Jiangsu 210093
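Wanggen LIU (1985- ), male, is R&D director and chief architect at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include next-generation big data architecture, distributed database technology, and container clouds.
Huaicheng ZHENG (1987- ), male, is a software engineer at Transwarp Technology (Shanghai) Co., Ltd., where he leads R&D of Transwarp's cloud-native operating system. His main research interest is the engineering of underlying container cloud technology for complex business scenarios.
Guoping RONG (1977- ), male, Ph.D., is an associate researcher at the Software Institute, Nanjing University. His main research interests include DevOps, microservice architecture, and virtualization technology.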
Published online: 2020-01
Published in print: 2020-01-15
Wanggen LIU, Huaicheng ZHENG, Guoping RONG. A scheduler system for large-scale distributed data computing in cloud[J]. Big Data Research, 2020, 6(1): 2020007-1. DOI: 10.11959/j.issn.2096-0271.2020007.
This paper introduces a new scheduler system comprising subsystems for resource scheduling, application orchestration, a configuration and label center, cloud networking, and cloud storage services. Through data topology awareness, the system preserves the locality of computation and data and so reduces network I/O overhead. By optimizing resource scheduling for point-to-point reads of large data volumes, it addresses the impact of network storms. Service level agreements are upheld through network and disk isolation together with preemptible scheduling.