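[ "Xiaochun TANG (1969- ), male, Ph.D., associate professor at the School of Computer Science, Northwestern Polytechnical University. His main research interests include big data computing, large-scale graph data mining, and cluster resource management." ]
[ "Ying FU (1996- ), female, master's student at the School of Computer Science, Northwestern Polytechnical University. Her main research interests include big data computing and cluster resource management." ]
[ "Zhao DING (1995- ), male, master's student at the School of Computer Science, Northwestern Polytechnical University. His main research interests include big data computing and cluster resource management." ]
[ "Anqi MAO (1996- ), female, master's student at the School of Computer Science, Northwestern Polytechnical University. Her main research interests include big data computing and cluster resource management." ]
[ "Zhanhuai LI (1961- ), male, Ph.D., professor at the School of Computer Science, Northwestern Polytechnical University, and director of the Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology. His main research interests include database theory and technology, data streams, data-intensive computing, in-memory computing, and data mining." ]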
Published online: 2020-05
Published in print: 2020-05-15
Xiaochun TANG, Ying FU, Zhao DING, et al. State-of-art research of cluster resource management in dataflow computing model[J]. Big Data Research, 2020, 6(3): 2020026-1. DOI: 10.11959/j.issn.2096-0271.2020026.
Cluster-based high-performance computing has evolved through three stages: separation of the compute and storage subsystems, convergence of the compute and storage subsystems, and dataflow programming models built on data parallelism. With dataflow programming models such as Spark and Flink now widely used in big data computing, job types vary enormously. Ensuring that these diverse dataflow jobs can fairly share cluster resources is the core problem of cluster resource management and a principal means of reducing infrastructure cost. As the drawbacks of traditional cluster resource management have become increasingly apparent under the dataflow computing model, many alternatives have been proposed in recent years, including HoD, centralized scheduling, two-level scheduling, distributed scheduling, and hybrid scheduling. This paper reviews the history of cluster resource management and examines these approaches from the perspective of the dataflow programming model. It describes the advantages, disadvantages, and current applications of each, as a reference for those using or developing cluster resource management and scheduling in dataflow computing environments.