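1. State Key Laboratory of Software Development Environment, Beijing 100191
2. School of Computer Science and Engineering, Beihang University, Beijing 100191
3. Smart City College, Beijing Union University, Beijing 100101
4. Big Data Center of State Grid Corporation of China, Beijing 100031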
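Chenhao ZHANG (1997- ), male, is a PhD candidate at the School of Computer Science and Engineering, Beihang University. His main research interests include high-performance computing and distributed storage.
Limin XIAO (1970- ), male, PhD, is a professor and doctoral supervisor at the School of Computer Science and Engineering, Beihang University, where he is head of the Department of Computer Science and Technology and deputy director of the Institute of Computer Architecture. He is a member of the China Computer Federation (CCF) Big Data Expert Committee, a standing member of the CCF High Performance Computing Technical Committee, a member of the CCF Fault-Tolerant Computing Technical Committee, and a member of the Cloud Computing Expert Committee of the Chinese Institute of Electronics. His main research interests include computer architecture, big data storage, and high-performance computing. He has received 4 second prizes of the National Science and Technology Progress Award, 4 first prizes of provincial and ministerial science and technology progress awards, and 5 other provincial and ministerial awards. He has published more than 230 SCI/EI-indexed papers and filed more than 100 invention patent applications, 88 of which have been granted.
Guangjun QIN (1977- ), male, PhD, is a lecturer at the Smart City College, Beijing Union University, and a CCF member. His main research interests include high-performance computing, storage systems, big data, and machine learning. He has been a key contributor to several projects funded by the National 863 Program, the National Key Research and Development Program of China, the General Program of the National Natural Science Foundation of China, and the General Program of the Beijing Natural Science Foundation.
Yao SONG (1994- ), male, is a PhD candidate at the School of Computer Science and Engineering, Beihang University. His main research interests include high-performance computing, distributed storage, distributed scheduling systems, and storage-compute collaborative scheduling.
Shixuan JIANG (1999- ), male, is a master's student at the School of Computer Science and Engineering, Beihang University. His main research interests include distributed storage and storage-compute collaborative scheduling.
Jiye WANG (1964- ), male, PhD, is a professor-level senior engineer at the Big Data Center of State Grid Corporation of China. His research focuses on power industry informatization, the energy internet, and big data and artificial intelligence.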
Published online: 2021-09
Published in print: 2021-09-15
Chenhao ZHANG, Limin XIAO, Guangjun QIN, et al. A wide-area collaborative scheduling system oriented to big data processing applications[J]. Big Data Research, 2021, 7(5): 2021050. DOI: 10.11959/j.issn.2096-0271.2021050.
Building on the high-performance computing virtual data space system developed in China, a wide-area storage-compute collaborative scheduling system for big data processing applications was designed and implemented to address how such applications can make unified use of wide-area storage and computing resources. Based on an application's computing characteristics and data layout, the system applies storage-compute collaboration, load balancing, and data-locality-aware strategies to co-schedule application data and computing tasks across the wide-area environment. It thereby coordinates the use of wide-area computing and storage resources and effectively improves the running performance of big data processing applications. Tests in the national high-performance computing environment show that the proposed scheduling method supports big data processing applications effectively, and that the running efficiency of typical applications such as cross-domain collaborative target recognition and molecular docking can be improved by 3 to 4 times.
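To make the idea of combining data locality with load balancing more concrete, the following is a minimal Python sketch of one way a task could be placed across wide-area centers. It is not the scheduling algorithm of the system described above, which the abstract does not specify. The names `Center`, `Task`, and `place`, the parameters `wan_gbps` and `load_weight`, and the cost model (estimated transfer time plus a load penalty) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Center:
    """A hypothetical computing center in the wide-area environment."""
    name: str
    free_cores: int
    datasets: Set[str] = field(default_factory=set)  # datasets stored locally


@dataclass
class Task:
    """A hypothetical computing task that reads one dataset."""
    name: str
    cores: int
    dataset: str
    data_gb: float


def place(task: Task, centers: List[Center],
          wan_gbps: float = 1.0, load_weight: float = 10.0) -> Optional[str]:
    """Pick the center with the lowest assumed cost and reserve its cores.

    Assumed cost = wide-area transfer time in seconds (zero when the dataset
    is already local) + load_weight * fraction of free cores the task uses.
    Returns the chosen center's name, or None if no center has enough cores.
    """
    best: Optional[Center] = None
    best_cost = float("inf")
    for center in centers:
        if center.free_cores < task.cores:
            continue  # not enough capacity here
        if task.dataset in center.datasets:
            transfer_s = 0.0  # data locality: nothing to move
        else:
            transfer_s = task.data_gb * 8 / wan_gbps  # GB -> Gbit, then seconds
        load_penalty = load_weight * task.cores / center.free_cores
        cost = transfer_s + load_penalty
        if cost < best_cost:
            best, best_cost = center, cost
    if best is None:
        return None
    best.free_cores -= task.cores
    best.datasets.add(task.dataset)  # the data now resides at this center
    return best.name
```

Under these assumptions, a task whose large dataset already resides at a busy center tends to stay there, while a task with a small dataset is moved to a lightly loaded center. A real scheduler would also need queueing, bandwidth contention, and failure handling, none of which this sketch models.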