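1. School of Computer Science and Engineering, Beihang University, Beijing 100191
2. State Key Laboratory of Software Development Environment, Beijing 100191
3. Smart City College, Beijing Union University, Beijing 100101
4. School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048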
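Limin XIAO (肖利民, 1970- ), male, Ph.D., is a professor and doctoral supervisor at the School of Computer Science and Engineering, Beihang University, where he is head of the Department of Computer Science and Technology and deputy director of the Institute of Computer Architecture. He is a member of the Big Data Expert Committee, a standing member of the Technical Committee on High Performance Computing, and a member of the Technical Committee on Fault-Tolerant Computing of the China Computer Federation (CCF); a member of the Cloud Computing Expert Committee of the Chinese Institute of Electronics; a member of the national committee for the approval of computer science and technology terminology; a member of the expert group of the National Science and Technology Infrastructure Platform; a member of the Electronic Science and Technology Committee of the Ministry of Industry and Information Technology; and a specially appointed expert on the expert committee of the Chinese Academy of Engineering's research center for the development strategy of China's information and electronic engineering science and technology. His main research interests are computer architecture, computer software systems, high-performance computing, cloud computing, and virtualization technology. He has received national and provincial or ministerial science and technology awards, including a second prize of the National Science and Technology Progress Award, a first prize of the Beijing Science and Technology Award, a first prize of the Chinese Academy of Sciences Science and Technology Progress Award, the Major Technological Invention Award in the Information Industry from the former Ministry of Information Industry, and the National Key New Product Award from the Ministry of Science and Technology.
Yao SONG (宋尧, 1994- ), male, is a doctoral student at the School of Computer Science and Engineering, Beihang University. His main research interests are high-performance computing, distributed storage, distributed scheduling systems, and coordinated storage-compute scheduling.
Guangjun QIN (秦广军, 1977- ), male, Ph.D., is a lecturer at the Smart City College, Beijing Union University, and a CCF member. His main research interests are high-performance computing, storage systems, big data, and machine learning. He has taken part as a key project member in National 863 Program projects, National Key R&D Program projects, National Natural Science Foundation of China projects, and Beijing Natural Science Foundation projects.
Hanjie ZHOU (周汉杰, 1995- ), male, is a master's student at the School of Computer Science and Engineering, Beihang University. His main research interests are distributed file systems, high-performance computing, and network security.
Chaobo WANG (王超波, 1997- ), male, is a master's student at the School of Computer Science and Engineering, Beihang University. His main research interests are distributed file systems, high-performance computing, and software engineering.
Bing WEI (韦冰, 1990- ), male, is a doctoral student at the School of Computer Science and Engineering, Beihang University. His main research interests are network storage, data fault tolerance, big data processing, and distributed computing.
Wei WEI (魏巍, 1975- ), male, Ph.D., is an associate professor at the School of Computer Science and Engineering, Xi'an University of Technology, and a senior member of IEEE and CCF. He serves on the editorial boards of journals including FGCS, AHSWN, IEICE, and KSII, and is a regular reviewer for IEEE TPDS, TVT, TIP, TMC, TWC, JNCA, and many other Elsevier journals. He has led a number of research projects as principal investigator and technical member. His main research interests are wireless networks, wireless sensor network applications, image processing, mobile computing, distributed computing, pervasive computing, the Internet of Things, and sensor data clouds.
Zhisheng HUO (霍志胜, 1983- ), male, Ph.D., is an assistant researcher at the School of Computer Science and Engineering, Beihang University. As principal investigator or key project member, he has led or taken part in a general program of the postdoctoral science foundation, National Key R&D Program projects, and general programs of the National Natural Science Foundation of China. His main research interests are big data storage, distributed storage systems, and distributed/parallel file systems.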
Published online: 2021-03
Print publication: 2021-03-15
Limin XIAO, Yao SONG, Guangjun QIN, et al. GVDS: a global virtual data space for wide-area high-performance computing environments[J]. Big Data Research, 2021, 7(2): 2021017. DOI: 10.11959/j.issn.2096-0271.2021017.
Wide-area high-performance computing environments are core information infrastructure supporting technological innovation, economic and social development, and national defense. In such environments, however, heterogeneous storage resources are geographically dispersed. This creates barriers between applications and data, prevents the aggregation benefits of wide-area storage from being realized, and leaves the requirements of unified management and efficient access for widely distributed data unmet. To address this, a method for constructing a virtual data space and a method for optimizing data access performance are proposed, and a global virtual data space (GVDS) for wide-area high-performance computing environments is implemented. GVDS aggregates geographically distributed, heterogeneous storage resources into a unified virtual data space, lets users access those resources efficiently through a single access mode, and enables cross-domain sharing and collaborative processing of distributed data in wide-area environments. Experimental results indicate that, compared with state-of-the-art wide-area storage systems for high-performance computing such as OneData and GFFS, GVDS provides comparable functionality and significantly improves data access performance, including read bandwidth.