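1. National Research Center of Parallel Computer Engineering and Technology, Beijing 100080
2. School of Computer Science, Fudan University, Shanghai 200433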
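Xiaobin HE (1984- ), male, assistant researcher at the National Research Center of Parallel Computer Engineering and Technology. His main research interests include ultra-large-scale storage systems and new storage software protocol stacks.
Jinhu JIANG (1974- ), male, senior engineer at the School of Computer Science, Fudan University. His main research interests include operating systems and distributed storage.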
Published online: 2020-07
Published in print: 2020-07-15
Xiaobin HE, Jinhu JIANG. Sunway parallel storage system for big data heterogeneous system[J]. Big Data Research, 2020, 6(4): 2020031-1. DOI: 10.11959/j.issn.2096-0271.2020031.
With the convergence of big data applications and traditional high-performance computing (HPC) applications, and with the introduction of heterogeneous computing, traditional HPC-oriented parallel storage systems face problems such as poor I/O support for heterogeneous computing, performance interference, and low efficiency. At the system architecture level, a multi-level storage architecture was introduced and a cache mapping mechanism was designed to reduce the I/O load. In the forwarding service layer, the I/O forwarding strategy was adjusted to balance the I/O load. In the back-end storage layer, the system's high-availability functions were adjusted to resolve the conflict between big data I/O access patterns and the original high-availability measures. After this optimized design and refinement, the parallel storage system adapts better to the heterogeneous many-core architecture, and some applications achieved more than a 10-fold improvement in I/O performance.