1. School of Software, Fudan University, Shanghai 201203
2. Shanghai Key Laboratory of Data Science, Shanghai 201203
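Yuzhe LIN (1996- ), male, master's student at the School of Software, Fudan University; his main research interests include GPUs, parallel computing, and transactional memory.
Weihua ZHANG (1974- ), male, professor at the School of Software, Fudan University; his main research interests include compilers, computer architecture, parallel computing, and system software.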
Published online: 2020-07
Print publication: 2020-07-15
Yuzhe LIN, Weihua ZHANG. A research on GPU transactional memory[J]. Big Data Research, 2020, 6(4): 2020029-1. DOI: 10.11959/j.issn.2096-0271.2020029.
The GPU is one of the most important architectures in parallel computing. However, in scenarios with high data contention, programmers often have to design complex parallel schemes. To simplify this process, GPU transactional memory handles the complex data synchronization and parallelism internally while exposing only a simple API. This paper first introduces the research background of GPU transactional memory. It then discusses the designs and strategies for GPU transactional memory proposed in recent years and analyzes the problems each design encounters and the corresponding solutions, covering both hardware and software implementations. Finally, it summarizes the current state of GPU transactional memory and outlines directions for its future development.