A research on GPU transactional memory

Yuzhe LIN; Weihua ZHANG

doi:10.11959/j.issn.2096-0271.2020029

Special topic: Heterogeneous parallel systems for big data | Updated: 2024-06-03

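- A research on GPU transactional memory
- Big Data Research, 2020, Vol. 6, No. 4, pages: 2020029-1
- Author affiliations:
  
  1. School of Software, Fudan University, Shanghai 201203
  2. Shanghai Key Laboratory of Data Science, Shanghai 201203
- About the authors:
  
  Yuzhe LIN (1996- ), male, master's student at the School of Software, Fudan University. His main research interests include GPUs, parallel computing, and transactional memory.
  Weihua ZHANG (1974- ), male, professor at the School of Software, Fudan University. His main research interests include compilers, computer architecture, parallel computing, and system software.
- DOI: 10.11959/j.issn.2096-0271.2020029
  CLC number: TP302
- Published online: 2020-07
  
  Published in print: 2020-07-15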
Yuzhe LIN, Weihua ZHANG. A research on GPU transactional memory[J]. Big Data Research, 2020, 6(4): 2020029-1. DOI: 10.11959/j.issn.2096-0271.2020029.

Abstract

The GPU is one of the important architectures in parallel computing. However, in scenarios with heavy data contention, programmers often have to design complex parallelization schemes. To simplify this process, GPU transactional memory handles the complex data synchronization and parallelism internally and exposes only a simple API. This paper first introduces the research background of GPU transactional memory. It then discusses the designs and strategies proposed for GPU transactional memory in recent years, and analyzes the problems that different designs encounter and their solutions, covering both hardware and software implementations. Finally, it summarizes the current state of GPU transactional memory and discusses prospects for its future development.
