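YANG Qinglin (1995- ), male, is a Ph.D. student in the Department of Computer Science and Technology, Tsinghua University. His main research interests are distributed systems and erasure-coded storage.
WU Guiyong (1992- ), male, is a master's student in the Department of Computer Science and Technology, Tsinghua University. His main research interests are distributed systems and storage systems.
ZHANG Guangyan (1976- ), male, Ph.D., is a tenured associate professor and doctoral supervisor in the Department of Computer Science and Technology, Tsinghua University. He works mainly on the theory and methods of big data storage and analysis, including big data computing, storage systems, and distributed processing. His research has been supported by the National Science Fund for Distinguished Young Scholars, the National Key R&D Program of China, the National 973 Program, and the National 863 Program. In recent years he has proposed methods and key techniques for building and accessing large-scale storage systems, which have effectively improved the performance, scalability, and availability of storage systems. He has published more than 40 academic papers, over 20 of them in high-level international conferences and journals in computer systems such as FAST, USENIX ATC, ACM TOS, IEEE TC, and IEEE TPDS. In the past five years he has been granted, as first inventor, 1 US invention patent and 7 Chinese invention patents.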
Published online: 2021-03
Published in print: 2021-03-15
YANG Qinglin, WU Guiyong, ZHANG Guangyan. An approach to buffering data efficiently in distributed storage systems[J]. Big Data Research, 2021, 7(2): 2021018. DOI: 10.11959/j.issn.2096-0271.2021018.
To address write amplification, long I/O paths, and high response latency in typical distributed storage systems, this paper proposes an efficient SSD-based data caching approach for distributed storage systems. The approach manages the cache with read/write bypassing and lazy caching, uses a cache replacement policy that weighs both the most recent access time and the historical access frequency, and adaptively adjusts the rate at which dirty data is proactively flushed according to changes in the foreground workload. It significantly improves the read and write performance of the storage system.
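The abstract does not give the scoring formula or the flush-rate rule, so the following Python sketch is only an illustration of the two ideas it names: replacement that weighs both recency and frequency, and a flush rate that backs off as foreground load rises. Every name here (`RecencyFrequencyCache`, `freq_weight`, `flush_rate`, and the linear formulas) is a hypothetical choice for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: bytes
    last_access: int  # logical clock tick of the most recent access
    hits: int         # number of accesses since insertion


class RecencyFrequencyCache:
    """Toy cache that evicts the entry with the lowest combined score."""

    def __init__(self, capacity, freq_weight=1.0):
        self.capacity = capacity
        self.freq_weight = freq_weight  # assumed tuning knob, not from the paper
        self.clock = 0
        self.entries = {}

    def _score(self, entry):
        # Assumed linear score: recent access and many hits both raise it.
        return entry.last_access + self.freq_weight * entry.hits

    def get(self, key):
        self.clock += 1
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.last_access = self.clock
        entry.hits += 1
        return entry.value

    def put(self, key, value):
        self.clock += 1
        if key in self.entries:
            entry = self.entries[key]
            entry.value = value
            entry.last_access = self.clock
            entry.hits += 1
            return
        if len(self.entries) >= self.capacity:
            # Evict the coldest entry under the combined score.
            victim = min(self.entries, key=lambda k: self._score(self.entries[k]))
            del self.entries[victim]
        self.entries[key] = Entry(value, self.clock, 1)


def flush_rate(foreground_iops, max_iops, min_rate, max_rate):
    """Assumed linear rule: flush fast when idle, slow when foreground is busy."""
    load = min(max(foreground_iops / max_iops, 0.0), 1.0)
    return max_rate - load * (max_rate - min_rate)


cache = RecencyFrequencyCache(capacity=2, freq_weight=10.0)
cache.put("a", b"1")
cache.put("b", b"2")
for _ in range(3):
    cache.get("a")   # "a" becomes frequently used
cache.get("b")       # "b" is now the most recently used
cache.put("c", b"3") # cache is full, so one entry must be evicted
```

With these weights the insert of "c" evicts "b" rather than "a": a pure LRU policy would have evicted "a", but its higher access count keeps it resident.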