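Xuchu YUAN (袁旭初, 1995- ), male, is a master's student at the School of Computer Science and Engineering, Northeastern University. His main research interests include distributed systems and parallel computing.
Guo FU (付国, 1996- ), male, is a master's student at the School of Computer Science and Engineering, Northeastern University. His main research interests include distributed systems and parallel computing.
Jize BI (毕继泽, 1998- ), male, is an undergraduate student at the School of Computer Science and Engineering, Northeastern University. His main research interests include big data processing and parallel and distributed computing.
Yanfeng ZHANG (张岩峰, 1982- ), male, Ph.D., is a professor at the School of Computer Science and Engineering, Northeastern University. His main research interests include big data processing and mining, deep learning, and parallel and distributed computing.
Tiezheng NIE (聂铁铮, 1980- ), male, Ph.D., is an associate professor at the School of Computer Science and Engineering, Northeastern University. His main research interests include big data management, data integration and fusion, and blockchain.
Yu GU (谷峪, 1981- ), male, Ph.D., is a professor at the School of Computer Science and Engineering, Northeastern University. His main research interests include big data analysis, distributed computing, and spatio-temporal and graph data management.
Yubin BAO (鲍玉斌, 1968- ), male, Ph.D., is a professor at the School of Computer Science and Engineering, Northeastern University. His main research interests include business intelligence, data mining, and big data analysis.
Ge YU (于戈, 1962- ), male, Ph.D., is a professor and doctoral supervisor at the School of Computer Science, Northeastern University, and a fellow of the China Computer Federation (CCF). He currently serves as director of the CCF Technical Committee on Information Systems and as a member of the CCF Technical Committees on Databases and on System Software, and sits on the editorial boards of journals including Chinese Journal of Computers, Journal of Software, and Journal of Computer Research and Development. He has received the "Cross-Century Talent Fund of the Ministry of Education" and the "China Young University Teacher Award". His main research interests include distributed database systems, data science and big data management, and blockchain technology and applications.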
Published online: 2020-05
Published in print: 2020-05-15
Xuchu YUAN, Guo FU, Jize BI, et al. Survey on data caching technology of distributed dataflow system[J]. Big Data Research, 2020, 6(3): 2020027-1. DOI: 10.11959/j.issn.2096-0271.2020027.
The dataflow programming model has been adopted by many mainstream computing systems for its high degree of parallelism, its support for pipelined processing, and its support for functional programming. In distributed and heterogeneous dataflow systems, a mismatch between the rate at which operators produce data and the rate at which downstream operators consume it can cause data to pile up or operators to sit idle. An efficient dataflow system therefore needs a cache system that buffers and moves the dataflow efficiently. This survey analyzes several typical distributed dataflow systems and distributed message queue systems, and summarizes how well current message queue systems support dataflow caching. Finally, it describes data caching techniques and analyzes the requirements and research directions for future dataflow caching systems.
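The speed mismatch between producing and consuming operators can be illustrated with a small sketch. The code below is not taken from any system covered by the survey; it is a minimal single-machine illustration, assuming one producer operator, one consumer operator, and a bounded in-memory buffer between them. The function name `run_pipeline` and all of its parameters are hypothetical.

```python
import queue
import threading
import time


def run_pipeline(n_items, produce_delay, consume_delay, capacity):
    """Run a two-operator pipeline joined by a bounded buffer.

    Returns the consumer's results and the largest buffer occupancy
    observed by the producer (an approximate figure, for illustration).
    """
    buffer = queue.Queue(maxsize=capacity)  # the "cache" between operators
    results = []
    peak = 0
    sentinel = object()  # marks the end of the stream

    def producer():
        nonlocal peak
        for i in range(n_items):
            time.sleep(produce_delay)  # simulated production cost
            buffer.put(i)              # blocks while the buffer is full
            peak = max(peak, buffer.qsize())
        buffer.put(sentinel)

    def consumer():
        while True:
            item = buffer.get()        # blocks (idles) while the buffer is empty
            if item is sentinel:
                break
            time.sleep(consume_delay)  # simulated consumption cost
            results.append(item * 2)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, peak


if __name__ == "__main__":
    # A fast producer and a slow consumer: the bounded buffer caps the backlog.
    results, peak = run_pipeline(20, 0.0, 0.001, capacity=4)
    print(len(results), peak)
```

With a fast producer and a slow consumer, the bounded buffer caps the backlog by blocking the producer. With the delays reversed, the consumer spends its time blocked on an empty buffer, which corresponds to the idle-operator case.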