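Xiaofeng ZOU (1996- ), male, doctoral student at the College of Information Science and Engineering, Hunan University. His main research interests are parallel computing, data mining, and machine learning.
Wangdong YANG (1974- ), male, professor at the College of Information Science and Engineering, Hunan University. His main research interests are distributed parallel computing and machine learning.
Xuecheng RONG (1996- ), male, master's student at the College of Information Science and Engineering, Hunan University. His main research interests are big data and machine learning.
Kenli LI (1970- ), male, Ph.D., professor at the College of Information Science and Engineering, Hunan University. His main research interests are high-performance computing, artificial intelligence, and big data.
Keqin LI (1963- ), male, Ph.D., professor at the College of Information Science and Engineering, Hunan University. His main research interests are parallel computing, edge computing, and big data.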
Published online: 2020-05
Print publication: 2020-05-15
Xiaofeng ZOU, Wangdong YANG, Xuecheng RONG, et al. A survey of dataflow programming models and tools for big data processing[J]. Big Data Research, 2020, 6(3): 2020024-1. DOI: 10.11959/j.issn.2096-0271.2020024.
Data mining and intelligent analysis of large volumes of static data on big data computing platforms have helped bring big data and artificial intelligence applications into practice. To meet the growing demand for processing the real-time, dynamic data generated by the Internet and the Internet of Things, dataflow computing has gradually been introduced into some current big data processing platforms. Focusing on dataflow programming models, this survey compares the dataflow-oriented analysis and design methods of traditional software engineering with the structure definitions and model references provided by the dataflow programming models of current big data processing platforms, analyzes the differences between them and their shortcomings, and summarizes the main features and key elements of dataflow programming models. It also analyzes the main current approaches to dataflow programming and how they are combined with mainstream programming tools, and, based on the dataflow computing requirements of big data processing, presents a basic framework and programming mode for visual dataflow programming tools.