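Nifei BI (毕倪飞, 1996- ), male, is a master's student at the School of Data Science and Engineering, East China Normal University. His main research interest is query optimization in heterogeneous distributed systems.
Guangyao DING (丁光耀, 1996- ), male, is a Ph.D. student at the School of Data Science and Engineering, East China Normal University. His main research interest is parallel and distributed systems.
Qihang CHEN (陈启航, 1996- ), male, is a master's student at the School of Data Science and Engineering, East China Normal University. His main research interest is query optimization in heterogeneous distributed computing.
Chen XU (徐辰, 1988- ), male, is an associate professor and master's supervisor at the School of Data Science and Engineering, East China Normal University. His main research interest is large-scale distributed data management.
Aoying ZHOU (周傲英, 1965- ), male, Ph.D., is vice president of East China Normal University, dean of its "Intelligence+" Research Institute, and a professor at its School of Data Science and Engineering. He is currently a member of the discipline evaluation group of the 7th Academic Degrees Committee of the State Council, a fellow of the China Computer Federation, vice chairman of the Shanghai Computer Society, and associate editor-in-chief of the Chinese Journal of Computers and Big Data Research. He was selected as a Distinguished Professor of the Changjiang Scholars Program and was supported by the National Science Fund for Distinguished Young Scholars. His main research interests are databases, data management, data-driven computational pedagogy, and data-based applied technologies such as education technology (EduTech) and logistics technology (LogTech).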
Published online: 2020-05
Published in print: 2020-05-15
Nifei BI, Guangyao DING, Qihang CHEN, et al. Dataflow model and its applications in big data processing[J]. Big Data Research, 2020, 6(3): 2020025-1. DOI: 10.11959/j.issn.2096-0271.2020025.
Unbounded, unordered, large-scale datasets have become increasingly common in recent years. Meanwhile, the processing requirements of data consumers are becoming more sophisticated, covering time semantics such as event time, windowing, and processing latency. To address these evolving requirements on unbounded, unordered, large-scale datasets, this paper examines the dataflow model in big data processing. On the one hand, it analyzes the dataflow graph embodied by the dataflow model at the level of the execution engine. On the other hand, it analyzes the dataflow programming model embodied by the dataflow model at the level of unified programming. On this basis, it compares the concrete implementations of the dataflow graph and the dataflow programming model in two classes of execution engines, drawing on multiple engines including Spark, a batch processing engine, and Flink, a stream processing engine.
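As a small illustration of the event-time windowing mentioned in the abstract, the following Python sketch assigns out-of-order records to fixed (tumbling) windows by the timestamp each record carries rather than by its arrival order. It is not taken from the paper or from any engine's API: the function name `tumbling_window_sums`, the window size, and the sample events are all hypothetical, and the sketch ignores watermarks, triggers, and late-data handling.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size):
    """Sum values per fixed event-time window.

    `events` is an iterable of (event_time, value) pairs in arrival order.
    Returns {window_start: sum_of_values}; the result does not depend on
    the order in which the events arrived.
    """
    sums = defaultdict(int)
    for event_time, value in events:
        # The window is derived from the timestamp carried by the event,
        # not from the moment the event happens to be processed.
        window_start = (event_time // window_size) * window_size
        sums[window_start] += value
    return dict(sums)


# Hypothetical events listed in arrival order; event times are out of order.
arrivals = [(12, 1), (3, 5), (25, 2), (7, 4), (14, 3)]
result = tumbling_window_sums(arrivals, window_size=10)
```

Because each record is grouped by its own event time, feeding the same records in sorted order produces the same per-window sums, which is the property that makes event-time semantics useful on unordered input.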