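Zhengxun XIA (1979- ), male, is a senior researcher at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include big data, databases, artificial intelligence, and streaming media processing.
Shengmei LUO (1971- ), male, Ph.D., is the director of the Big Data Research Institute at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include big data, parallel computing, cloud storage, and artificial intelligence.
Yuanhao SUN (1976- ), male, is the founder of Transwarp Technology (Shanghai) Co., Ltd. He began researching big data technology in 2009, founded the company in 2013, and started independent research and development of next-generation big data technology.
Jianfei TANG (1986- ), male, is a big data technology standards researcher at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include big data, databases, and graph computing.
Yan ZHANG (1985- ), female, is a big data technology researcher at Transwarp Technology (Shanghai) Co., Ltd. Her main research interests include big data and artificial intelligence.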
Published online: 2020-07
Print publication: 2020-07-15
Zhengxun XIA, Shengmei LUO, Yuanhao SUN, et al. Design, implementation and practice of parallel processing system for a large-scale heterogeneous data[J]. Big Data Research, 2020, 6(4): 2020030-1. DOI: 10.11959/j.issn.2096-0271.2020030.
With the rapid development of Internet and IoT applications, data processing has gradually expanded from structured data alone to a heterogeneous mix of structured, semi-structured, and unstructured data. This paper presents the design of a large-scale parallel processing system for heterogeneous data. Building on a unified functional view of the platform, the system adopts a unified resource management framework to store and query multiple kinds of heterogeneous data, including structured data, JSON/XML, graph data, and document data. It uses a unified development language to run parallel computations across data types and storage engines, meeting the needs of multi-service application development. The feasibility of the system was verified in a standard evaluation environment and through commercial deployment.