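1. Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, Shanghai 200240
2. Institute of Parallel and Distributed Systems, School of Software, Shanghai Jiao Tong University, Shanghai 200240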
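Mingyu WU (1993- ), male, is a Ph.D. candidate at the School of Software, Shanghai Jiao Tong University. His main research interests are language virtual machines and non-volatile memory.
Haibo CHEN (1982- ), male, Ph.D., is a professor at Shanghai Jiao Tong University, director of the Institute of Parallel and Distributed Systems, and director of the Engineering Research Center for Domain-specific Operating Systems, Ministry of Education. He is a recipient of the National Science Fund for Distinguished Young Scholars, an ACM Distinguished Scientist, and a China Computer Federation (CCF) Distinguished Member and Distinguished Speaker. His main research interests are operating systems and systems security. He has received the First Prize of the Technological Invention Award of the Ministry of Education (as first contributor), the National Excellent Doctoral Dissertation Award, and the CCF Young Scientist Award. He currently serves as chair of ACM SIGOPS ChinaSys, vice chair of the CCF Technical Committee on System Software, the first editorial board member from China and co-chair of Special Sections for Communications of the ACM, and an editorial board member of ACM Transactions on Storage and of Big Data Research. He previously served as general co-chair of ACM SOSP 2017, systems security area chair of ACM CCS 2018, and a member of the ACM SIGSAC awards committee. His research has been recognized with industry awards including the highest individual contribution award of Huawei Technologies Co., Ltd., a Google Faculty Research Award, an IBM X10 Innovation Award, and a NetApp Faculty Fellowship.
Binyu ZANG (1962- ), male, Ph.D., is a professor at Shanghai Jiao Tong University and dean of the School of Software. He was the advisor of a National Excellent Doctoral Dissertation in 2011 and the advisor of a Grand Prize winner in the 2015 "Challenge Cup" national competition. He also serves as a member of the Software Engineering Discipline Review Group of the Academic Degrees Committee of the State Council, deputy secretary-general of the Ministry of Education's Teaching Steering Committee for Software Engineering in Higher Education, a member of the Computer Science Accreditation Subcommittee of the national engineering education accreditation expert committee, a CCF Distinguished Member, and vice chair of the National Alliance of Pilot Software Schools. His research focuses on system software, and he is committed to reforming the teaching of core computer science courses. In recent years he has published more than 20 papers at international conferences including SOSP, USENIX ATC, EuroSys, ASPLOS, ISCA, HPCA, and PPoPP, and he has led several national research projects.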
Published online: 2020-07
Published in print: 2020-07-15
Mingyu WU, Haibo CHEN, Binyu ZANG. Applications and challenges of language virtual machines in big data[J]. Big Data Research, 2020, 6(4): 2020035-1. DOI: 10.11959/j.issn.2096-0271.2020035.
Language virtual machines provide a platform-independent execution environment for big-data applications and simplify their development and deployment, so they are widely used in big-data scenarios. This paper analyzes the use of two mainstream language virtual machines, the JVM and the CLR, in big-data scenarios and describes four challenges of adopting them: initialization and warm-up overhead, garbage collection pauses, heterogeneous memory support, and data format conversion. It then discusses existing solutions to each of the four challenges and analyzes their shortcomings and possible directions for future optimization.