A scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining

Shaoliang PENG; Qi NIU; Kenli LI; Quan ZOU

doi:10.11959/j.issn.2096-0271.2019016

Research | Updated: 2024-06-03

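- Title (translated from Chinese): A drug discovery algorithm based on large-scale frequent subgraph mining on a CPU-MIC heterogeneous parallel architecture
- English title: A scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
- Big Data Research, 2019, Vol. 5, No. 2, page: 2019016-1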
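- Author affiliations:
  
  1. College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
  2. Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China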
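- About the authors:
  
  Shaoliang PENG (1979- ), male, PhD, is a professor at the College of Information Science and Engineering, Hunan University, and deputy director of the National Supercomputing Center in Changsha (Hunan University). His long-term research covers high-performance computing, big data, bioinformatics, artificial intelligence, and blockchain. He leads the life sciences direction of "Tianhe" at the National University of Defense Technology, is a distinguished professor at BGI-Shenzhen, and is a "Yuelu Scholar" third-level professor at Hunan University. He has published more than one hundred academic papers, has taken part in developing application software for the "Tianhe" series of supercomputers, and has participated in 13 projects, including national "973" and "863" program projects and major military equipment projects. He has received one first prize of the Military Science and Technology Progress Award and a second prize in natural science of the China Computer Federation (CCF) Science and Technology Award, and was awarded a third-class merit citation in 2016.
  
  Qi NIU (1995- ), male, is a master's student at the College of Information Science and Engineering, Hunan University. His main research interest is computational biology.
  
  Kenli LI (1971- ), male, is dean of the College of Information Science and Engineering, Hunan University, and director of the National Supercomputing Center in Changsha. He is a "Changjiang Scholar" distinguished professor of the Ministry of Education, a recipient of the National Science Fund for Distinguished Young Scholars, and a leading talent in scientific and technological innovation under the national "Ten Thousand Talents Program". His academic appointments include head of the Ministry of Education's "High-Efficiency Computing Discipline Innovation and Talent Introduction Base" and director of the Hunan Provincial Engineering Technology Research Center for Data Analysis. He serves as vice chairman of the National Supercomputing Innovation Alliance, member of the expert committee of the New Generation Artificial Intelligence Industry Technology Innovation Strategic Alliance, IEEE senior member, CCF distinguished member, standing committee member of the CCF Technical Committee on High Performance Computing, and secretary-general of the Hunan Computer Society. His main research interests are parallel and distributed processing, supercomputing and cloud computing, and high-efficiency computing for big data and artificial intelligence.
  
  Quan ZOU (1982- ), male, is a professor at the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China. He is a member of IEEE and ACM, a senior member of CCF, a member of the Chinese Association for Artificial Intelligence (committee member of its Rough Set and Soft Computing technical committee and its Bioinformatics and Artificial Life technical committee, and corresponding member of its Machine Learning technical committee), a member of the Operations Research Society of China, and a member of the Chinese Association of Automation. His main research interests are bioinformatics, machine learning, and parallel computing. His current work focuses on parallel computing methods for protein classification, genome assembly, annotation, and functional analysis of next-generation sequencing data.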
- Funding:
  
  The National Key Research and Development Program of China (2017YFB0202602, 2018YFC0910405, 2017YFC1311003, 2016YFC1302500, 2016YFB0200400, 2017YFB0202104); The National Natural Science Foundation of China (61772543, U1435222, 61625202, 61272056)
- DOI: 10.11959/j.issn.2096-0271.2019016
  CLC number: TP31
- Published online: 2019-03
  
  Print publication: 2019-03-15

Shaoliang PENG, Qi NIU, Kenli LI, et al. A scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining[J]. Big Data Research, 2019, 5(2): 2019016-1. DOI: 10.11959/j.issn.2096-0271.2019016.

Abstract

Frequent subgraph mining is an important problem in many practical application domains. Because the task is computationally intensive and both the graph sets to be mined and the mining results are large, existing frequent subgraph mining solutions cannot meet time requirements, and processing efficiency is the main challenge. This paper proposes cmFSM, a parallel-accelerated frequent subgraph mining tool. cmFSM applies parallel optimization at three levels: fine-grained OpenMP parallelization on a single node, multi-node multi-process parallelization, and CPU-MIC collaborative parallelization. On a single node, cmFSM is twice as fast as the best CPU-based algorithm, and in the multi-node scheme it provides scalability. The results show that, even when using only some of the available parallel computing resources, cmFSM clearly outperforms existing state-of-the-art algorithms, which demonstrates the effectiveness of the proposed tool in bioinformatics. Future work will continue to improve scalability.
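
To make the problem statement concrete, here is a minimal sketch of the simplest case of frequent subgraph mining: finding single-edge patterns that occur in at least `min_support` graphs of a labeled graph database, with the database split across workers and the partial counts merged. This is not cmFSM's implementation. The graph encoding, the function names (`edge_patterns`, `count_chunk`, `frequent_edges`), and the toy molecule data are all assumptions made for illustration, and Python threads only show the partition-and-merge structure rather than a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical encoding: a graph is (vertex_labels, edges), where
# vertex_labels maps vertex id -> label and edges is a list of
# (u, v, edge_label) tuples for an undirected labeled graph.


def edge_patterns(graph):
    """Return the set of distinct labeled-edge patterns in one graph."""
    labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        # Undirected edge: sort endpoint labels to get a canonical form.
        a, b = sorted((labels[u], labels[v]))
        patterns.add((a, edge_label, b))
    return patterns


def count_chunk(chunk):
    """Count, for one partition, how many graphs contain each pattern."""
    counts = Counter()
    for graph in chunk:
        counts.update(edge_patterns(graph))  # each graph counts once per pattern
    return counts


def frequent_edges(graphs, min_support, workers=2):
    """Merge per-partition counts and keep patterns meeting min_support."""
    chunks = [graphs[i::workers] for i in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)
    return {p: c for p, c in total.items() if c >= min_support}


# Toy database of three small "molecules" (illustrative data only).
graphs = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "single")]),
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "single"), (0, 2, "single")]),
    ({0: "O", 1: "C", 2: "C"}, [(0, 1, "double"), (1, 2, "single")]),
]
result = frequent_edges(graphs, min_support=2)
print(result)
```

A full miner would go on to extend these frequent edges into larger candidate subgraphs and test each candidate for subgraph isomorphism, which is where most of the computational cost mentioned in the abstract arises.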
