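1. Shenzhen Research Institute of Big Data, Shenzhen, Guangdong 518172
2. Department of Mathematics, Hong Kong Baptist University, Hong Kong 999077
3. Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong 999077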
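胡湘红 (Xianghong HU, 1991- ), female, is a PhD student at the Shenzhen Research Institute of Big Data. Her main research interest is bioinformatics.
彭衡 (Heng PENG, 1974- ), male, is an associate professor in the Department of Mathematics, Hong Kong Baptist University. His main research interests are financial econometrics, bioinformatics, model selection, and nonparametric methods.
杨灿 (Can YANG, 1980- ), male, is an assistant professor in the Department of Mathematics, The Hong Kong University of Science and Technology. His main research interests are bioinformatics, high-dimensional data analysis, and statistical genetics.
张纵辉 (1981- ), male, is an associate professor at the Shenzhen Research Institute of Big Data. His main research interests are signal processing, optimization methods, and data communications.
万翔 (1972- ), male, is a research scientist at the Shenzhen Research Institute of Big Data. His main research interests are machine learning, medical big data, and bioinformatics.
罗智泉 (1963- ), male, is a professor at the Shenzhen Research Institute of Big Data. His main research interests are optimization methods, algorithm design, and information science.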
Published online: 2019-07
Published in print: 2019-07-15
Xianghong HU, Heng PENG, Can YANG, et al. Integrative analysis for big data in genomics[J]. Big Data Research, 2019, 5(4): 67-88. DOI: 10.11959/j.issn.2096-0271.2019033.
With the rapid development of biotechnology (e.g., genotyping chips and sequencing), researchers worldwide have accumulated massive data sets at different levels. Effectively integrating multi-layered, multi-dimensional genomic data plays a key role in dissecting the full causal chain from genetic variants to disease onset, and can lay a scientific foundation for personalized and precision medicine. This paper reviews the integrative analysis of genomic big data from three aspects: identification of risk loci and their functional analysis, pleiotropy in human complex traits, and Mendelian randomization for causal inference between phenotypes. Several case studies are provided to illustrate these approaches. Finally, research on the integrative analysis of genomic big data is summarized, and its prospects, including its importance for precision medicine, are discussed.