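1. Bio-Med Big Data Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
2. Shanghai Center for Bioinformation Technology, Shanghai 201203, China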
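ZHENG Guangyong (1977-), male, Ph.D., associate professor at the Bio-Med Big Data Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. His main research interests include computational biology, systems biology, and in-depth mining of biomedical big data.
YANG Zhen (1981-), male, Ph.D., associate professor at the Bio-Med Big Data Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. His main research interest is in-depth mining of biomedical big data.
CAO Ruifang (1989-), female, engineer at the Bio-Med Big Data Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. Her main research interest is the construction of biomedical databases and knowledge bases.
LIU Wan (1987-), female, Ph.D., assistant professor at the Shanghai Center for Bioinformation Technology. Her main research interests include microbe-related databases and data warehouses, and biomedical data curation.
LI Yixue (1955-), male, Ph.D., professor at the Bio-Med Big Data Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. His main research interests include computational biology and systematic research on biomedical big data.
ZHANG Guoqing (1978-), male, Ph.D., professor at the Bio-Med Big Data Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. His main research interest is the construction of biomedical databases and knowledge bases.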
Published online: 2018-05
Print publication: 2018-05-15
Guangyong ZHENG, Zhen YANG, Ruifang CAO, et al. Quality control of big data analysis for metagenomics[J]. Big Data Research, 2018, 4(3): 2018025. DOI: 10.11959/j.issn.2096-0271.2018025.
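Metagenomic data are characterized by large volume and high complexity; in terms of data type, they comprise both metadata and sequencing data. To ensure the validity and correctness of subsequent functional analysis, these metadata and sequencing data must undergo strict quality control. This paper describes the quality control process for metagenomic data in detail, including information checks on metadata and sequencing data and the filtering of low-quality fragments. It thereby provides a pre-processing specification for metagenomic data analysis and lays a solid foundation for microbiome big data analysis.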
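The abstract names the filtering of low-quality fragments as a core quality control step. The Python sketch below illustrates one common way to perform it; it is not the authors' pipeline. It assumes Phred+33-encoded, four-line FASTQ records, and its function names and thresholds (window size 4, minimum window mean quality 20, minimum length 36 bp, at most 10% N bases) are hypothetical choices for illustration only. The sliding-window rule is similar in spirit to the SLIDINGWINDOW step of Trimmomatic but does not reproduce that tool's exact behaviour.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header line, base sequence, quality string).
Record = Tuple[str, str, str]


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in qual]


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield records from simple 4-line FASTQ text (no wrapped sequences)."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        # Basic integrity checks on each record before any filtering.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header, seq, qual


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_avg: float = 20.0) -> Tuple[str, str]:
    """Scan 5'->3' and truncate the read at the start of the first window
    whose mean quality is below min_avg (hypothetical thresholds)."""
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual  # no failing window (or read shorter than the window)


def filter_reads(records: Iterable[Record], window: int = 4,
                 min_avg: float = 20.0, min_len: int = 36,
                 max_n_frac: float = 0.1) -> List[Record]:
    """Trim each read, then drop reads that are too short or too ambiguous."""
    kept: List[Record] = []
    for header, seq, qual in records:
        seq, qual = sliding_window_trim(seq, qual, window, min_avg)
        if not seq or len(seq) < min_len:
            continue  # empty or too short after quality trimming
        if seq.upper().count("N") / len(seq) > max_n_frac:
            continue  # too many ambiguous (N) bases
        kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    demo = [
        "@read1", "ACGT" * 10, "+", "I" * 40,             # all Q40
        "@read2", "ACGT" * 10, "+", "I" * 20 + "#" * 20,  # Q2 tail
        "@read3", "N" * 40, "+", "I" * 40,                # all N
    ]
    for header, seq, _ in filter_reads(read_fastq(demo)):
        print(header, len(seq))
```

Running the script prints only `@read1 40`: read2 is truncated to 19 bp where its quality drops and then fails the length cutoff, and read3 consists entirely of N bases. In practice the thresholds would need to be tuned to the sequencing platform and study design.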