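Youjun XU (1990-), male, Ph.D. candidate at the Academy for Advanced Interdisciplinary Studies, Peking University. His main research interests are drug design and drug information.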
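Jianfeng PEI (1975-), male, Ph.D., specially appointed research fellow at the Academy for Advanced Interdisciplinary Studies, Peking University. His main research interests are drug design and drug information.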
Published online: 2017-03
Print publication: 2017-03-20
Youjun XU, Jianfeng PEI. Deep learning for chemoinformatics[J]. Big Data Research, 2017, 3(2): 2017019. DOI: 10.11959/j.issn.2096-0271.2017019.
Deep learning has achieved great success in three major fields, computer vision, speech recognition and natural language processing, and has driven the rapid development of artificial intelligence. Applying the key techniques of deep learning to chemoinformatics can accelerate the move toward AI-based processing of chemical information. Modeling the quantitative relationship between compound structure and properties (QSAR) is one of the main tasks of chemoinformatics, so this review focuses on recent progress in applying three classes of deep learning frameworks, namely deep neural networks, convolutional neural networks, and recurrent or recursive neural networks, to QSAR modeling. It closes with a perspective on the future impact of deep learning on chemoinformatics.
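The review's central task, fitting a QSAR model with a deep (multi-layer) neural network, can be illustrated with a minimal sketch. It is not the authors' method or any published model: the 8-bit "fingerprints", the synthetic activity rule `synthetic_activity`, the class `TinyQSARNet`, the layer sizes, the learning rate and the epoch count are all hypothetical. Real QSAR studies use thousands of fingerprint bits or computed descriptors and an established deep learning library; pure Python is used here only so the example runs without dependencies.

```python
import math
import random


def relu(v):
    return v if v > 0.0 else 0.0


class TinyQSARNet:
    """Hypothetical fully connected net: fingerprint bits -> ReLU hidden layers -> one activity value."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.weights, self.biases = [], []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            scale = math.sqrt(2.0 / n_in)  # He-style initialisation (an illustrative choice)
            self.weights.append([[rng.gauss(0.0, scale) for _ in range(n_in)]
                                 for _ in range(n_out)])
            self.biases.append([0.0] * n_out)

    def _forward(self, x):
        """Return the activations of every layer; the output layer is linear (regression)."""
        acts = [list(x)]
        last = len(self.weights) - 1
        for k, (W, b) in enumerate(zip(self.weights, self.biases)):
            z = [sum(w * a for w, a in zip(row, acts[-1])) + bj for row, bj in zip(W, b)]
            acts.append(z if k == last else [relu(v) for v in z])
        return acts

    def predict(self, x):
        return self._forward(x)[-1][0]

    def sgd_step(self, x, y, lr):
        """One backpropagation + SGD update on the loss 0.5 * (prediction - y) ** 2."""
        acts = self._forward(x)
        delta = [acts[-1][0] - y]  # dLoss/dz at the linear output unit
        for k in reversed(range(len(self.weights))):
            W, a_prev = self.weights[k], acts[k]
            # Gradient w.r.t. the previous layer's activations, computed before W is updated.
            grad_prev = [sum(W[j][i] * delta[j] for j in range(len(W)))
                         for i in range(len(a_prev))]
            for j, d in enumerate(delta):
                for i, a in enumerate(a_prev):
                    W[j][i] -= lr * d * a
                self.biases[k][j] -= lr * d
            if k > 0:  # ReLU derivative: the gradient flows only through active units
                delta = [g if a > 0.0 else 0.0 for g, a in zip(grad_prev, a_prev)]


def mse(model, X, Y):
    return sum((model.predict(x) - y) ** 2 for x, y in zip(X, Y)) / len(X)


def synthetic_activity(bits):
    # Hypothetical structure-activity rule, used only to generate toy labels.
    return 1.0 * bits[0] + 0.5 * bits[3] - 0.8 * bits[5] + 0.3 * bits[1] * bits[2]


rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(8)] for _ in range(64)]  # toy 8-bit "fingerprints"
Y = [synthetic_activity(x) for x in X]

model = TinyQSARNet([8, 8, 8, 1])  # two hidden layers of 8 ReLU units each
mse_before = mse(model, X, Y)
order = list(range(len(X)))
for epoch in range(100):
    rng.shuffle(order)
    for idx in order:
        model.sgd_step(X[idx], Y[idx], lr=0.02)
mse_after = mse(model, X, Y)
print(f"training MSE: {mse_before:.3f} -> {mse_after:.3f}")
```

Each call to `sgd_step` backpropagates the squared error through the ReLU hidden layers, so the training MSE should fall as the network learns the toy rule. Because the labels are synthetic, the result says nothing about predictive performance on real compounds; that requires held-out test sets drawn from real bioactivity data.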