Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100019
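Xueqi Cheng, male, is a research professor and doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences, and director of the Chinese Academy of Sciences Key Laboratory of Network Data Science and Technology. His current research focuses on network data science and social computing. He has led or participated in multiple projects under the National "973" Program, the National "863" Program, and the National Natural Science Foundation of China, including a Distinguished Young Scholars grant from the Foundation, and has won the second prize of the National Science and Technology Progress Award multiple times. In recent years he has published more than 40 papers in top journals and international conferences in the field, such as IEEE TKDE, ACM SIGIR, and WWW, and has received the CIKM Best Paper Award and the SIGIR Best Student Paper Award.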
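Yanyan Lan, female, is an associate research professor and master's supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. Her current research focuses on machine learning and data mining. She has published more than 20 papers at top conferences in the field, such as ACM SIGIR, NIPS, and ICML, and has received the SIGIR Best Student Paper Award.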
Published online: 2015-06
Published in print: 2015-06-20
Xueqi Cheng, Yanyan Lan. Text Content Analysis for Web Big Data[J]. BIG DATA RESEARCH, 2015, 1(3): 55-64. DOI: 10.11959/j.issn.2096-0271.2015029.
Text content analysis is an effective means of understanding big data and discovering its value. This paper examines the challenges and research results of text content analysis for web big data along three sub-directions: topic modeling for short texts, word representation learning (word embedding), and learning to rank for web pages. It closes by pointing out some research directions and open problems for future big data text content analysis.