1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100049
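ZHU Tiangang (1988-), male, is a master's student at the University of Chinese Academy of Sciences. His main research interest is data mining.
GUO Danhuai (1973-), male, Ph.D., is an associate professor and master's supervisor at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are mining of massive spatio-temporal data and visual analytics of big data.
WANG Xuezhi (1979-), male, is an associate professor at the Computer Network Information Center, Chinese Academy of Sciences. His main research interest is processing and analysis of massive spatio-temporal data.
LI Jianhui (1973-), male, Ph.D., is a professor and doctoral supervisor at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are big data management and big data analysis and processing.
ZHOU Yuanchun (1975-), male, Ph.D., is a professor and doctoral supervisor at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are data mining and big data analysis and processing.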
Published online: 2016-03
Print publication: 2016-03-20
祝天刚, 郭旦怀, 王学志, 等. 基于短文本的食源性疾病事件探测技术[J]. 大数据, 2016,2(2):88-99. DOI: 10.11959/j.issn.2096-0271.2016022.
Tiangang ZHU, Danhuai GUO, Xuezhi WANG, et al. Foodborne diseases event detection based on short text[J]. BIG DATA RESEARCH, 2016, 2(2): 88-99. DOI: 10.11959/j.issn.2096-0271.2016022.
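Microblog data are a typical data source for short-text event detection. Because microblog content is diverse, sparse, and fragmented, existing event detection methods rely on a single, noisy data source and resolve spatio-temporal information only at a coarse granularity, which leads to poor accuracy. Therefore, a dynamic context window algorithm is proposed that constructs candidate microblogs for event detection, improving both its efficiency and its precision. An algorithm that uses microblog content to discover the geographic location of a specific event is also proposed, improving the precision with which an event's spatio-temporal information is obtained. Finally, the methods are applied to the automatic detection of foodborne disease events. Compared with previous event detection methods, the approach expands the data sources and significantly improves accuracy in both the temporal and the spatial dimension.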
MicroBlog is a typical short-text data source for event detection. Because of the diversity, sparsity, and fragmentation of MicroBlog content, existing event detection methods are ineffective and the spatio-temporal information of detected events is inaccurate. To this end, a dynamic context window algorithm was proposed, which improved the efficiency and precision of MicroBlog-based event detection of foodborne diseases. Moreover, an algorithm was developed that extracts spatio-temporal information from MicroBlog more accurately. Finally, extensive experiments on event detection of foodborne diseases show that the proposed method helps expand the data sources and improves accuracy in both the time and the space dimension.