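Tengjiao WANG (1973-), male, is a professor and doctoral supervisor at the School of Electronics Engineering and Computer Science, Peking University, and executive deputy director of the Peking University Center for Big Data Research in Arts and Sciences. His main research interests include database management systems, Internet data analysis, and data warehousing and data mining.
Xilian LI (1992-), female, is a master's student at the School of Electronics Engineering and Computer Science, Peking University. Her main research interests include big data, machine learning, and data mining.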
Published online: 2017-03
Published in print: 2017-03-20
Tengjiao WANG, Xilian LI. Making big data analysis more credible[J]. Big Data Research, 2017, 3(2): 2017017. DOI: 10.11959/j.issn.2096-0271.2017017.
Big data is playing an increasingly important role across academia and industry. At the same time, whether big data can be trusted has drawn wide attention and intense debate among researchers. This paper examines the credibility of big data from three angles: the historical evolution of the term "big data", case studies of big data applications, and big data engineering. It then summarizes three challenges that must be addressed to ensure the correctness of big data analysis: choosing the right data sources, scientifically sampling representative and valuable data, and applying rigorous and complete big data engineering analysis methods.