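[ "Xiaoou DING (1993- ), female, is a Ph.D. candidate at the Massive Data Computing Research Center, Harbin Institute of Technology. Her main research interests include time series data mining and analysis, data cleaning, and data quality management." ]
[ "Hongzhi WANG (1978- ), male, Ph.D., is a professor and doctoral supervisor at the Massive Data Computing Research Center, Harbin Institute of Technology. His main research interests include database management systems, big data management and analysis, and data governance." ]
[ "Shengjian YU (1997- ), male, is a master's student at the Massive Data Computing Research Center, Harbin Institute of Technology. His main research interests include time series data analysis, anomaly detection, and time series data cleaning." ]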
Published online: 2019-11
Published in print: 2019-11-15
Xiaoou DING, Hongzhi WANG, Shengjian YU. Data quality management of industrial temporal big data[J]. Big Data Research, 2019, 5(6): 2019047-1. DOI: 10.11959/j.issn.2096-0271.2019047.
Industrial big data has become an important strategic resource for the transformation and upgrading of China's manufacturing industry, and the analysis of industrial big data is attracting growing attention. Time series, an important form of industrial big data, suffer from many data quality problems, which need to be detected and handled effectively by purpose-designed data cleaning methods. This paper introduces the characteristics of industrial time series big data and the difficulties of industrial data quality management, then analyzes and summarizes the state of research on quality management of industrial time series big data. Finally, it proposes directions for improving time series big data quality management methods and system performance.