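Yan ZHANG (1985- ), female, is an artificial intelligence researcher at Transwarp Technology (Shanghai) Co., Ltd. Her main research interests include privacy computing, explainable AI, and causal analysis.
Yifan YANG (1985- ), male, Ph.D., is product director and chief scientist at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include statistics, graph computing, and reinforcement learning.
Ren YI (1989- ), female, Ph.D., is chief scientist for privacy computing at Transwarp Technology (Shanghai) Co., Ltd. Her main research interests include privacy computing, federated learning, and knowledge graphs.
Shengmei LUO (1971- ), male, Ph.D., is dean of the Big Data Research Institute at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include big data, parallel computing, cloud storage, and artificial intelligence.
Jianfei TANG (1986- ), male, is a big data technology standards researcher at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include big data, databases, and graph computing.
Zhengxun XIA (1979- ), male, is a senior researcher at Transwarp Technology (Shanghai) Co., Ltd. His main research interests include artificial intelligence, big data, databases, and streaming media processing.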
Published online: 2022-09
Print publication: 2022-09-15
Yan ZHANG, Yifan YANG, Ren YI, et al. Exploration and practice of data quality governance in privacy computing scenarios[J]. Big Data Research, 2022, 8(5): 55-73. DOI: 10.11959/j.issn.2096-0271.2022073.
Privacy computing is a new data processing technology that enables data value to be transformed and circulated while protecting data privacy and security. However, the “data usable but invisible” property of privacy computing scenarios poses a major challenge to traditional data quality governance, and the industry still lacks a complete solution. To address this problem, a data quality governance method and process suited to privacy computing scenarios was proposed. A two-level data quality evaluation system, covering the local domain and the multi-party (federated) domain, was constructed so that data quality governance in both domains can be handled together. In addition, a data contribution measurement method was proposed to explore a long-term incentive mechanism for privacy computing, thereby improving the data quality of privacy computing and the accuracy of computing results.
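The abstract mentions a data contribution measurement method but does not say how contribution is computed. One widely used way to attribute a jointly trained model's performance to the participating data owners is the Shapley value; the sketch below uses it purely for illustration and is not presented as the paper's own method. The function name `shapley_contributions`, the party names, and the coalition scores are hypothetical: each score stands for some utility (for example, validation AUC) of a model trained on the data of that coalition of parties.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_contributions(
    parties: Sequence[str],
    utility: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value of each party under a coalition utility function.

    utility(S) is a hypothetical score of a model trained on the data of the
    parties in S; utility(frozenset()) is the score with no participant data.
    """
    n = len(parties)
    values = {p: 0.0 for p in parties}
    for p in parties:
        others = [q for q in parties if q != p]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for each coalition S of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal gain in utility when party p joins coalition S
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values


# Hypothetical validation scores for every coalition of three parties A, B, C
scores = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.70,
    frozenset({"B"}): 0.60,
    frozenset({"C"}): 0.55,
    frozenset({"A", "B"}): 0.78,
    frozenset({"A", "C"}): 0.74,
    frozenset({"B", "C"}): 0.65,
    frozenset({"A", "B", "C"}): 0.80,
}
contrib = shapley_contributions(["A", "B", "C"], scores.__getitem__)
# The contributions sum to utility(all) - utility(empty), i.e. 0.30 here
print(contrib)
```

Exact computation needs the utility of every coalition, so the cost grows as 2^n in the number of parties. That is often manageable for the small number of institutions in cross-institution privacy computing; larger federations usually rely on sampling-based approximations of the Shapley value.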