基于优化反馈的组合在线学习

孔芳; 杨悦然; 陈卫; 李帅

doi:10.11959/j.issn.2096-0271.2021052

您当前的位置：

首页 >

文章列表页 >

基于优化反馈的组合在线学习

专栏：数据驱动的优化 | 更新时间：2024-06-03

- 基于优化反馈的组合在线学习
- Combinatorial online learning based on optimizing feedbacks
- 大数据 2021年7卷第5期页码：2021052
- 作者机构：
  
  1. 上海交通大学约翰·霍普克罗夫特计算机科学中心，上海 200240
  2. 微软亚洲研究院，北京 100080
- 作者简介：
  
  [ "孔芳（1998- ），女，上海交通大学电子信息与电气工程学院博士生，主要研究方向为组合在线学习、在线影响力最大化等" ]
  [ "杨悦然（1999- ），女，上海交通大学数学科学学院在读，主要研究方向为组合在线学习等" ]
  [ "陈卫（1968- ），男，博士，微软亚洲研究院高级研究员，中国科学院计算技术研究所客座研究员，中国计算机学会大数据专家委员会和理论计算机科学专业委员会委员，IEEE Fellow，《大数据》期刊编委。主要研究方向为在线学习和优化、社交和信息网络、网络博奕论和经济学、分布式计算、容错等" ]
  [ "李帅（1988- ），女，上海交通大学约翰·霍普克罗夫特计算机科学中心助理教授，主要研究方向为多臂老虎机、在线学习、机器学习理论、强化学习、推荐系统等" ]
- 基金信息：
  
  国家自然科学基金资助项目;The National Natural Science Foundation of China(62006151);国家自然科学基金资助项目;The National Natural Science Foundation of China(62076161);上海市青年科技英才扬帆计划;Shanghai Sailing Program
- DOI：10.11959/j.issn.2096-0271.2021052
  中图分类号： TP181
- 网络首发：2021-09，
  
  纸质出版：2021-09-15
- 稿件说明：
移动端阅览
孔芳, 杨悦然, 陈卫, 等. 基于优化反馈的组合在线学习[J]. 大数据, 2021,7(5):2021052.

Fang KONG, Yueran YANG, Wei CHEN, et al. Combinatorial online learning based on optimizing feedbacks[J]. Big data research, 2021, 7(5): 2021052.
孔芳, 杨悦然, 陈卫, 等. 基于优化反馈的组合在线学习[J]. 大数据, 2021,7(5):2021052. DOI： 10.11959/j.issn.2096-0271.2021052.

Fang KONG, Yueran YANG, Wei CHEN, et al. Combinatorial online learning based on optimizing feedbacks[J]. Big data research, 2021, 7(5): 2021052. DOI： 10.11959/j.issn.2096-0271.2021052.

摘要

组合在线学习问题研究如何在与环境的交互过程中学习未知参数，逐步找到最优的目标组合。该问题有丰富的应用场景，如广告投放、搜索和推荐等。首先阐述了组合在线学习问题的定义及其框架——组合多臂老虎机问题，归纳了此框架下的经典算法和研究进展；然后具体介绍了该问题的两个实际应用——在线影响力最大化和在线排序学习问题，以及其研究进展；最后展望了组合在线学习问题的未来研究方向。

Abstract

Combinatorial online learning studies how to learn the unknown parameters and gradually find the optimal combination of targets during the interactions with the environment.This problem has a wide range of applications including advertisement placement

searching and recommendation.Firstly

the definition of combinatorial online learning and its general framework – the problem of combinatorial multi-armed bandits were introduced

and its traditional algorithms and research progress were summarized.Then

the related works of two specific applications

online influence maximization and online learning to rank

were introduced.Finally

the prospective directions of further researches on combinatorial online learning were discussed.

关键词

Keywords

references

ALON N , CESA-BIANCHI N ,, DEKEL O , et al . Online learning with feedback graphs:beyond bandits [J ] . arXiv preprint,2015,arXiv:1502.07617 .

LYKOURIS T , TARDOS É ,, WALI D . Feedback graph regret bounds for Thompson sampling and UCB [C ] // Proceedings of the 31st International Conference on Algorithmic Learning Theory .[S.l.:s.n. ] , 2020 .

BARTÓK G , FOSTER D P , PÁL D , , et al . Partial monitoring-classification,regret bounds,and algorithms [J ] . Mathematics of Operations Research , 2014 , 39 ( 4 ): 967 - 997 .

AUER P , CESA-BIANCHI N , FISCHER P . Finite-time analysis of the multiarmed bandit problem [J ] . Machine Learning , 2002 , 47 ( 2 ): 235 - 256 .

THOMPSON W R . On the likelihood that one unknown probability exceeds another in view of the evidence of two samples [J ] . Biometrika , 1933 , 25 ( 3/4 ): 285 - 294 .

BUBECK S , MUNOS R , STOLTZ G . Pure exploration in multi-armed bandits problems [M ] // Lecture Notes in Computer Science . Heidelberg : Springer , 2009 : 23 - 37 .

AUDIBERT J Y , BUBECK S , MUNOS R . Best arm identification in multi-armed bandits [C ] // Proceedings of the 23rd Annual Conference on Learning Theory .[S.l.:s.n. ] , 2010 : 41 - 53 .

KARNIN Z , KOREN T , SOMEKH O . Almost optimal exploration in multiarmed bandits [C ] // Proceedings of the International Conference on Machine Learning . New York:ACM Press , 2013 : 1238 - 1246 .

GARIVIER A , KAUFMANN E . Optimal best arm identification with fixed confidence [C ] // Proceedings of the 29th Annual Conference on Learning Theory .[S.l.:s.n. ] , 2016 .

WU Y F , SHARIFF R , LATTIMORE T , et al . Conservative bandits [C ] // Proceedings of the 33rd International Conference on Machine Learning . New York:ACM Press , 2016 .

AUER P , CESA-BIANCHI N ,, FREUND Y , et al . The nonstochastic multiarmed bandit problem [J ] . SIAM Journal on Computing , 2002 , 32 ( 1 ): 48 - 77 .

LI L H , CHU W , LANGFORD J , et al . A contextual-bandit approach to personalized news article recommendation [C ] // Proceedings of the 19th International Conference on World Wide Web . New York:ACM Press , 2010 .

GAI Y , KRISHNAMACHARI B , JAIN R . Learning multiuser channel allocations in cognitive radio networks:a combinatorial multi-armed bandit formulation [C ] // Proceedings of 2010 IEEE Symposium on New Frontiers in Dynamic Spectrum . Piscataway:IEEE Press , 2010 : 1 - 9 .

CHEN W , HU W , LI F , et al . Combinatorial multi-armed bandit with general reward functions [C ] // Proceedings of the 30th International Conference on Neural Information Processing Systems .[S.l.:s.n. ] , 2016 .

WANG Q S , CHEN W . Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications [C ] // Proceedings of the 31st International Conference on Neural Information Processing Systems . New York:ACM Press , 2017 : 1161 - 1171 .

CHEN W , WANG Y J , YUAN Y . combinatorial multi-armed bandit:general framework and applications [C ] // Proceedings of the 30th International Conference on Machine Learning .[S.l.:s.n. ] , 2013 : 151 - 159 .

GAI Y , KRISHNAMACHARI B , JAIN R . Combinatorial network optimization with unknown variables:multi-armed bandits with linear rewards and individual observations [J ] . IEEE/ACM Transactions on Networking , 2012 , 20 ( 5 ): 1466 - 1478 .

CHEN W , WANG Y J , YUAN Y , et al . Combinatorial multi-armed bandit and its extension to probabilistically triggered arms [J ] . Journal of Machine Learning Research , 2016 , 17 ( 1 ): 1746 - 1778 .

ABBASI-YADKORI Y , PÁL D , SZEPESVÁRI C . Improved algorithms for linear stochastic bandits [C ] // Proceedings of the 24th International Conference on Neural Information Processing Systems . New York:ACM Press , 2011 : 2312 - 2320 .

DANI V , HAYES T P , KAKADE S M . Stochastic linear optimization under bandit feedback [Z ] . 2008 .

RUSMEVICHIENTONG P , TSITSIKLIS J N . Linearly parameterized bandits [J ] . Mathematics of Operations Research , 2010 , 35 ( 2 ): 395 - 411 .

KVETON B , WEN Z , ASHKAN A , et al . Matroid bandits:fast combinatorial optimization with learning [C ] // Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence .[S.l.:s.n. ] , 2014 .

CHEN S , LIN T , KING I , et al . Combinatorial pure exploration of multiarmed bandits [C ] // Proceedings of the 28th Conference on Neural Information Processing Systems .[S.l.:s.n. ] , 2014 : 379 - 387 .

GABILLON V , LAZARIC A , GHAVAMZADEH M , et al . Improved learning complexity in combinatorial pure exploration bandits [C ] // Proceedings of the 19th International Conference on Artificial Intelligence and Statistics .[S.l.:s.n. ] , 2016 .

KUROKI Y , XU L Y , MIYAUCHI A , et al . Polynomial-time algorithms for multiplearm identification with full-bandit feedback [J ] . Neural Computation , 2020 , 32 ( 9 ): 1733 - 1773 .

REJWAN I , MANSOUR Y . Top-k combinatorial bandits with full-bandit feedback [C ] // Proceedings of the 31st International Conference on Algorithmic Learning Theory .[S.l.:s.n. ] , 2020 .

HUANG W R , OK J , LI L , et al . Combinatorial pure exploration with continuous and separable reward functions and its applications [C ] // Proceedings of the 27th International Joint Conference on Artificial Intelligence . New York:ACM Press , 2018 .

DU Y H , KUROKI Y , CHEN W . Combinatorial pure exploration with fullbandit or partial linear feedback [J ] . arXiv preprint,2020,arXiv:2006.07905 .

CHEN W , DU Y H , HUANG L B , et al . Combinatorial pure exploration for dueling bandit [C ] // Proceedings of the 37th International Conference on Machine Learning .[S.l.:s.n. ] , 2020 .

COMBES R , TALEBI M S , PROUTIERE A , et al . Combinatorial bandits revisited [C ] // Proceedings of the 28th International Conference on Neural Information Processing Systems . New York:ACM Press , 2015 : 2116 - 2124 .

GARIVIER A , CAPPÉ O ,, . The KL-UCB algorithm for bounded stochastic bandits and beyond [C ] // Proceedings of the 24th Annual Conference on Learning Theory .[S.l.:s.n. ] , 2011 .

CUVELIER T , COMBES R , GOURDIN E . Statistically efficient,polynomial-time algorithms for combinatorial semibandits [J ] . Proceedings of the ACM on Measurement and Analysis of Computing Systems , 2021 , 5 ( 1 ): 1 - 31 .

QIN L J , CHEN S Y , ZHU X Y . Contextual combinatorial bandit and its application on diversified online recommendation [C ] // Proceedings of the 2014 SIAM International Conference on Data Mining . Philadelphia:Society for Industrial and Applied Mathematics , 2014 .

CHEN L X , XU J , LU Z . Contextual combinatorial multi-armed bandits with volatile arms and submodular reward [C ] // Proceedings of the 32nd International Conference on Neural Information Processing Systems . New York:ACM Press , 2018 : 3251 - 3260 .

ZUO J H , LIU X T , JOE-WONG C , , et al . Online competitive influence maximization [J ] . arXiv preprint,2020,arXiv:2006.13411 .

KOMIYAMA J , HONDA J , NAKAGAWA H . Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays [C ] // Proceedings of the 32nd International Conference on International Conference on Machine Learning . New York:ACM Press , 2015 : 1152 - 1161 .

WANG S W , CHEN W . Thompson sampling for combinatorial semi-bandits [C ] // Proceedings of the 35th International Conference on Machine Learning .[S.l.:s.n. ] , 2018 .

HÜYÜK A , TEKIN C . Analysis of Thompson sampling for combinatorial multi-armed bandit with probabilistically triggered arms [C ] // Proceedings of Machine Learning Research .[S.l.:s.n. ] , 2019 .

HÜYÜK A , TEKIN C . Thompson sampling for combinatorial network optimization in unknown environments [J ] . IEEE/ACM Transactions on Networking , 2020 , 28 ( 6 ): 2836 - 2849 .

WANG S W , CHEN W . Thompson sampling for combinatorial semi-bandits [C ] // Proceedings of the 35th International Conference on Machine Learning .[S.l.:s.n. ] , 2018 : 5114 - 5122 .

ZHOU H Z , WANG L D , VARSHNEY L , et al . A near-optimal change-detection based algorithm for piecewise-stationary combinatorial semi-bandits [J ] . Proceedings of the AAAI Conference on Artificial Intelligence , 2020 , 34 ( 4 ): 6933 - 6940 .

CHEN W , WANG L W , ZHAO H Y , et al . Combinatorial semi-bandit in the non stationary environment [J ] . arXiv preprint,2020,arXiv:2002.03580 .

ZHANG X J , LI S , LIU W W . Contextual combinatorial conservative bandits [J ] . arXiv preprint,2019,arXiv:1911.11337 .

CESA-BIANCHI N , LUGOSI G . Combinatorial bandits [J ] . Journal of Computer and System Sciences , 2012 , 78 ( 5 ): 1404 - 1422 .

BUBECK S , CESA-BIANCHI N ,, KAKADE S M . Towards minimax policies for online linear optimization with bandit feedback [J ] . Journal of Machine Learning Research , 2012 ,23.

SAKAUE S , ISHIHATA M , MINATO S I . Practical adversarial combinatorial bandit algorithm via compression of decision sets [J ] . arXiv preprint,2017,arXiv:1707.08300 .

LIN T , ABRAHAO B , KLEINBERG R , et al . Combinatorial partial monitoring game with linear feedback and its applications [C ] // Proceedings of the 31st International Conference on Machine Learning .[S.l.:s.n. ] , 2014 .

CHAUDHURI S , TEWARI A . Phased exploration with greedy exploitation in stochastic combinatorial partial monitoring games [J ] . arXiv preprint,2016,arXiv:1608.06403 .

CHEN X Y , ZHENG K , ZHOU Z X , et al . (Locally) Differentially private combinatorial semi-bandits [C ] // Proceedings of the 37th International Conference on Machine Learning .[S.l.:s.n. ] , 2020 .

CHEN Y W , HOFMANN K.Online learning to rank:absolute vs . relative [C ] // Proceedings of the 24th International Conference on World Wide Web . New York:ACM Press , 2015 .

KVETON B , SZEPESVARI C , WEN Z , et al . Cascading bandits:learning to rank in the cascade model [C ] // Proceedings of the 32nd International Conference on Machine Learning . New York:ACM Press , 2015 : 767 - 776 .

KVETON B , WEN Z , ASHKAN A , et al . Combinatorial cascading bandits [J ] . Advances in Neural Information Processing Systems , 2015 , 28 : 1450 - 1458 .

LI S , WANG B X , ZHANG S Y , et al . Contextual combinatorial cascading bandits [C ] // Proceedings of the 33rd International Conference on International Conference on Machine Learning . New York:ACM Press , 2016 : 1245 - 1253 .

KATARIYA S , KVETON B , SZEPESVÁRI C ,, et al . DCM bandits:learning to rank with multiple clicks [C ] // Proceedings of the 33rd International Conference on International Conference on Machine Learning . New York:ACM Press , 2016 : 1215 - 1224 .

CAO J Y , SUN W , SHEN Z J M , et al . Fatigue-aware bandits for dependent click models [J ] . Proceedings of the AAAI Conference on Artificial Intelligence , 2020 , 34 ( 4 ): 3341 - 3348 .

KVETON B , LI C , LATTIMORE T , et al . BubbleRank:safe online learning to rerank via implicit click feedback [C ] // Proceedings of the 35th Uncertainty in Artificial Intelligence Conference .[S.l.:s.n. ] , 2020 .

ZOGHI M , TUNYS T , GHAVAMZADEH M , et al . Online learning to rank in stochastic click models [C ] // Proceedings of the 34th International Conference on Machine Learning .[S.l.:s.n. ] , 2017 .

ZHU Z A , CHEN W Z , MINKA T , et al . A novel click model and its applications to online advertising [C ] // Proceedings of the 3rd ACM International Conference on Web Search and Data Mining . New York:ACM Press , 2010 .

LI S , LATTIMORE T , SZEPESVARI C . Online learning to rank with features [C ] // Proceedings of the 36th International Conference on Machine Learning .[S.l.:s.n. ] , 2019 .

LATTIMORE T , KVETON B , LI S , et al . TopRank:a practical algorithm for online stochastic ranking [C ] // Proceedings of the 32nd International Conference on Neural Information Processing Systems .[S.l.:s.n. ] , 2018 .

YUE Y S , GUESTRIN C . Linear submodular bandits and their application to diversified retrieval [C ] // Proceedings of the 24th International Conference on Neural Information Processing Systems . New York:ACM Press , 2011 : 2483 - 2491 .

YU B S , FANG M , TAO D C . Linear submodular bandits with a knapsack constraint [C ] // Proceedings of the 30th AAAI Conference on Artificial Intelligence . New York:ACM Press , 2016 : 1380 - 1386 .

CHEN L , KRAUSE A , KARBASI A . Interactive submodular bandit [C ] // Proceedings of the 31st International Conference on Neural Information Processing Systems .[S.l.:s.n. ] , 2017 .

TAKEMORI S , SATO M , SONODA T , et al . Submodular bandit problem under multiple constraints [C ] // Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence .[S.l.:s.n. ] , 2020 .

孔芳 , 李奇之 , 李帅 . 在线影响力最大化研究综述 [J ] . 计算机科学 , 2020 , 47 ( 5 ): 7 - 13 .

KONG F , LI Q Z , LI S . Survey on online influence maximization [J ] . Computer Science , 2020 , 47 ( 5 ): 7 - 13 .

KEMPE D , KLEINBERG J , TARDOS É , . Maximizing the spread of influence through a social network [C ] // Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining .[S.l.:s.n. ] , 2003 : 137 - 146 .

WEN Z , KVETON B , VALKO M , et al . Online influence maximization under independent cascade model with semibandit feedback [C ] // Proceedings of the 31st International Conference on Neural Information Processing Systems .[S.l.:s.n. ] , 2017 : 3026 - 3036 .

WU Q Y , LI Z G , WANG H Z , et al . Factorization bandits for online influence maximization [C ] // Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery ＆ Data Mining . New York:ACM Press , 2019 : 636 - 646 .

VASWANI S , LAKSHMANAN L V S , SCHMIDT M . Influence maximization with bandits [J ] . arXiv preprint,2015,arXiv:1503.00024 .

LI S , KONG F , TANG K J , et al . Online influence maximization under linear threshold model [J ] . arXiv preprint,2020,arXiv:2011.06378 .

VASWANI S , KVETON B , WEN Z , et al . Model-independent online learning for influence maximization [C ] // Proceedings of the 34th International Conference on Machine Learning . New York:ACM Press , 2017 : 3530 - 3539 .

浏览量

841

下载量

CSCD

文章被引用时，请邮件提醒。

提交

工具集

关联资源

暂无数据