Pessimistic-Optimistic Bandit Learning with Applications to Communication Networks

Loading...
Thumbnail Image

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Many modern engineering problems in communication networks, and other resource allocation problems, involve optimizing some time-varying quantity with unknown statistics under various constraints with bandit feedback. In this thesis, we study the pessimistic-optimistic approach to solving extensions to constrained bandit learning, motivated by various communication network and other resource allocation applications. In particular, we study the cases where feedback is delayed, the case where there is a cost to switching bandit arms, and the case of learning in queueing networks. Adapting the pessimistic-optimistic approach to each of these extensions presents unique challenges that require novel theoretical techniques.

In the constrained combinatorial sleeping bandit setting with delayed feedback, we develop a tuneable pessimistic-optimistic algorithm, derive novel regret and constraint violation analysis, and support these findings by extensive experiments, including trace-based simulations. We also apply these techniques to the case of learning in a bipartite queueing network, in which the feedback delay and constraints and interconnected, and which requires further theoretical analysis.

In the constrained bandits with switching costs setting, we develop a novel block-based pessimistic-optimistic algorithm, and prove its sublinear regret and vanishing constraint violation. We support our findings with empirical results, including trace-based simulations.

Finally, we study the problem of network utility maximization under unknown network statistics in a multi-hop queueing network, and develop an algorithm that combines pessimistic-optimistic job admissions with backpressure routing and scheduling. We develop novel regret analysis techniques in this setting that make extensive use of queue length properties to prove optimal regret.

Description

Keywords

Multi-Armed Bandits, Wireless Networks, Queueing Theory, Communication Networks, Online Learning, Constrained Learning, Delayed Feedback, Switching Costs

Citation

Endorsement

Review

Supplemented By

Referenced By

Creative Commons license

Except where otherwised noted, this item's license is described as Attribution-ShareAlike 4.0 International