Pessimistic-Optimistic Bandit Learning with Applications to Communication Networks
Abstract
Many modern engineering problems in communication networks, and resource allocation problems more broadly, involve optimizing a time-varying quantity with unknown statistics under various constraints using only bandit feedback. In this thesis, we study the pessimistic-optimistic approach to several extensions of constrained bandit learning, motivated by communication network and other resource allocation applications. In particular, we study the case where feedback is delayed, the case where there is a cost to switching bandit arms, and the case of learning in queueing networks. Adapting the pessimistic-optimistic approach to each of these extensions presents unique challenges that require novel theoretical techniques.
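At a high level, the pessimistic-optimistic approach pairs an optimistic (UCB-style) reward estimate with a pessimistic virtual queue that tracks accumulated constraint violation, and trades the two off through a drift-plus-penalty index. The following is a minimal single-constraint sketch with hypothetical names and a tunable weight `V`; it illustrates the principle only and is not the thesis's exact algorithm:

```python
import math

def pessimistic_optimistic_step(counts, reward_sums, cost_sums, Q, t, V=10.0):
    """One round of a pessimistic-optimistic bandit: pick the arm maximizing
    a UCB reward estimate (optimism) penalized by a virtual queue Q that
    tracks past constraint violation (pessimism)."""
    n_arms = len(counts)
    indices = []
    for a in range(n_arms):
        if counts[a] == 0:
            indices.append(float("inf"))  # force initial exploration
            continue
        bonus = math.sqrt(2 * math.log(t + 1) / counts[a])
        r_ucb = reward_sums[a] / counts[a] + bonus          # optimistic reward
        c_lcb = max(0.0, cost_sums[a] / counts[a] - bonus)  # optimistic (low) cost
        indices.append(V * r_ucb - Q * c_lcb)  # drift-plus-penalty index
    return max(range(n_arms), key=lambda a: indices[a])

def update_queue(Q, observed_cost, budget):
    """Pessimistic virtual-queue update: the queue grows whenever the
    observed cost exceeds the per-round budget."""
    return max(0.0, Q + observed_cost - budget)
```

The weight `V` controls the regret-versus-violation trade-off: larger `V` favors reward, while a growing queue `Q` steers play back toward feasible arms.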
In the constrained combinatorial sleeping bandit setting with delayed feedback, we develop a tuneable pessimistic-optimistic algorithm, derive novel regret and constraint violation analyses, and support these findings with extensive experiments, including trace-based simulations. We also apply these techniques to learning in a bipartite queueing network, in which the feedback delay and the constraints are interconnected and which requires further theoretical analysis.
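One common way to model delayed bandit feedback is a buffer of pending observations, each stamped with its arrival round, so that estimates are updated only when observations actually arrive. A small illustrative sketch (names are hypothetical, not from the thesis):

```python
from collections import deque

def release_feedback(pending, now):
    """Pop and return all buffered observations whose arrival round has
    passed; the rest remain pending. `pending` is a deque of
    (arrival_round, observation) pairs sorted by arrival round."""
    ready = []
    while pending and pending[0][0] <= now:
        ready.append(pending.popleft()[1])
    return ready
```

Under this model, the learner's confidence bounds at round `now` reflect only the released observations, which is the source of the additional regret that delay incurs.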
In the constrained bandits with switching costs setting, we develop a novel block-based pessimistic-optimistic algorithm and prove that it achieves sublinear regret and vanishing constraint violation. We support our findings with empirical results, including trace-based simulations.
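The block-based idea can be illustrated in a few lines: the learner re-selects an arm only at block boundaries and holds it fixed within a block, so the number of switches (and hence the total switching cost) scales with the number of blocks rather than the number of rounds. A hedged sketch with a plain UCB index (the thesis algorithm additionally handles constraints):

```python
import math

def block_ucb_play(mu_hat, counts, t, block_len):
    """Select an arm once per block via a UCB index, then commit to it for
    block_len rounds; switching cost is amortized over the block."""
    def index(a):
        if counts[a] == 0:
            return float("inf")  # unplayed arms are tried first
        return mu_hat[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
    arm = max(range(len(mu_hat)), key=index)
    return [arm] * block_len  # the arm is held fixed within the block
```

Choosing the block length trades off switching cost against the responsiveness of the learner; a suitably growing block length yields sublinear totals for both.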
Finally, we study the problem of network utility maximization under unknown network statistics in a multi-hop queueing network, and develop an algorithm that combines pessimistic-optimistic job admissions with backpressure routing and scheduling. We develop novel regret analysis techniques in this setting that make extensive use of queue length properties to prove optimal regret.
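Backpressure routing and scheduling, with which the final algorithm combines pessimistic-optimistic admissions, activates links according to their backlog differential weighted by service rate. A simplified sketch that activates a single link (illustrative names; real networks activate a feasible schedule of links):

```python
def backpressure_schedule(queues, links, rates):
    """Max-weight/backpressure rule: for each directed link (i, j), the
    weight is the backlog differential queues[i] - queues[j] times the
    link's service rate; activate the link with the largest positive
    weight, or idle if none is positive."""
    best, best_w = None, 0.0
    for (i, j) in links:
        w = (queues[i] - queues[j]) * rates[(i, j)]
        if w > best_w:
            best, best_w = (i, j), w
    return best  # None means no link has a positive backlog differential
```

The queue-length properties exploited in the regret analysis stem from this rule: backlogs drive traffic toward less congested nodes, which keeps queues stable while the admission component learns the unknown statistics.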

