How should we design machine learning systems when the underlying environment (e.g., the data distribution) changes in response to the deployed model? In the context of supervised learning, the framework of performative prediction provides game-theoretic solution concepts that a learner can optimize for in the presence of decision-dependent distributions. In this talk, I will provide an overview of our work modeling such “performativity” in the context of reinforcement learning. In particular, I will describe how to reach a stable policy in a setting where the underlying Markov decision process (MDP) reacts to the deployed policy. I will end with some open questions and, if time permits, some of our recent work on reinforcement learning with human feedback.
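To make the stability notion concrete, below is a minimal sketch of a repeated-retraining loop in a toy decision-dependent MDP: deploy a policy, let the environment react, re-solve on the induced MDP, and repeat until the policy stops changing. Everything here (the 2-state MDP, the `env_response` reward shift, the sensitivity parameter) is an illustrative assumption rather than the construction from the talk; the talk concerns conditions under which such retraining actually converges, which this sketch does not address.

```python
import numpy as np

def env_response(policy, base_R, sensitivity=0.3):
    # Decision-dependent rewards: each (s, a) reward degrades in
    # proportion to how often the deployed policy picks that action
    # (a stylized "gaming" response; purely illustrative).
    return base_R - sensitivity * policy

def solve_mdp(P, R, gamma=0.9, iters=500):
    # Standard value iteration on a tabular MDP; returns a greedy
    # deterministic policy as a one-hot |S| x |A| matrix.
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * P @ V  # Q[s,a] = R[s,a] + gamma * sum_s' P[s,a,s'] V[s']
        V = Q.max(axis=1)
    return np.eye(n_actions)[Q.argmax(axis=1)]

# Fixed transition kernel P[s, a, s'] and base rewards for a toy
# 2-state, 2-action MDP (numbers chosen arbitrarily).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
base_R = np.array([[1.0, 0.5],
                   [0.2, 0.8]])

# Repeated retraining: a fixed point of this loop is a performatively
# stable policy, i.e. one that is optimal for the MDP it itself induces.
policy = np.full((2, 2), 0.5)
for t in range(50):
    R_t = env_response(policy, base_R)
    new_policy = solve_mdp(P, R_t)
    if np.allclose(new_policy, policy):
        print(f"Stable policy reached after {t} retraining rounds")
        break
    policy = new_policy
print(policy)
```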
Short Bio:
Debmalya Mandal is an assistant professor at the University of Warwick, UK. He completed his PhD at Harvard University and subsequently held postdoctoral positions at Columbia University and the Max Planck Institute. He is broadly interested in problems at the interface of machine learning and multi-agent systems.