Traditional fine-tuning of foundation models is computationally expensive, requiring updates to billions of parameters. A promising alternative, alignment via decoding, directly adjusts the response distribution at inference time, without any model updates, so as to maximize a target reward r, providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function (Q*), which is often unavailable in practice. We propose Transfer Q*, which implicitly estimates the optimal value function for the target reward through a baseline model aligned with a baseline reward r_BL (which may differ from the target reward). Our approach significantly reduces the sub-optimality gap observed in prior state-of-the-art (SoTA) methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive experiments on several synthetic and real datasets.
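To make the setup concrete, here is a schematic sketch in standard KL-regularized alignment notation (a sketch only, not necessarily the exact formulation used in the paper). The token-level optimal decoding policy can be written as
\[
\pi^*(z \mid s_t) \;\propto\; \pi_{\mathrm{ref}}(z \mid s_t)\, \exp\!\left( \tfrac{1}{\alpha}\, Q^*(s_t, z) \right),
\]
where $s_t$ denotes the prompt together with the tokens decoded so far, $z$ is a candidate next token, $\pi_{\mathrm{ref}}$ is the frozen reference model, $\alpha > 0$ controls how far the decoded distribution may move from $\pi_{\mathrm{ref}}$, and $Q^*(s_t, z)$ is the optimal expected target reward $r$ obtained after committing to $z$. Since this $Q^*$ is rarely available, Transfer Q* replaces it with an implicit estimate computed through the baseline model aligned to $r_{\mathrm{BL}}$, for instance by rolling out completions from that baseline-aligned model and scoring them with the target reward $r$.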
Short Bio:
Amrit Singh Bedi is an assistant professor in the Computer Science Department at the University of Central Florida, FL, USA. Before that, he was a research assistant professor in the Computer Science Department at the University of Maryland, College Park, MD, USA. He obtained his Ph.D. in Electrical Engineering from IIT Kanpur, India, in 2018. Following his doctoral studies, he worked as a Research Associate in the Computational and Information Sciences Directorate at the US Army Research Laboratory (ARL) in Adelphi, MD, USA, from 2019 to 2022. His research interests lie in artificial intelligence (AI) for autonomous systems, with a specific emphasis on scalable and sample-efficient learning algorithms. Currently, he is working on the problem of AI alignment in language models. One of his papers was selected as a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers. He received an honorable mention from IEEE Robotics and Automation Letters in 2020, and he was awarded an Amazon Research Award in 2022.