In this talk, we discuss the problem of fair sparse regression on a biased dataset where bias depends upon a hidden binary attribute. The presence of a hidden attribute adds an extra layer of complexity to the problem by combining sparse regression and clustering with unknown binary labels. The corresponding optimization problem is combinatorial, but we propose a novel reformulation of it as an invex optimization problem. We show that the inclusion of the debiasing/fairness constraint in our model has no adverse effect on the performance. Rather, it enables the recovery of the hidden attribute. The support of our recovered regression parameter vector matches exactly with the true parameter vector. Moreover, we simultaneously solve the clustering problem by recovering the exact value of the hidden attribute for each sample. Our method uses carefully constructed primal-dual witnesses to provide theoretical guarantees for the combinatorial problem. We show that the sample complexity of our method is logarithmic in terms of the dimension of the regression parameter vector.
Short Bio:
Adarsh Barik is a postdoc at the Institute of Data Science at the National University of Singapore. He did his PhD in Computer Science at Purdue University and his undergrad at IIT Madras before that. His research interests are broadly in theoretical and computational aspects of optimization, machine learning, information theory, and high-dimensional data analysis.