Abstract: We are in the midst of a major data revolution. The total data generated by humans from the dawn of civilization until the turn of the new millennium is now being generated every other day. Driven by a wide range of data-intensive devices and applications, this growth is expected to continue its astonishing march, and fuel the development of new and larger data centers. In order to exploit the low-cost services offered by these resource-rich data centers, application developers are pushing computing and storage away from the end-devices and instead deeper into the data-centers. Hence, the end-users' experience is now dependent on the performance of the algorithms used for data retrieval, and job scheduling within the data-centers. In particular, providing low-latency services are critically important to the end-user experience for a wide variety of applications. Our goal has been to develop the analytical foundations and methodologies to enable cloud storage and computing solutions that result in low-latency services. In this talk, I will focus on our efforts on reducing the latency through load balancing in large-scale data center systems. In our model each arrival is randomly dispatched to one of the servers with queue length below a threshold; if none exists, this arrival is randomly dispatched to one of the entire set of servers. We are interested in the fundamental relationship between the threshold and the delay performance of the system in heavy traffic. To this end, we first establish the following necessary condition to guarantee heavy-traffic delay optimality: the threshold will grow to infinity as the exogenous arrival rate approaches the boundary of the capacity region (i.e., the load intensity approaches one) but the growth rate should be slower than a polynomial function of the mean number of tasks in the system. As a special case of this result, we directly show that the delay performance of the popular pull-based policy Join-Idle-Queue (JIQ) lies strictly between that of any heavy-traffic delay optimal policy and that of random routing. We further show that a sufficient condition for heavy-traffic delay optimality is that the threshold grows logarithmically with the mean number of tasks in the system. This result resolves a generalized version of the 25 year old conjecture by Kelly and Laws.