Abstract:In a content delivery network, data are distributed to geographically separated servers or data centers. This helps in distributing the load in the network and in serving the neighborhood locally. However, an arbitrary data placement scheme may lead to excessive operational cost for the service provider. We study the problem of content distribution and delivery as a cost optimization problem.
The major decisions influencing the cost involve the location of servers, allocation of content to servers, assignment of users to servers, the network topology and routing choices. Clearly, these decisions also appear in a transportation network design. We use the concepts of hub-spoke networks, widely used in the airlines and telecommunications industry, to model these problems in an integrated framework. We introduce a two-phased solution approach, in which the first phase identifies a set of disjoint parts of the databases (called database segments) by using ideas from collaborative filtering. In the second phase, the optimal placement of these segments in a network is studied in a mathematical programming framework. Due to the presence of multiple problems, some of which are already known to be NP-hard, the resulting formulation is extremely complex and the optimal solutions are available only for small instances. Therefore, we focus on the development of large scale optimization techniques (such as Benders decomposition) to make the model applicable in practice.