# On Optimal Update Policies and Cluster Sizes for 2-Tier Distributed Systems

Abstract

We analyze a generic model for 2-tier distributed systems and explore whether optimal cluster sizes exist from an information management perspective, such that the overall cost of updating and searching information is minimized by adopting a judiciously lazy update policy. We assume neither centralized coordination nor decentralization. Since this is initial work, we argue only that such optimal policies exist, not how system participants may discover them. We put our work in perspective using two examples from diverse domains of distributed systems: wireless cellular networks, which are based on centralized coordination, and cluster-based peer-to-peer systems (like Kazaa).


Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, information retrieval, bioinformatics, data compression, computer graphics and machine learning.

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
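The assign-to-nearest-mean procedure described above is commonly computed with Lloyd's algorithm. The following is a minimal pure-Python sketch of it, not an implementation taken from the publication; the function name `kmeans`, the fixed initial centroids, and the sample points are illustrative assumptions.

```python
def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    `centroids` holds the initial cluster centers (chosen by the caller
    here; real implementations usually pick them randomly).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster whose centroid is
        # nearest in squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members,
        # which is the point minimizing the within-cluster squared error.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids, labels = kmeans(points, [points[0], points[2]])
```

On this toy input the two centroids settle at the midpoints of the two groups. The result depends on the initial centroids in general, since Lloyd's algorithm only finds a local minimum of the within-cluster variance.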

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected.
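To illustrate learning from reward alone, without labelled input/output pairs, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. It is not drawn from the publication; the function name `epsilon_greedy_bandit`, the success probabilities, and the parameter values are illustrative assumptions.

```python
import random


def epsilon_greedy_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate arm values from observed rewards only.

    `success_probs` are the hidden Bernoulli reward probabilities of each
    arm (hypothetical values; the agent never reads them directly).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: values[a])
        # The environment returns a reward; no "correct action" label is given.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward


values, counts, total_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
```

The agent is never told which arm is best. It improves its estimates purely from the rewards it receives, and `epsilon` sets how often it explores rather than exploits.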

