Philipp Haller, Jocelyn Boullier
The most successful systems for “big data” processing have all adopted functional APIs. We present a new programming model we call function passing designed to provide a more principled substrate on which to build data-centric distributed systems. A key idea is to build up a persistent functional data structure representing transformations on distributed immutable data by passing well-typed serializable functions over the wire and applying them to this distributed data. Thus, the function passing model can be thought of as a persistent functional data structure that is distributed, where transformations to data are stored in its nodes rather than the distributed data itself. The model simplifies failure recovery by design–data is recovered by replaying function applications atop immutable data loaded from stable storage. Deferred evaluation is also central to our model; by incorporating deferred evaluation into our design only at the point of initiating network communication, the function passing model remains easy to reason about while remaining efficient in time and memory. We formalize our programming model in the form of a small-step operational semantics which includes a semantics of functional fault recovery, and we provide an open-source implementation of our model in and for the Scala programming language, along with a case study of several example frameworks and end-user programs written atop of this model.
Assoc Computing Machinery2016