I'm partitioning a stream of input data between n servers. The simple way of doing it is to use the hash of some property of each input data packet to assign a single server, using mod or similar, and be done with it.
However, I want some degree of resiliency - if one server goes down, nothing is lost. So I want to partition each data packet to m servers, where 1 &lt; m &lt; n, with each data packet guaranteed to reach at least m servers (more is fine). Furthermore, I want the partitioning to be stateless, deterministic, and well-distributed - the calculation should use only the hash(es) of the input data.
This feels like something that research papers have been written about, but my google-fu has failed me. Are there any existing algorithms which do this, ideally generalisable across n and m?
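For concreteness, here is a minimal sketch of the kind of scheme I mean, using rendezvous-style (highest-random-weight) hashing - one known approach that ranks servers per key and takes the top m. This is my own sketch of the idea, not a solution I've verified against the literature; the function names and the SHA-256 scoring are assumptions:

```python
import hashlib

def partition(key: str, servers: list[str], m: int) -> list[str]:
    """Deterministically pick m of the given servers for `key`.

    Each (key, server) pair gets a pseudo-random score from a hash;
    the m highest-scoring servers own the key. Stateless and
    deterministic: the result depends only on the key and server list.
    """
    def score(server: str) -> int:
        # Hash the key together with the server id to get a per-pair score.
        h = hashlib.sha256(f"{key}|{server}".encode()).hexdigest()
        return int(h, 16)

    # Sort servers by descending score and keep the top m.
    return sorted(servers, key=score, reverse=True)[:m]
```

A nice property of this style of scheme (if it's the right one) is that removing a server only remaps the keys that were assigned to it, rather than reshuffling everything the way plain mod does.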
mod (nᵢ/m), where nᵢ is the initial number of servers, is probably well-distributed on a large enough time scale. :) But otherwise it doesn't behave like you want a cluster to behave... adding nodes only adds more redundancy. To increase performance (increase n while keeping m constant), you'd have to rebalance the data between nodes. – Kasey Speakman May 18 '16 at 22:56
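As I read the comment's mod (nᵢ/m) suggestion, it groups the servers into fixed blocks of m and hashes each packet to a block - a minimal sketch of that interpretation (the function name, SHA-256 hash, and the divisibility assumption are mine):

```python
import hashlib

def group_partition(key: str, n_initial: int, m: int) -> list[int]:
    """Assign `key` to a fixed group of m server indices out of n_initial.

    Hashes the key mod (n_initial / m) to pick a group g; group g owns
    servers g*m .. g*m + m - 1, so every packet lands on exactly m servers.
    Assumes m divides n_initial evenly.
    """
    assert n_initial % m == 0, "sketch assumes m divides n_initial"
    g = int(hashlib.sha256(key.encode()).hexdigest(), 16) % (n_initial // m)
    return [g * m + j for j in range(m)]
```

This matches the caveat in the comment: growing n with m held constant changes the modulus, so existing keys hash to different groups and the data has to be rebalanced.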