
Part 8: Scaling Fusion Services

NOTE: Nearly everything described in this section (and even more) is already implemented. The sections covering CommandR and the Operations Framework describe this in detail, but it still makes sense to read this section first to better understand the problems they solve.

If you want to see it in action, check out http://boardgames.alexyakunin.com

Scaling Fusion services is actually simpler than it may seem at first - mostly, you should take two key factors into account:

  1. You should treat any Fusion-based service host as one of your caching servers and scale it accordingly - but most importantly, you should ensure each of these servers has to cache only a limited subset of the data. The more data each server has to cache, the lower its cache hit ratio gets (assuming the amount of RAM is fixed).

  2. You need to ensure that invalidations from any such server are “replicated” on every other server that may serve the same piece of data.

Let’s look at a few specific cases to understand this better.

Scaling Multi-Tenant Services

Load Balancing

This is the simplest case. Let’s assume we’re building a Fusion-based multi-tenant service, and we want the data of each tenant to be served by a stable set of K of our hosts, so that every host caches only a limited subset of all tenants’ data.

The simplest way to achieve the desirable load distribution in this case is to use Rendezvous Hashing or Consistent Hashing. Almost any industry standard load balancer supports the latter one - in particular, you can use the hash directive (with its consistent parameter) in NGINX, or the hash-type consistent directive in HAProxy.

The main downside of these two options is that they support only the K == 1 scenario: both directives bind a tenant to a single backend server, so if this server goes down, users from the tenants hosted there are going to experience a slowdown, because no other server was serving these tenants, and thus none of them has any of their data cached.

The code below shows how to use Rendezvous Hashing to implement a more efficient mapping of users to backend servers and support the K > 1 scenario:

Host GetHost(string tenantId, string userIdOrIP)
    => Hosts
        .Select(host => (
            Host: host,
            Weight: Hash(host.Id, tenantId)
        ))
        // Rendezvous hashing: order all hosts by their per-(host, tenant) weight
        .OrderBy(p => p.Weight)
        .Select(p => p.Host)
        // The first K hosts in this stable order form the tenant's host set;
        // route the user to the (Hash(userIdOrIP) % K)-th host in it
        .Skip(Hash(userIdOrIP) % K)
        .First();

As you can see, in this case we select a “stable” set of K hosts for every tenant and route a specific user to the (Hash(userIdOrIP) % K)-th host in this set. Once one of these hosts goes down, its load will be picked up by the K - 1 remaining hosts from the same set plus one extra host, so the fraction of users experiencing a slowdown in this case (1 / K) can be reduced to any desirable level at the cost of extra RAM.
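The Hash function used in the snippets of this section isn’t defined - it can be any hash that is stable across hosts and process restarts. Here is a minimal FNV-1a-based sketch you could substitute (StableHash is our name for it; note that string.GetHashCode() is not suitable here, since it’s randomized per process on .NET Core):

public static class StableHash
{
    // FNV-1a over the UTF-8 bytes of every part: the same input produces the
    // same output on every host, which is what rendezvous hashing requires
    public static uint Of(params string[] parts)
    {
        const uint prime = 16777619;
        var hash = 2166136261u;
        foreach (var part in parts)
            foreach (var b in System.Text.Encoding.UTF8.GetBytes(part)) {
                hash ^= b;
                hash *= prime;
            }
        return hash;
    }
}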

The GetHost code above is pretty inefficient - its time complexity is O(N*log(N)). But notice that while your set of hosts is stable, you can cache the following list per tenant to reduce the complexity to O(1):

Host[] GetTenantHosts(string tenantId)
    => Hosts
        .Select(host => (
            Host: host,
            Weight: Hash(host.Id, tenantId)
        ))
        .OrderBy(p => p.Weight)
        .Select(p => p.Host)
        .Take(K)
        .ToArray();
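With that cached list in place, per-request routing becomes an O(1) lookup. Here is a minimal sketch of a replacement for the GetHost shown earlier (the ConcurrentDictionary-based cache is our assumption; just remember to flush it whenever the set of hosts changes):

using System.Collections.Concurrent;

ConcurrentDictionary<string, Host[]> TenantHostsCache { get; } = new();

Host GetHost(string tenantId, string userIdOrIP)
{
    // GetTenantHosts runs at most once per tenant; its result is reused
    // until the host set changes and the cache gets flushed
    var tenantHosts = TenantHostsCache.GetOrAdd(tenantId, GetTenantHosts);
    // tenantHosts.Length == K, so this picks the same "i-th host in the
    // tenant's stable host set" as the O(N*log(N)) version above
    return tenantHosts[(int)(StableHash.Of(userIdOrIP) % (uint)tenantHosts.Length)];
}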

In practice, such load balancing can be implemented by putting this logic into your own proxy - you can use e.g. AspNetCore.Proxy or YARP to build it.
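For instance, YARP exposes a load balancing extensibility point you can plug this logic into. Here is a sketch assuming YARP’s ILoadBalancingPolicy interface (verify the exact signatures against the YARP version you use) and a hypothetical X-Tenant-Id request header carrying the tenant id:

using System.Linq;
using Microsoft.AspNetCore.Http;
using Yarp.ReverseProxy.LoadBalancing;
using Yarp.ReverseProxy.Model;

public class RendezvousHashingPolicy : ILoadBalancingPolicy
{
    const int K = 3; // size of each tenant's host set - pick what fits your RAM budget

    public string Name => "RendezvousHashing";

    public DestinationState? PickDestination(
        HttpContext context,
        ClusterState cluster,
        IReadOnlyList<DestinationState> availableDestinations)
    {
        var tenantId = context.Request.Headers["X-Tenant-Id"].ToString();
        var userIdOrIP = context.Connection.RemoteIpAddress?.ToString() ?? "";
        // The same scheme as in GetHost above, applied to YARP's destination list
        return availableDestinations
            .OrderBy(d => StableHash.Of(d.DestinationId, tenantId))
            .Skip((int)(StableHash.Of(userIdOrIP) % K))
            .FirstOrDefault();
    }
}

The policy gets registered via services.AddSingleton<ILoadBalancingPolicy, RendezvousHashingPolicy>() and referenced by its Name in the cluster configuration.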

And if you prefer a safer, but maybe a bit less flexible option, remember that almost any industry standard load balancer supports consistent hash-based mapping too - in particular, the NGINX and HAProxy directives mentioned above.

A few important things to keep in mind:

Distributed Invalidation

The problem:

Assuming hs = GetTenantHosts(tenantId) is the set of hosts serving the data of a specific tenant, an invalidation happening on any of these hosts should be reproduced on every other host in the same set.

Note that skipping a single invalidation is actually a big problem in Fusion’s case: invalidation is the only way to indicate that data is stale or obsolete, and unless it’s triggered, there is always a chance the old data will be reused - all it takes to keep a stale value alive is to somehow retain a reference to its underlying IComputed.

In other words, invalidations must be quite reliable if you don’t want to resort to hacks like periodically scanning every IComputed in Fusion’s ComputedRegistry and e.g. invalidating the “oldest” ones.

For now Fusion doesn’t offer abstractions helping to implement distributed invalidation (this is certainly temporary), but the problem to solve here is a typical application of the publish-subscribe pattern, so you can use a service like Azure Service Bus and have a single “invalidation topic” per tenant to deliver these messages to every host serving its data.
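For example, with Azure Service Bus this could look roughly as follows - a sketch assuming the Azure.Messaging.ServiceBus package, one topic per tenant, and a per-host subscription on each topic; Operation, InvalidateLocally, and thisHostId are hypothetical names:

using System.Text.Json;
using Azure.Messaging.ServiceBus;

var client = new ServiceBusClient(connectionString);

// Publisher side: announce an invalidation to every host serving this tenant.
// "operation" is a hypothetical object describing what was changed.
var sender = client.CreateSender($"invalidation-{tenantId}");
await sender.SendMessageAsync(new ServiceBusMessage(JsonSerializer.Serialize(operation)));

// Subscriber side: every host uses its own subscription, so each of them
// gets a copy of every message and can replay the invalidation locally
var processor = client.CreateProcessor($"invalidation-{tenantId}", subscriptionName: thisHostId);
processor.ProcessMessageAsync += async args => {
    var operation = JsonSerializer.Deserialize<Operation>(args.Message.Body.ToString());
    if (operation!.OriginHostId != thisHostId) // skip messages originating here
        InvalidateLocally(operation);
    await args.CompleteMessageAsync(args.Message);
};
processor.ProcessErrorAsync += args => Task.CompletedTask; // add real logging here
await processor.StartProcessingAsync();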

Now, some tricky aspects:

All of this means that if your system is distributed and transactional, you need a slightly more complex protocol to replicate invalidations reliably:

Inside the transaction: store an “operation” entry describing the changes in the tenant’s own database, so it commits (or rolls back) atomically with the changes themselves.

Once the transaction is committed: run the invalidation locally and publish the operation message to the tenant’s pub/sub topic.

And we need two extra services:

  1. An invalidation service listens for new operation messages in the topics matching every tenant it serves and runs the corresponding invalidations (except for the messages originating from the same host). This service should start before your host starts processing any read requests.
  2. A “recovery pump” service similarly listens for operation messages and removes the corresponding operation entries from the tenant’s DB. In addition, it looks for operation entries that weren’t removed for a while; once it sees such an entry, it resets its creation time in the DB to the current moment and pumps the identical operation back into the matching pub/sub topic. “A while” here should be long enough to ensure a very high chance of message propagation through the pub/sub pipeline - e.g. you may set it to the 99.9999th percentile of message propagation time.

As you might guess, the “recovery pump” service might be a part of the invalidation service - all you need is to ensure that when such a service runs on every host, its instances don’t race with each other and don’t overload the DB with identical “delete operation” requests (batching these requests is a good idea anyway).
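Continuing the Service Bus sketch above, the recovery pump’s scan-and-re-publish loop could look roughly like this. IOperationStore is a hypothetical abstraction over the tenant DB’s operation entries; its RemoveAsync is what the message-listening side calls once it observes a message:

public interface IOperationStore // hypothetical
{
    Task<List<Operation>> GetOlderThanAsync(TimeSpan age, int limit, CancellationToken ct);
    Task TouchAsync(IEnumerable<string> ids, CancellationToken ct);  // reset CreatedAt to now
    Task RemoveAsync(IEnumerable<string> ids, CancellationToken ct); // called on message receipt
}

public async Task RunRecoveryPumpAsync(
    IOperationStore store, ServiceBusSender sender, CancellationToken ct)
{
    // "A while": long enough that nearly every successfully published message
    // has already propagated - e.g. the 99.9999th percentile of propagation time
    var maxAge = TimeSpan.FromSeconds(30);
    while (!ct.IsCancellationRequested) {
        var stuck = await store.GetOlderThanAsync(maxAge, limit: 100, ct);
        foreach (var operation in stuck)
            // Re-publish the identical operation; invalidation is idempotent,
            // so it's fine if the original message arrives later as well
            await sender.SendMessageAsync(
                new ServiceBusMessage(JsonSerializer.Serialize(operation)), ct);
        // Reset their creation time so they aren't re-pumped on the next scan
        await store.TouchAsync(stuck.Select(o => o.Id), ct);
        await Task.Delay(TimeSpan.FromSeconds(5), ct);
    }
}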

P.S. If you know about Event Sourcing, you could instantly spot that this pattern fits perfectly to implement all of this.

Scaling Single-Tenant Services

Overall, you can’t horizontally scale a service that doesn’t allow some kind of partitioning. This isn’t a limitation imposed by Fusion - it’s just what horizontal scaling implies.

The process of scaling such a service is actually quite similar to scaling a monolith.

Identifying partitioning dimensions is the most interesting part here. Ideally, you want to ensure that a single partition can be:

  1. Stored independently from other partitions - e.g. in its own database or shard
  2. Served by its own (small) subset of hosts.

These two criteria ensure you can horizontally scale both the data and compute capacity without a need to repartition existing partitions - in other words, this is what allows you to view your partitions as “tenants”.

Scaling Reads via Replica Hosts

Fusion brings a pretty unique way of scaling read workloads: since Compute Service Clients are almost identical to the original Compute Services, including the way they cache outputs, you can re-expose them as the next layer of hosts (let’s call them “replica hosts”) to scale the read workload.

The downsides of such an approach are straightforward:

As for the upsides:

A sudden flux of traffic to certain content is a good example of when this could be useful. Imagine a scenario where a post of a regular user gets shared by an influencer and gains a lot of traction. If you track partitions that are on the verge of their compute or IO capacity, you can re-route the traffic hitting them to a dedicated pool of “replica hosts” and get e.g. 10x the read capacity almost instantly. And once the load is gone, you can remap them back to the original pool.

Large Working Sets and GC Pauses

The problem:

The max. duration of STW (stop-the-world) pauses caused by GC on .NET Core is proportional to the size of the working set, which means you probably need to limit it to 32-64 GB per host.

For the sake of clarity:

There is no silver bullet resolving this issue completely, but there are plenty of workarounds you can use to nearly eliminate it:

Decrease the number of objects in heap - by:

Limit the size of your working set and run N processes per host (see the config sketch below) - by:

Making pauses less visible:
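As a concrete illustration of the “limit the working set” route above: since .NET Core 3.0, you can cap the GC heap via runtimeconfig.json. A sketch - the 48 GB figure is just an example, and you should check the .NET GC configuration docs for the full list of knobs:

{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.HeapHardLimit": 51539607552
    }
  }
}

The same limit can be set via the DOTNET_GCHeapHardLimit environment variable (as a hex value), which is handy when you run several processes per host.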

Wait, but what about invalidations and updates?

Interestingly, one of the most frequent questions about Fusion is:

Wait, but how can it scale if it recomputes every output once any of its dependencies changes?

And the answer is actually simpler than it seems:

  1. Fusion doesn’t recompute anything once something changes. It just invalidates everything that depends on what’s changed. But could invalidation alone be costly enough? No:
  2. Using a dependency (calling a function + creating a single dependency link) requires O(1) time at best - i.e. when its output is already cached. And that’s also the minimum amount of time you’d spend on this call if there were no Fusion at all, which means dependency tracking is nearly free.
  3. Processing a single link during the invalidation pass requires O(1) time too, and this happens just once per link. In other words, invalidations are basically free as well!

This is why Fusion services should scale at least as well as similar services without Fusion. “At least” here means that Fusion certainly makes you pay a fixed, but much higher cost per call to provide automatic dependency tracking, caching, etc., plus you should take into account such factors as the amount of RAM your new service will need with given caching options, and so on. In other words, there are, of course, details you need to factor in to use it efficiently.

And if you look at its Compute Service Clients, you’ll quickly conclude that all the same statements apply to them as well - the only difference is that this O(1) cost has a much higher (but still fixed) absolute value, because every computation and invalidation requires an extra network round trip.

Finally, notice that IState - the abstraction that powers most of the UI updates - uses IUpdateDelayer, which, in fact, controls the max. possible update rate, and you can change its settings at any time. So:

  1. You have all the levers to control the frequency of such updates - in particular, you can throttle them down for any popular piece of content, or when your service experiences high load.
  2. An “update” rarely triggers an actual recomputation - it does so only on the first update request after some change; otherwise it just delivers the cached value.
  3. And finally, note that recomputations in Fusion are incremental - as with incremental builds, you rarely recompute anything from scratch; you recompute just what’s changed.
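To make the first lever concrete, here is a purely conceptual sketch with hypothetical names - it mirrors the role of Fusion’s IUpdateDelayer rather than its actual API, so check the current docs for the real one:

public class ExampleUpdateDelayer
{
    // Raise this for hot content or under high load: with a 1-second delay,
    // a state updates at most once per second, no matter how many times its
    // dependencies get invalidated in between
    public TimeSpan UpdateDelay { get; set; } = TimeSpan.FromSeconds(1);

    // Called between "a dependency was invalidated" and "recompute the state"
    public Task Delay(CancellationToken cancellationToken)
        => Task.Delay(UpdateDelay, cancellationToken);
}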

Of course, this isn’t the complete set of options you have - e.g. you can also trade consistency for performance by delaying invalidations. But the main point stands: yes, Fusion-based services scale.

Part 9: CommandR - Intro » | Tutorial Home