
Admin (Experimental)

I've used lots of provisioning tools in the past, including Ansible and Salt. I've never liked any of them, and I think the reason always boils down to this:

Idempotent actions sound like a good idea, but in reality, as deployments get more complicated, having the same command do different things based on the state of the system leads to completely indeterminate behaviour that is impossible to test.

To spell out what that means, and what the problem is, consider a provisioning tool that takes a username, say jimmygdb, and installs a PostgreSQL instance into that user's home directory.

You might pass the following configuration to the provisioning tool:

{'user': 'jimmygdb', 'dbport': 5342}

Then run:

create user user=$user
create dbserver user=$user port=$dbport

The first time you run this the user is created, the database files are put in the correct place in the home directory and the Postgres server is started.

Now imagine you run this again. In an idempotent system the provisioning tool tries its best to apply the state specified in the configuration to the running system.

It will see the user already exists and not create it again. But because it hasn't just created it, the home directory isn't empty. In order for the tool to run the create dbserver command it has to know that there is a possibility that there will be a load of files in the directory already and it has to decide whether or not to delete them. The create user and create dbserver commands are now coupled and depend on their interaction, not just themselves.
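To make that coupling concrete, here is a minimal sketch of what an idempotent create dbserver might have to look like. Everything in it is an assumption made for illustration: the function name, the pgdata directory, the port file, and the three return values are hypothetical and not taken from any real provisioning tool.

```python
import os
import tempfile


def create_dbserver(home, port):
    """Hypothetical idempotent command: what it does depends on what is in `home`."""
    data_dir = os.path.join(home, "pgdata")
    port_file = os.path.join(data_dir, "port")
    if not os.path.exists(data_dir):
        # Fresh home directory: the simple path that is easy to test.
        os.makedirs(data_dir)
        with open(port_file, "w") as handle:
            handle.write(str(port))
        return "created"
    # Files are left over from an earlier run, so the command has to
    # inspect them and guess what the caller wants.
    with open(port_file) as handle:
        existing_port = int(handle.read())
    if existing_port == port:
        return "unchanged"
    # Keep the old data? Delete it? Either choice can be the wrong one.
    return "conflict"


def demo():
    """Run the same command three times against one temporary home directory."""
    with tempfile.TemporaryDirectory() as home:
        first = create_dbserver(home, 5342)
        second = create_dbserver(home, 5342)
        third = create_dbserver(home, 6000)
    return [first, second, third]
```

Even in this toy version the same call has three different outcomes depending on earlier runs, and each extra piece of state that other commands can touch adds more branches.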

Now imagine a more realistic deployment with 20 or 30 commands. Using the formula:

                        n(n-1)
number of connections = ------
                           2

we see that there could be between 190 and 435 interactions, and that's assuming components only interact in one possible way, which they probably don't.

Python 2.7.10 (default, Oct 23 2015, 19:19:21) 
>>> def interactions(number):
...     return (number * (number-1))//2
... 
>>> interactions(20)
190
>>> interactions(30)
435

It quickly becomes impossible to claim that a system can be reliably idempotent with respect to running these commands. How will component 12 behave when component 15 is not running but component 10, which had an empty directory when 12 was installed, has now filled the disk with log files?

Effectively, the perfectly laudable goal of idempotency leads, in practice, to non-deterministic behaviour.

So, what would I do differently? Would I reach for the latest and greatest container solution? Well, yes, that is a perfectly reasonable thing to do to solve this particular problem, but for my needs it is a bit heavyweight.

Instead I like an approach where none of my commands are idempotent. Either they all work, or the whole install fails. The commands aren't designed to be run again, and if you choose to run a command out of order, that's because you understand what the command does and believe running it is a safe thing to do, not just because you are crossing your fingers.

The rebuild-from-scratch approach works well if you can rebuild from scratch in just a few tens of seconds.
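As a rough sketch of the "either they all work, or the whole install fails" idea, the runner below executes each command exactly once, in order, and stops at the first non-zero exit status. The names run_all and demo are hypothetical, and the demo commands are stand-ins that just start the Python interpreter.

```python
import subprocess
import sys


def run_all(commands):
    """Run each command once, in order; the first failure aborts the rest."""
    completed = []
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so nothing after a failed step is ever attempted.
        subprocess.run(argv, check=True)
        completed.append(argv)
    return completed


def demo():
    """Second step fails with status 3, so the third step never runs."""
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "raise SystemExit(3)"]
    try:
        run_all([ok, bad, ok])
    except subprocess.CalledProcessError as error:
        return error.returncode
    return 0
```

There is deliberately no retry or "skip if already done" logic here: a failed run is thrown away and the install starts again from scratch.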

What makes this much more realistic is pre-compiling and pre-building all the software you know your platform needs, and making the first step of your deployment unzipping that software into the right place, ready to configure and run.
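That first unzip step could be sketched like this, assuming the pre-built software is shipped as a zip archive and the target directory must not exist yet. The function name unpack_prebuilt, the archive name stack.zip, and its contents are made up for the example.

```python
import os
import tempfile
import zipfile


def unpack_prebuilt(archive_path, target_dir):
    """Unpack a pre-built archive into a directory that must not exist yet."""
    # os.makedirs raises FileExistsError if target_dir is already present,
    # so a repeat run fails loudly instead of mixing with old files.
    os.makedirs(target_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return sorted(os.listdir(target_dir))


def demo():
    """Build a tiny stand-in archive, then try to unpack it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "stack.zip")
        with zipfile.ZipFile(archive_path, "w") as archive:
            archive.writestr("bin/server", "pre-built binary placeholder")
            archive.writestr("README", "pre-built stack")
        target = os.path.join(tmp, "opt")
        first = unpack_prebuilt(archive_path, target)
        try:
            unpack_prebuilt(archive_path, target)
            second = "unpacked again"
        except FileExistsError:
            second = "refused"
    return first, second
```

Refusing to unpack over an existing directory keeps the step non-idempotent on purpose: the only supported starting point is an empty one.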

Although we often like to think otherwise, computers do behave entirely predictably - if nothing changes, nothing can ever go wrong. The challenge is therefore to make sure as little changes as possible. If you can always predictably rebuild to a known state, that's a solid foundation to start from.

So, with all that background in mind, I quite like using Fabric. Unfortunately it is Python 2.7 only, and I wanted to use Python 3 only in my stack going forward.

The solution is to take the core idea behind Fabric, namely being able to run tasks either over SSH or locally, and apply it to a new, more stripped-down library that also has functions for the sort of tasks I want to run for my provisioning.
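The "same task, locally or over SSH" idea can be sketched in a few lines with the standard library and the ssh command-line client. This is only an illustration of the concept and not the admin library's actual API: build_argv, run, and the host parameter are hypothetical names, and the remote case assumes key-based SSH access is already set up.

```python
import shlex
import subprocess


def build_argv(command, host=None):
    """Return the argv that runs `command` locally, or on `host` via ssh."""
    if host is None:
        return ["sh", "-c", command]
    # ssh passes the command string to the remote user's shell, so quote it
    # to make it arrive as a single argument to `sh -c` on the other side.
    return ["ssh", host, "sh -c " + shlex.quote(command)]


def run(command, host=None):
    """Run a shell command and return its standard output; raise on failure."""
    result = subprocess.run(
        build_argv(command, host),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling run("echo hi") executes on the local machine, while run("echo hi", host="example.org") would send the same command through ssh, so a task written once can target either.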

The result is the admin library which is used for building and provisioning all the software on jimmyg.org.

Tags: Provisioning, Linux, Python

A tool a bit like a slimmed down version of Fab, but for Python 3, and specifically for deploying Django-based web stacks with monitoring.

Copyright James Gardner 1996-2020 All Rights Reserved. Admin.