less magic, more infrastructure

My day job is to build automation. Some of my best work is when a person can show their intent with a small effort and automatically marshal hideously complex processes to carry out that intent. I show them the hideous guts of the process once to prove that I’ve done work – a standard wizard tactic to avoid being taken for granted – but after that, it should work like magic.

Or should it? As an individual, I actually dislike magical interfaces. I groan when I read setup documentation, because it always has 3 steps that fail somewhere between step 2 and 3. “Take the device out of the box, place it next to the main device, and it will pair.” Right. And if it doesn’t? (For me, it rarely does.) Then suddenly I’m in 300 more steps that are spread out over a dozen sites, hidden among the worst documentation interfaces possible. I’m pushing the one button on the device in a staccato rhythm while reinstalling the operating system of the other while draping a mylar blanket over both to block stray radiation, and… I realize I’m on the wrong end of the magic.

What I prefer in a case like that is good old fashioned* infrastructure. Plug A into B, tell B that A exists, tell A that B is what you want. Once they’re paired, remove the plug and you’re in the same situation the magic would have left you in after step 3. Except! If you run into a problem, you know how to drop into the infrastructure and perform the same set of steps to get you back where you need to be.

(*It’s not actually old fashioned. We just get used to the infrastructure that works, and it feels like it’s always been there. Infrastructure that doesn’t work is technology, and we get used to it not working and route around it.)

The difference between designing infrastructure and designing magic is asking, “What happens when this goes wrong? How can someone using this get to the part that isn’t working and direct it manually?” That’s where the difficult work of engineering comes in, because you need to ask not only how your system works when it all works, but how the whole system it relies on behaves when it doesn’t. What does the process do when there’s no internet? What does it do when the signal from the other device is too weak? What does it do when the list of devices it sees is too long? When the device doesn’t speak the right protocol?

A lot of that design deals with falling back. If the latest protocol doesn’t work, is there an older one that might? If the signal is weak, is there a way to connect that doesn’t use radio? And above all, how do we communicate this to the person looking at it, so they know which part needs help?
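
For the code-minded, here is a rough sketch of what that falling back could look like in Python. Everything in it is made up for illustration: `Method`, `PairingError`, `pair`, and the three pretend pairing methods are hypothetical names, not any real device API. The point is only the shape: try each method in order, and for every failure keep both the reason and a hint for doing that step by hand.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class PairingError(Exception):
    """Raised by a pairing method; the message names the part that failed."""


@dataclass
class Method:
    # All fields are illustrative assumptions, not a real pairing API.
    name: str                    # shown to the person, e.g. "cable"
    attempt: Callable[[], None]  # raises PairingError on failure
    manual_hint: str             # how to drive this step by hand


def pair(methods: List[Method]) -> Tuple[Optional[str], List[str]]:
    """Try each method in order; return (name that worked or None, report)."""
    report: List[str] = []
    for m in methods:
        try:
            m.attempt()
        except PairingError as err:
            # Record why it failed and how to take over manually.
            report.append(f"{m.name}: failed ({err}). By hand: {m.manual_hint}")
            continue
        report.append(f"{m.name}: paired")
        return m.name, report
    return None, report


# Pretend methods standing in for real hardware.
def _latest_radio() -> None:
    raise PairingError("signal too weak")


def _older_radio() -> None:
    raise PairingError("device does not speak this protocol")


def _cable() -> None:
    pass  # pretend the cable always works


methods = [
    Method("latest radio protocol", _latest_radio, "move the devices closer"),
    Method("older radio protocol", _older_radio, "enable legacy mode on B"),
    Method("cable", _cable, "plug A into B"),
]

winner, report = pair(methods)
for line in report:
    print(line)
```

The report matters as much as the result. Even when the cable succeeds, the person can see that both radio attempts failed and why, so they know which part needs help the next time.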

So it’s hard work, but really it’s doing the work needed to create full automation. It’s not just automated when it works; that would be magic. Putting me in a place to fix it when it doesn’t work automatically is good infrastructure.