I was listening to coder radio this week, and they were reviewing the top 7 reasons why engineers leave companies (techrepublic.com):
- The position lacked opportunities for growth and development (49%)
- Leadership/management were bad (47%)
- The company used inefficient or obsolete technologies (31%)
- I was offered higher compensation elsewhere (21%)
- I wanted more freedom to work remotely (19%)
- There were too many stagnant projects (18%)
- I was offered better benefits elsewhere (16%)
So I understand all those reasons. Every place I’ve ever worked at had at least 3 of those factors in play.
The thing that got me thinking was the one about ‘inefficient or obsolete’ technologies. While I completely 100% see why this is a problem, in my current role I am finding more and more that I’m on the opposite side of that argument. More accurately, I find myself being pulled in both directions.
If I had to list out the tech that we use, it would be a very long list. In the last 2 years, we’ve added kafka, postgres, kong and a couple of others into the mix in production i.e. if they fall over at 3am, somebody (most likely me) will get a phone call.
Often folks walk up to my desk and ask ‘why can’t we use X technology’. My answer is usually curt because I’ve had to pry my attention away from what ever I was doing in order to answer this question. ‘Who’s going to look after it in production’? is generally what’s barked back, because the last thing that I want is another reason for my phone to ring at night.
But for the sake of the next poor schmo walking up to my desk with another new shiny technology on the tip of their tongues, let’s unpack that statement: Who’s going to look after it in production? And for the sake of the argument, let’s pick a case study: Redis.
‘Ray, why can’t we use Redis as a big distributed cache?’ It’s a great idea, and has the potential to be elegant. It addresses a problem that is kind of built into microservices, in that every service needs mostly the same data, and it generally needs it yesterday. Also, enterprise wide rules or policies about cache behaviours would be great as it stops service authors implementing ‘caches’ in their own special way. So the potential for some level of standardisation, capacity growth enabled by horizontal scaling, reduction in memory footprint at the service level – all massive. All good.
There’s a but coming – a big one.
But: assuming we get past Proof-Of-Concept (POC) stage, we’re going to have to look at how it runs on a machine other than the developer’s PC, then in some sort of user facing environment, and ultimately on Production. The fictional Redis proponent will, at this point, get defensive now because I’m saying ‘The P Word’. Your beautiful new idea is only at the embryonic stage. ‘It’s still far away from Production!’ you’ll say.
That’s all very well and good, but POC’s have a habit of working themselves into the product, and then at some point, another person will interrupt me my desk with an urgent request to get Redis into Production (with a capital P this time). There will be stress, and shouting until usually someone says ‘JFDI’.
I’ve watched that movie before. Like ‘Atonement’ and anything else written by Ian McEwan, the ending leaves you feeling empty and disappointed. So the time to ask difficult questions is now, while the idea unformed and malleable :
-
What resources do you need? Redis is pretty memory heavy. This is problematic because most of our application is memory heavy. Do we have the machines/hypervisors for this? Do we need to buy more? We don’t need details but we do need a ballpark to know if this will fit into the current infrastructure capacity. What’s the most you’ll need? What’s the least you’ll need?
-
Are you spooling to disk? Is the regular NAS storage good enough? Will there be enough?
-
Do you need to run backups? Maybe as a cache, you don’t need to back it up, but potentially you need to rebuild it’s contents somehow? Is that slow? Does it happen organically? Will things break if the cache isn’t populated?
-
What do we do in a remote DC? If all the important content of this cache is in memory, what happens when we invoke our DR strategy?
-
How do we install it in production? I’m not really interested in how you did it for your POC. In production we’ll use ansible or some other mechanism that makes the process repeatable and reliable. Where does the package come from? Is it he OS package repo? Is there third party repo? Is it something else entirely?
-
How does the second installation work? That could be a minor upgrade or even a restart/reconfigure. Can you do it in a rolling fashion or does this thing need downtime to make even small changes. How tolerant is the cluster to version differences?
-
How do you operate this thing? How do you start it? How do you stop it? Does it need hours of rebalancing after a restart? Are there any manual interactions that have to happen when a back online after being offline?
-
Is there a GUI? How are you going to maintain it? How are you going to hand off the maintenance tasks to the production operations team?
- How do you monitor this thing? And not just ‘is it up’, although that’s an important starting point.
- How do you tell if it’s up and working? Does it support the idea of ‘health’?
- Are there any special things to look for when considering the health of the whole cluster (e.g. assert only one controller in a cluster of 7 instances)
- Is it running slowly? Is it about to break? Do we understand the product well enough to predict when those two things will happen?
-
Is there any commercial support for Redis? Can we shortcut some of these questions by bringing a someone to do training?
- How do you secure it? Can anyone access it? How do you enforce at least an audit trail of how data got in and how it got out?
Now, this is the thing that REALLY irritates me: I actually like new technology. I think it’s great. If you’re not moving forward in the tech game, then you’re moving backwards. You have to work to stand still. The thing that kills me is that I never get to play with this stuff while it’s not stressful. I’m not the one doing a spike on my machine for months. I get handed stress when the developer or whoever it is has finished ‘playing’, and it’s time to act like a grown up. For the record, I hate being a grown up!
Now, it’s not all bad news. A lot of this anxiety can be alleviated, quite easily in fact:
- Have answers to the questions above. You have to have answers. It’s not production ready until you have ALL the answers.
- Start a technology specialists group. You don’t need T-Shirts and a song, but a group email address is probably a good start. This way the group of people who can answer questions about Redis is transparent and easily contactable.
- Own it. The chances of success are a lot higher when the trust for the technology is underwritten by trust in you.
A new technology is for life (or at least good chunk of it), not just for Christmas. It’s a gift that keeps on giving for many years to come, and you have to be ok with that.