The life of an app

A guy on a board I frequent asked:

Probably a stupid question, but what's the difference between development, testing, staging and production servers?

I had some time while I was eating my salad at my desk, and penned a reply:

You have code, code which works and is in production. Now you get a bug report, have to add a feature, whatever. So you start a branch of the production code on the dev machine, and get busy writing.

When you want to make sure that code does what it's intended to do, doesn't affect any other parts of the app, etc, you move it from the dev machine to the testing machine. There you poke in whatever ways are necessary or mandated by company policy. QA and devs also get a chance to poke at stuff here.

Once the code is verifiably working (and doing no other harm), it is frozen (no more enhancements, bug fixes, etc.) and it is moved from testing to staging. This is where you invite a few end users (the rest of your group, most probably including some non-tech folks) to test drive the changes. Stuff in staging is basically "in beta". URLs are fairly stable, the apps don't go up and down because you're restarting things all the time like on dev or testing, etc. Depends on where you work, but another security review might also happen here. You might also have to bring in a release engineer or configuration management guy at some point during this stage.
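
If it helps to see the dev, testing, staging hand-off as code, here's a minimal sketch. It's not any real deployment tool: the Release class, its fields, and the "bugfix-123" name are all made up for illustration. The only rules it encodes are the ones above: you move one stage at a time, and once the code lands in staging it's frozen.

```python
from dataclasses import dataclass, field
from typing import List

# Stage order as described in the post. The names are just labels.
STAGES = ["dev", "testing", "staging", "production"]


@dataclass
class Release:
    """Hypothetical record of one change working its way toward production."""
    name: str
    stage: str = "dev"
    frozen: bool = False
    changes: List[str] = field(default_factory=list)

    def add_change(self, description: str) -> None:
        # Once the code is frozen, nothing new goes in.
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen; no more changes")
        self.changes.append(description)

    def promote(self) -> str:
        # Move exactly one stage forward; skipping stages is not possible.
        index = STAGES.index(self.stage)
        if index == len(STAGES) - 1:
            raise RuntimeError(f"{self.name} is already in production")
        self.stage = STAGES[index + 1]
        # The freeze takes effect when the code lands in staging.
        if self.stage == "staging":
            self.frozen = True
        return self.stage


release = Release("bugfix-123")
release.add_change("fix the reported bug")
release.promote()  # dev -> testing
release.promote()  # testing -> staging, and now frozen
```

Calling release.add_change() after that second promote() raises an error, which is the whole point of a freeze.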

Once you know everything is 100% ready to go, nothing is going to break, people love the new features, everyone has signed off on the release, you move from staging to production. You don't forget to let your admins know that the app is going to go down for a minute, lest they get paged. And because you care, you have looked at traffic patterns and are doing the production integration at your absolute off-peak, to minimize impact on end users. Thankfully, you were mentored by the King Of CYA, and have a rollback plan should it be necessary to "downgrade" to the last rev of your production app. The code is integrated, QA does their smoke tests, and if it all works, you ask the admins to keep an eye on it and then you go home and drink it off.
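
Here's a rough sketch of the two CYA bits from that last step: picking the off-peak hour from traffic numbers, and keeping the last rev around so you can "downgrade". Everything in it is an assumption for illustration: the function and class names, the rev strings, and the idea that a smoke test is just a function returning True or False.

```python
from typing import Callable, Dict, Optional


def quietest_hour(requests_per_hour: Dict[int, int]) -> int:
    """Return the hour of day with the fewest requests (earliest hour wins ties)."""
    return min(sorted(requests_per_hour), key=lambda hour: requests_per_hour[hour])


class Production:
    """Hypothetical production slot that remembers the last rev for rollback."""

    def __init__(self, current_rev: str) -> None:
        self.current_rev = current_rev
        self.previous_rev: Optional[str] = None

    def deploy(self, new_rev: str, smoke_test: Callable[[str], bool]) -> bool:
        # Keep the old rev so there is something to go back to.
        self.previous_rev = self.current_rev
        self.current_rev = new_rev
        # If the smoke test fails, put the old rev back right away.
        if not smoke_test(new_rev):
            self.rollback()
            return False
        return True

    def rollback(self) -> None:
        if self.previous_rev is None:
            raise RuntimeError("nothing to roll back to")
        self.current_rev, self.previous_rev = self.previous_rev, None


# Made-up traffic numbers, keyed by hour of day.
traffic = {0: 120, 3: 15, 4: 15, 12: 900}
deploy_hour = quietest_hour(traffic)

prod = Production("rev-41")
prod.deploy("rev-42", smoke_test=lambda rev: True)
```

A real rollback usually involves more than swapping a string (databases, config, caches), so treat this as the shape of the plan and not the plan itself.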

That's a perfect world. What actually happens is that some bozo has shit running out of his home directory and because it's "in production" you need heavy duty earth movers to get it on a real server since mgmt doesn't realize that it's flaky. All they know is that "it's working great now" and they can't risk any downtime. No amount of technically correct reasoning can convince the VP of sales that it's just fine to move his customer-facing app. He'll veto your arguments by saying to the CEO, "Bob, you know I trust the IT guys, but I just don't see how we can risk losing a customer because the app is down..."

The guy who built it quits, and stuff breaks left and right, because an undocumented "feature" was that the guy copied files around every morning to keep things working. Or his /home/luser directory was archived and deleted when HR terminated his employment. So now you have a bunch of guys copying shit off tape, only to find that it's not all there, permissions are whacked, there were other necessary files in /usr/local that the app needed, whatever.

Or maybe the guy does like in the story and runs crap off his workstation. He doesn't bother using source control, and instead just uses a very intuitive sequential numbering system. Or the final executable is happily named "app-working.exe" so that everyone knows it's the good one that should be in production. He could also just append dates to the app name. That's really helpful, since the last rev is always the production copy.

The best part about the above scenario is that the guy's desktop box will wind up living racked up sideways in a datacenter, in a cabinet nicknamed "the graveyard" by the NOC staff. Nobody knows how to restart it should power get shut off, so the admins taped the top of a water bottle over the power button and put a note on it. It'll be known as "The Dell Desktop Machine You Don't Ever Touch" and folks will be more than happy to pretend it doesn't exist (and that its lesser-quality desktop power supply stays running, those little fans on the motherboard chips don't get clogged with lint, etc).

Once the "Little Workstation That Could" does go down (and it will, believe me, it will), there will be no fewer than 8 admins -- some of whom are very senior -- who will spend around 6 hours to bring it back up correctly and test it very roughly. The total cost in man-hours and downtime works out to roughly 35 times what it would have cost to move it to a real server environment and gin up a little documentation and redundancy. But because they couldn't risk the downtime, you recall, they never did that.

So once it's "fixed", the senior IT guy there shoots off angry emails calling the VP of Sales an ignorant twit, and he wants that desktop shit outta his datacenter but pronto, thank you very much. So a committee is formed and all sorts of buy-in gathered, opinions solicited, outcomes predicted, tasks delegated.

Months later, nobody has touched the thing because anyone with a clue knows that the box has cooties and they don't want the blame for causing outages (Sales VP has a temper AND plays golf with the HR manager and CFO, natch). Besides, the guy who got the duty of migrating the app was the most junior, the machine was running an OS he wasn't familiar with, and the app was written in a language he didn't know. He muddled around for a few weeks trying his best to make progress and "show status" before moving on to another job that's actually mentally fulfilling. So everyone's forgotten about the little box (the NOC guys, more so than anyone, pretend there's empty space where it's racked) and that's why nothing ever got done.

Then it goes down again...

And that's my story of how IT works. Like it?

Posted by wee on 01/29/2009 at 11:28 AM | Main Page | Category: Geek Stuff

Cool old scans

There's all manner of cool stuff at Plan 59.

Posted by wee on 01/16/2009 at 09:46 AM | Main Page | Category: Random Stuff

If I could invent anything...

...it would be a device that allows me to travel through a normal phone line and slap someone silly.

I just got off the phone with a guy who said, "Well, linux doesn't handle memory well... it's not even really an operating system because it swaps all the time". Of course, he pronounced the word linux "LEYE-nucks", which merely adds to the WTFness of what he said.

I've stopped job interviews short because people have mispronounced the name. If you've worked with it in the slightest, you have to know how to say it since a conversation about the system is very likely unavoidable. So it's either conscious effort or ignorance that would cause you to say the name wrong, which is why I'd bounce people who get it wrong. It's evidence that they don't know (or refuse to learn) what they are talking about.

It's probably no surprise that I don't hold the guy's statement in very high regard.

Posted by wee on 01/12/2009 at 01:56 PM | Main Page | Category: Rants

As much tiki as you can shake a stick at

I came across a tiki site that has links to just about every tiki anything. Cool stuff!

Posted by wee on 01/06/2009 at 02:31 PM | Main Page | Category: Random Stuff