Thursday, January 14, 2010

Operations Anti-Patterns

An anti-pattern, as defined by Wikipedia, is "a design pattern that may be commonly used but is ineffective and/or counterproductive in practice." The term comes up most often in software engineering, but it can be applied in other areas as well.

In an attempt to come up with ways to evolve some of the tasks we practice every day as sysadmins, I thought it might be good to start by defining some Operations anti-patterns.

(Obviously this is nowhere near a comprehensive list. I hope to post more soon.)
  1. Information overload: When an admin creates a cron job to perform an automated task but doesn't take steps to ensure unnecessary output is discarded.
  2. The rat's nest: One could draw a link between a sysadmin's tidiness in the cage and the way they maintain their systems, both software and hardware. A rat's nest almost guarantees that other aspects of their work are just as disorderly, causing lost productivity (or in some cases, extended downtime).
  3. Set it and forget it: Setting up a new piece of software or hardware without proper documentation on both its implementation (how and why it exists) and operational (maintenance and support) aspects.
  4. Non-communicado: Ancillary to "set it and forget it", this happens when an admin sets up new system monitors without telling the rest of their team. It could also refer to cron jobs without comments describing them, etc.
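
To make the first item concrete, here is a minimal sketch in Python of one way to keep a cron job from flooding inboxes: run the real command, throw its output away when it succeeds, and pass the output and exit code through when it fails. The script name `quiet_cron.py` and the function `run_quietly` are made up for illustration; this is one possible approach under those assumptions, not a standard tool.

```python
import subprocess
import sys


def run_quietly(command):
    """Run `command` (a list of arguments) and capture everything it prints.

    Returns (exit_code, output_to_report). The output is an empty string
    when the command succeeds, so a healthy job reports nothing.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same captured stream
        text=True,
    )
    if result.returncode == 0:
        return 0, ""  # success: discard the routine chatter
    return result.returncode, result.stdout  # failure: keep all of it


# Hypothetical command-line entry point: quiet_cron.py COMMAND [ARG...]
if __name__ == "__main__" and len(sys.argv) > 1:
    code, output = run_quietly(sys.argv[1:])
    sys.stdout.write(output)
    sys.exit(code)
```

A crontab entry would then call the wrapper in front of the real job instead of the job itself. Since cron normally mails whatever a job prints, the admin only hears from the job when something went wrong, and the full output is there when it does.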
Just as with software development, evidence of these anti-patterns does not necessarily characterize a bad sysadmin. Good sysadmins display them as well. But the best, in my opinion, strive to overcome them in their daily work.

    6 comments:

    1. Complementing anti-patterns:

      Information underload:
      When an automated task does not provide enough output to determine whether it is completing as expected, so that an error condition goes unnoticed (and is usually found past the last responsible moment).

    2. Demoralizing false-positives:

      When new system monitors have inconsistent fault-determination criteria, the resulting false-positive notifications have a demoralizing effect on the team in charge of incident response. A likely consequence: notifications will be increasingly ignored, and eventually a real error condition goes unattended.

    3. Regarding the rat's nest, it can be quite the liability to both the employee and the company, in terms of safety.

    4. I posted several here: http://dev2ops.org/blog/2010/2/18/deployment-management-design-patterns-for-devops.html and have since started a catalog if you're interested in collaborating: http://code.google.com/p/devops-toolchain/wiki/PatternRepository

    5. Nice! I'm checking it out now.

    6. Unfortunately, suggested solutions won't really address the issue.
