At HOSTING, certain projects come to define best practices: real-life scenarios that reinforce the methodologies and approaches we use to build the best possible products. Healthcare.gov will likely become such a project, though as a reminder of what not to do.
In this two-part series, Parker Snyder of Migration Services at HOSTING draws on lessons learned and best practices from mission-critical business applications to share his thoughts on healthcare.gov.
In 2005, NASA launched the Demonstration of Autonomous Rendezvous Technology (DART) spacecraft to perform an automated rendezvous with the MUBLCOM communications satellite. A mere 11 hours into the mission, the spacecraft reported that its available propellant had been exhausted. This left DART unable to maneuver, and it collided with MUBLCOM. NASA declared a “Type A” mishap and launched an investigation. In the end, a flaw in the configuration of the spacecraft’s trusted set of source data caused its navigation estimates to diverge further and further from reality. That guidance failure sent DART into a collision with its target.
In every troubleshooting scenario, I hark back to the DART incident because of the exhaustive testing and analysis that was required to determine its root cause. With extremely complicated systems, a kind of gambler’s fallacy tends to lead us into the trap of expecting success based on synthetic testing and appraisal alone. As for healthcare.gov, I believe several things could have been done better to prepare for a successful launch.
Always bear in mind the eventual scope of your project. At face value you are creating something new: a product or design that has not yet been used by customers. Start from a solid base of simplicity, which is the true key to vertical or horizontal scalability. Had the contractors for healthcare.gov truly understood the likely access patterns and datasets, they might have planned differently for millions of visitors and high volumes of form submissions occurring all at once.
Design a system, throw it away, and then build the system correctly. In The Mythical Man-Month, Fred Brooks describes the “second-system effect”: our inherent tendency to carry every feature of the first system into its second iteration. This leads to feature sets that cannot realistically be implemented, which in turn produces what Brooks calls an irreducible number of errors. When creating a large-scale system, design the product or system simply first. Then, once you see which features are truly problematic, take the entire solution you just designed and throw it out.
This may seem counterintuitive, but it is the mental equivalent of sleeping on it. Personally, this is the mantra I grapple with most. After all, how can I be wrong the first time? In reality, I often find that my second solution is simpler, more resilient, and scalable without the complexities I created in the first round.
Finally, humans are a pesky bunch. They not only use a system improperly, but also work, sometimes deliberately and sometimes not, to find where and how to “break” it. Humans have an inherent drive, born of curiosity, to break what works. It reminds me of the Galaga players who discovered that leaving the last two enemies alive, without destroying them, for a sufficient period would cause the enemies to stop firing for the rest of the game.
Now that Parker has set the stage for what went wrong, stay tuned tomorrow for his thoughts on how it can be done right.