Part of operating a high-performance cloud is identifying problems long before they impact your infrastructure. Insight and visibility are critical to managing your most important assets. The difference between the “reluctant hoster” – someone who, by necessity of the business, has built up a few servers under his or her desk – and a managed cloud services provider often comes down to Cloud Engineers who take the time to scrutinize real-time metrics and ensure that network, storage, and hypervisors are all operating at peak availability.
Case in point: one of our customers, a $300M electronics company, recently lost N+1 memory redundancy in their Private Cloud but never experienced an outage. Their reputation and potential revenue hinged on the remaining infrastructure. While an urgent replacement was underway, they used the insight available in the HOSTING Customer Portal to trim excess memory allocated to their VMs and return to a redundant state.
High-performance clouds typically depend on a few key things, visibility being the first. Insight into detailed performance data is crucial before you launch, so that the team operating the platform can iterate on it. You must turn the data you collect into information, knowledge, wisdom, and – most importantly – the ability to act (or remediate). Or don’t bother.
Then, of course, every cloud must have what is fast becoming the most expensive part of operating a cloud environment – storage. Ultra-fast SAN storage solutions like those provided by NetApp give you the ability to poll and aggregate metrics across your storage platform and proactively identify when you need to increase capacity. Going further, NetApp’s OnCommand management software has a feature-rich REST API that you can use for monitoring. HOSTING uses it and has developed a custom-built dashboard for our Operations team to ensure we’re always in front of capacity, performance, and availability data.
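To make "identify proactively when you need to increase capacity" concrete, here is a minimal Python sketch that fits a straight line to polled usage samples and estimates how many days remain before a volume fills. It is an illustration only: the function name `days_until_full`, the sample format, and the linear-growth assumption are ours, and it does not call the OnCommand API or reflect how the HOSTING dashboard is built.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(
    samples: Sequence[Tuple[float, float]], capacity_tb: float
) -> Optional[float]:
    """Estimate days until used capacity reaches capacity_tb.

    samples is a sequence of (day, used_tb) pairs, oldest first, as they
    might be collected by polling a storage system (hypothetical format).
    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    # Ordinary least-squares slope: growth in TB per day.
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    slope = sxy / sxx

    if slope <= 0:
        return None  # no growth trend, so no projected fill date

    # Project forward from the most recent observation.
    _, last_used = samples[-1]
    return max(0.0, (capacity_tb - last_used) / slope)


if __name__ == "__main__":
    history = [(0, 10.0), (1, 11.0), (2, 12.0), (3, 13.0)]
    print(days_until_full(history, capacity_tb=20.0))
```

A straight-line fit is the simplest possible model. Real usage is often bursty or seasonal, so treat the result as an early warning rather than a forecast.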
Lastly, a good hypervisor is essential. VMware has not only added better automation in its vSphere 5.1 API, but continues to improve visibility into the health of virtual machines and the speed of operations. HOSTING has taken this route for our customers. Across our vCenters, we report on physical-to-virtual ratios for CPU and memory, the status of high availability in the cluster, and how often machines are vMotioning within the clusters to balance workload. Beyond hypervisor and storage processor analysis, we collect and monitor thousands of SNMP events from your infrastructure and ours so that we know right away when we should mitigate an event.
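The physical-to-virtual ratios above, and the N+1 memory check from the customer example earlier, come down to simple arithmetic. The Python sketch below shows one way to compute them from an inventory of hosts and VMs. The `Host` and `VM` classes and the function names are hypothetical, and the N+1 test is deliberately simplified: it only asks whether all allocated VM memory would still fit if the largest host failed. It ignores hypervisor overhead, reservations, and vSphere HA admission control policies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    cpu_cores: int
    memory_gb: float


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gb: float


def cpu_ratio(hosts: List[Host], vms: List[VM]) -> float:
    """Virtual CPUs allocated per physical core across the cluster."""
    cores = sum(h.cpu_cores for h in hosts)
    if cores == 0:
        raise ValueError("cluster has no physical cores")
    return sum(v.vcpus for v in vms) / cores


def memory_ratio(hosts: List[Host], vms: List[VM]) -> float:
    """VM memory allocated per GB of physical memory across the cluster."""
    physical = sum(h.memory_gb for h in hosts)
    if physical == 0:
        raise ValueError("cluster has no physical memory")
    return sum(v.memory_gb for v in vms) / physical


def has_n_plus_one_memory(hosts: List[Host], vms: List[VM]) -> bool:
    """True if allocated VM memory still fits after losing the largest host."""
    if not hosts:
        raise ValueError("cluster has no hosts")
    total = sum(h.memory_gb for h in hosts)
    largest = max(h.memory_gb for h in hosts)
    return sum(v.memory_gb for v in vms) <= total - largest


if __name__ == "__main__":
    hosts = [Host(f"esx{i}", 16, 256.0) for i in range(3)]
    vms = [VM(f"vm{i}", 4, 25.0) for i in range(24)]
    print(cpu_ratio(hosts, vms), memory_ratio(hosts, vms))
    print(has_n_plus_one_memory(hosts, vms))
```

In this made-up cluster, trimming each VM from 25 GB to 20 GB brings total allocation under the memory of the two surviving hosts. That is the same kind of adjustment the customer made to get back to a redundant state.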
What this boils down to is vigilance. To those already leveraging virtual and/or cloud infrastructure, the HOSTING Customer Portal and HOSTING 360° Report give you peace of mind and actionable analytics. To those still in the physical world, it’s time to move.