So, in the last post, I talked about the hardware savings virtualization could bring you. Saving money on electricity, cooling, servers - virtualization pays for itself quickly in most cramped datacenter environments. Monetary and physical savings, however, are not the only benefits of moving to a virtualized environment.
Another huge benefit is the abstraction you can now build into your system designs. No more worrying about whether that server hardware has drivers for Linux or Netware or NT 4 - as long as the operating system runs in a VM, and you can run VMWare on the hardware through Windows, Linux or ESX Server, you can run nearly any operating system on it. The benefits for a Windows-based server running in a VM are tremendous - upgrading server hardware is no longer a serious hurdle. Simply copy the VM onto the new machine, turn it on, and you’re off and running on your new box. With ‘Virtual Infrastructure’ (VMotion and ESX Server), you don’t even have to turn it off before you move it. You can perform in-place hardware upgrades without downtime. Your five nines of availability just became much easier to achieve, while still maintaining performance and equipment warranty status. Couple this platform abstraction with a properly designed SAN (which yields storage abstraction), and your infrastructure becomes incredibly moldable, able to be configured exactly how it needs to be to get the job done.
Speaking of configuration - when was the last time a patch or service pack install, or a ‘new release’ upgrade, totally destroyed a system? Who’s never had a box fail to come back up properly after a kernel patch or other significant update - no matter how much testing was done beforehand? An extremely useful feature available to VM users is the ability to take a nearly instant, point-in-time snapshot of the running system: RAM state, disk, etc. The whole system state is saved in seconds. So, apply your service pack, reboot, and wait for it to break. If you lost data, or the box won’t reboot properly - simply roll back to the snapshot! Instant, bare-metal restore - no loading of tapes or preinstalling the OS to restore from backup. Just a working system, in the state it was in minutes or hours ago…
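The snapshot, patch, verify, roll back routine is easy to sketch in Python. This is only a toy model of the workflow, not VMWare's API: `ToyVM`, `patch_with_rollback` and `bad_kernel_patch` are made-up names, and the "VM" is just a dict. In real life the four callables would wrap whatever snapshot tooling your virtualization product gives you.

```python
from typing import Callable


def patch_with_rollback(
    take_snapshot: Callable[[], str],
    apply_patch: Callable[[], None],
    health_check: Callable[[], bool],
    revert_to: Callable[[str], None],
) -> bool:
    """Apply a patch; revert to the pre-patch snapshot if it fails.

    Returns True if the patch was kept, False if the system was rolled back.
    """
    snapshot_id = take_snapshot()      # point-in-time copy before touching anything
    try:
        apply_patch()
        healthy = health_check()       # e.g. did the box reboot, do services answer?
    except Exception:
        healthy = False                # a patch that blows up counts as a failure
    if not healthy:
        revert_to(snapshot_id)
    return healthy


class ToyVM:
    """Hypothetical stand-in for a VM: its whole 'state' is one dict."""

    def __init__(self) -> None:
        self.state = {"kernel": "2.6.15", "boots": True}
        self._snapshots: dict[str, dict] = {}

    def snapshot(self) -> str:
        snap_id = f"snap-{len(self._snapshots) + 1}"
        self._snapshots[snap_id] = dict(self.state)  # copy, so later changes don't leak in
        return snap_id

    def revert(self, snap_id: str) -> None:
        self.state = dict(self._snapshots[snap_id])


def bad_kernel_patch(vm: ToyVM) -> None:
    """Simulate an update that leaves the box unbootable."""
    vm.state.update(kernel="2.6.16", boots=False)


vm = ToyVM()
kept = patch_with_rollback(
    take_snapshot=vm.snapshot,
    apply_patch=lambda: bad_kernel_patch(vm),
    health_check=lambda: vm.state["boots"],
    revert_to=vm.revert,
)
print("patch kept" if kept else "rolled back", vm.state)
```

The point of passing the four steps in as functions is that the procedure stays the same no matter what is being patched or which product takes the snapshot.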
Snapshotting is incredibly useful on workstations as well - the ability to test software deployments and images many times, without having to sit through a re-image or reinstall of an operating system, saves thousands of hours in most technical support departments. It makes developing your own software much easier as well - developers can have VM images of the target server platform, and a copy each of the target client platforms (Win2000, WinXP, XP SP2, Vista, etc), instead of a lab of hot, noisy machines. One beefy desktop replaces 5-6 machines, or more, depending on the application being developed.
Deploying services and servers quickly becomes a non-issue, as does breaking applications up onto their own dedicated servers. Have you ever tried to patch one application, and had its dependency updates break another application on the same server (say, moving to PHP5 or MySQL5)? Wouldn’t it be great if every app could have its own dedicated server? It used to be prohibitively expensive, but now it doesn’t make sense NOT to split them up - each app gets its own VM. Designing this way also allows for substantially greater flexibility in scaling your system out - if the DB VM becomes overloaded, simply migrate it to faster hardware - no need for complicated reinstalls and data restores, or worse, figuring out what will break when moving the DB off the localhost!
June 19th, 2006
I’ve been bitten.
I think snapshots were what did it, but I’m not sure.
I just know that within the past few months, I’ve seriously begun pushing to move *all* of my systems to VMWare. Virtualize the whole 9 yards - databases, app servers, directory servers, VoIP servers, everything. This is going to sound almost like a VMWare advertisement, but I’ll try to focus on virtualization in a generic sense. It almost sells itself, if you take the time to evaluate it. This is part 1 of what will probably be a 3-part series.
Based on a study from IBM (wish I had a link…), the average server CPU utilization is approximately 6%. In the past 2-3 years, server hardware has grown fast enough that it is mostly sitting idle. In our datacenter, we have 120-130 servers (depending on the day, which projects are being tested, etc). Probably 80 of them are at 0% load, using a few megs of RAM and a couple gigs of disk space. The others run the gamut, from a few scheduling servers (100% load SQL2k and Oracle servers) to heavy-usage directory servers (40-70% load depending on task).
The Hardware Savings Case for Virtualization
Every year, we have to upgrade a chunk of the 80, simply due to unsupported hardware (you’ve got to stay in warranty or stock spares - just because it’s low utilization doesn’t mean it’s not important). A moderate, low-end Dell rackmount server runs about $3000 list price (PE850 / SC1425) with onsite service. Even these low-end servers are dual-core, multi-gigahertz machines, which are easily capable of running 15-20 of our applications by themselves. Unfortunately, different applications on our 80 idle servers require (sometimes radically) different configurations. Some need 2000 Server, some need 2003 Server, some need RedHat, some need SuSE, some need Solaris, etc. Even applications that will run on the same OS may have conflicting library or resource needs. You might have an application that requires a specific revision of the Linux kernel, or one that will not function when a Service Pack is applied to the server. Regardless of the reason, it’s not always feasible, and usually not desirable, to run multiple different applications on a single server. It makes your SA’s life much more difficult.
What this boils down to is that each year we spend tens of thousands of dollars upgrading hardware that is already overpowered for what it does, just to keep the warranty and support current. You might not change the application for years, but the hardware upgrade will most certainly happen for any important system. This is where VMWare comes in.
Hardware consolidation has many benefits
Instead of being forced to put each of those applications that are currently running happily on your old Pentium 3 servers onto brand new dual-core monsters, you can put them into their own self-contained worlds, all on 2 (for redundancy, of course) dual-core monsters. Now, in a single blow, you’ve gone from spending thousands of dollars per server, and upgrading to a machine that’s going to sit idle, to having redundant hardware for 5-10 systems (or more - apparently some companies are consolidating up to a 20:1 ratio!) at the cost of roughly 3 machines (VMWare’s pricing varies heavily based on who you are…for us it’s roughly the cost of one rackmount server).
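To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. The $3000 server price, the 2 hosts, and the license costing about one server all come from the figures above; the function names are made up, and it assumes the two hosts are the same $3000-class boxes, which is a simplification. Your own pricing will differ.

```python
SERVER_PRICE = 3000          # low-end rackmount list price quoted above
HOSTS = 2                    # two hosts, for redundancy
LICENSE_COST = SERVER_PRICE  # "roughly the cost of one rackmount server" (our pricing)


def one_for_one_cost(n_systems: int, server_price: float = SERVER_PRICE) -> float:
    """Cost of replacing every old server with its own new server."""
    return n_systems * server_price


def consolidated_cost(
    hosts: int = HOSTS,
    server_price: float = SERVER_PRICE,
    license_cost: float = LICENSE_COST,
) -> float:
    """Cost of the virtualization hosts plus the license (about 3 machines total)."""
    return hosts * server_price + license_cost


for n in (5, 10, 20):
    old = one_for_one_cost(n)
    new = consolidated_cost()
    print(f"{n:>2} systems: one-for-one ${old:,.0f}, consolidated ${new:,.0f}, "
          f"saved ${old - new:,.0f}")
```

The consolidated cost stays flat as you add systems, at least until the two hosts run out of headroom, which is why the higher consolidation ratios look so attractive.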
This consolidation brings with it a number of additional benefits that are not immediately obvious. In our situation, which is fairly common, we’re pushing the limits of our building HVAC system. If one of our chillers goes out (we have 2), our datacenter goes from 75 to 85-90 degrees in a matter of minutes. Eliminating 4 out of every 5 systems dramatically lowers the heat generation in the room. The same goes for electricity - we’re at the limits of our backup power system, and of our available circuits in general (we’re probably around 70-80% of our available building power, as wired). Eliminating 4 out of every 5 systems, many of which are equipped with redundant power supplies, gives us a large amount of breathing room on the UPS and backup power systems, as well as providing for substantially more growth without a massive HVAC/electrical upgrade.
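For a feel of what "4 out of every 5" means for power and cooling, here is a rough Python estimate. The 80 servers and the 5:1 ratio are from this post; the 250 W per server is a purely hypothetical placeholder (measure your own), and it assumes a consolidated host draws about the same as the box it replaces, which understates things since a busy host pulls more than an idle one. The watts-to-BTU/h factor (about 3.412) is the standard conversion.

```python
import math

WATTS_TO_BTU_PER_HOUR = 3.412   # 1 watt is roughly 3.412 BTU/h of heat
WATTS_PER_SERVER = 250          # hypothetical average draw, NOT a measured figure


def hosts_needed(n_servers: int, ratio: int) -> int:
    """Physical machines left after consolidating at ratio:1."""
    return math.ceil(n_servers / ratio)


def power_watts(n_machines: int, watts_each: float = WATTS_PER_SERVER) -> float:
    """Total electrical draw for n machines at the assumed per-machine wattage."""
    return n_machines * watts_each


def heat_btu_per_hour(watts: float) -> float:
    """Heat the HVAC has to remove for a given electrical load."""
    return watts * WATTS_TO_BTU_PER_HOUR


before = power_watts(80)                   # the 80 mostly idle servers
after = power_watts(hosts_needed(80, 5))   # what is left at 5:1
reduction = 1 - after / before

print(f"machines: 80 -> {hosts_needed(80, 5)}")
print(f"power:    {before:,.0f} W -> {after:,.0f} W")
print(f"heat:     {heat_btu_per_hour(before):,.0f} -> {heat_btu_per_hour(after):,.0f} BTU/h")
print(f"reduction: {reduction:.0%}")
```

Whatever wattage you plug in, the percentage reduction depends only on the consolidation ratio; the absolute watts and BTU/h are what you compare against your UPS and chiller capacity.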
Aside from electrical and HVAC breathing room, buying far less new hardware means you can buy much higher grade equipment, with better support and performance characteristics. Instead of cheaping out and getting the SATA disks, you can get the 15K RPM SCSI drives. You can get the dual dual-core CPUs. And, possibly most important, you can get that 4-hour onsite service contract, and through the wonderful magic of virtualization, it’s now a 4-hour onsite response FOR ALL OF THE CONSOLIDATED SYSTEMS. You can instantly escalate the level of service by a dramatic amount, and still come out well in the black.
June 6th, 2006