Wednesday, January 04, 2023

Server Consolidation and Virtualization with IBM CDAT - 4 January 2023

In times gone by, I worked for IBM and managed a development group writing system software for Windows running on IBM xSeries systems.  This was about 2001-2005, give or take.  We started to do a lot of systems work with VMware and Microsoft on virtualization.  VMware was still an independent start-up, and Microsoft had bought "the other" x86 virtualization start-up.  That Microsoft product line eventually became Hyper-V, which today underpins Azure.  

IBM had a lot of customers totally immersed in virtualization, but on mainframes.  Because IBM more or less created virtualization (as a product), their monitor software was called "VM".  You can do that with names when you are first into the market.  But virtualization was a strange, new beast in the x86 world.  (Remember that x86 was still almost pure 32-bit, and AMD64 had only recently been announced; Intel had Itanium as their 64-bit architecture.  But that is a conversation for another day.)  

In working with customers to understand requirements for virtualization on x86, it became clear that customers did not really understand it.  The customers were used to a business model in which every Windows server application got its own (physical) machine.  It was said that "Windows apps do not play well together."  Server-class PCs were considered cheap, about $10K (USD) or so, at least compared to the purchase price of a mainframe, so a new app or new capacity on an existing app meant that the business would buy a new server.  Thus, there were file servers, mail servers, web servers, and business app servers, and each of these was a physical server in a rack.  As capacity demands rose, buying a new physical server for every need became expensive, so some organizations started looking at virtualization as a solution.

The first problem they faced was estimating capacity.  Since every PC server ran its own app, there was always plenty of performance headroom - well, until there was not.  A capacity problem was usually easy to solve by buying a new, bigger server.  With the march of Moore's Law, this was usually a no-brainer.  As the old server ran out of capacity and warranty, buying a new server brought all sorts of benefits - faster CPU clock, more memory, faster disks, bigger disks, and a new warranty.  This worked pretty well on a server-by-server basis, but it became clear that replacing all the servers every 2-3 years was expensive across the server room.  Virtualization came as the answer - buy fewer new servers.  Pretty good, but how many to buy?  Nobody bothered to collect performance data on PC servers.  Until a server started running out of capacity, no one cared, because each app was on its own server.  No competition between apps meant no problems.

Back to my software development group.  There was a delay in the hardware schedule that caused a couple of software engineers to have nothing to do for a couple of months.  I kicked around some ideas for short projects with two colleagues, Jim and Ted (I cannot recall Ted's real name, so let us just run with Ted).  I do not recall which of us actually invented the idea, but our conversation converged on a Windows-based tool that would reach out to Windows servers, gather performance and capacity snapshots, store the data in a database, then analyze the database to recommend virtualization consolidation plans.  This became the Analysis part of the tool.
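
I no longer have the CDAT code, and I will not pretend to remember its internals, but the shape of the Analysis collector was roughly this: take a periodic snapshot of each server's key counters and append it to a small database for later analysis.  Here is a minimal Python sketch of that shape; the sample_counters stub and the table layout are my own placeholders for illustration, not CDAT's actual mechanism or schema.

```python
import sqlite3
import time

def sample_counters(hostname: str) -> dict:
    """Stub for the remote sampling step.  The real tool pulled Windows
    performance counters over the network; this placeholder just marks
    where that call would go."""
    raise NotImplementedError("plug in your remote perf-counter query here")

def record_snapshot(db: sqlite3.Connection, hostname: str) -> None:
    """Take one snapshot for one server and append it to the database."""
    counters = sample_counters(hostname)
    db.execute(
        "INSERT INTO snapshots (host, taken_at, cpu_pct, mem_mb) VALUES (?, ?, ?, ?)",
        (hostname, time.time(), counters["cpu_pct"], counters["mem_mb"]),
    )
    db.commit()

if __name__ == "__main__":
    db = sqlite3.connect("cdat.db")
    db.execute(
        "CREATE TABLE IF NOT EXISTS snapshots "
        "(host TEXT, taken_at REAL, cpu_pct REAL, mem_mb REAL)"
    )
    # A loop over the discovered hosts, run at some sampling interval,
    # would go here, calling record_snapshot() for each one.
```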

One of the simplest customer problems was that they lacked actual inventory data - they had no list of the servers or applications they were running.  Well, running through lists or ranges of IP addresses and poking ports for responses would quickly and easily find and identify the Windows servers.  This became the Discovery part of the tool.  
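
Again, this is not the original code, but the Discovery idea is simple enough to sketch.  A Windows server of that era would typically answer on TCP ports 135, 139, or 445, so probing a range of addresses on those ports makes a quick, if crude, inventory pass.  The port list and timeout below are my assumptions for the sketch, not CDAT's actual settings.

```python
import ipaddress
import socket

# Ports a Windows server of that era would typically answer on:
# 135 (RPC endpoint mapper), 139 (NetBIOS session), 445 (SMB over TCP).
WINDOWS_PORTS = (135, 139, 445)

def discover_windows_hosts(cidr: str, timeout: float = 0.5):
    """Probe every address in a CIDR range and return those that answer
    on any of the classic Windows service ports.  Sequential and slow,
    but good enough to show the idea."""
    found = []
    for host in ipaddress.ip_network(cidr, strict=False).hosts():
        for port in WINDOWS_PORTS:
            try:
                with socket.create_connection((str(host), port), timeout=timeout):
                    found.append((str(host), port))
                    break  # one open port is enough to flag the host
            except OSError:
                continue  # closed, filtered, or unreachable
    return found

if __name__ == "__main__":
    for host, port in discover_windows_hosts("192.168.1.0/24"):
        print(f"{host} answered on port {port}")
```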

Consolidation using virtualization was the objective, so the tool took on the working name of CDAT, for (server) Consolidation Discovery and Analysis Tool.  I regret that we did not get clever and call it something like "BlueScope", "BlueWindow", or some other IBM blue-colored name, but we were engineers and we missed the opportunity.  Seriously, I regret missing this obvious hook.

Customers loved it.  IBM could walk in with a laptop, plug it into the customer's network, and let the tool run silently and unobtrusively to collect the required data.  The customer needed to enter some network passwords, but that was about it.  The tool did all the data collection automagically.

IBM sales folks loved it.  In the x86 server space, IBM was an also-ran.  HP, Dell, and Compaq ran the x86 server world.  IBM got a chance when HP bought Compaq.  Customers usually sought three competing bids, so when Compaq "disappeared" into HP, IBM became the third option, even if HP and Dell were the only "real" players.  To exploit this new foot in the door as the third bidder, the technical support engineer could collect data for a day or a week, spend a little time with a spreadsheet, and make a highly tailored proposal for the customer.  HP and Dell salespeople came in with generic, one-size-fits-all proposals (e.g., 4:1 consolidation for everything).  In contrast, IBM could propose that certain apps remain on dedicated servers, that a middle class of apps be consolidated at ratios from 2:1 to 8:1, and that a third class of apps be consolidated at 20:1 or higher.  

The customers were consistently stunned by the IBM proposal.  The proposal named systems that the customer recognized.  It gave supporting statistics for the consolidation ratios.  The data could even sort the customer's servers into convenient groups according to their resource demands:  memory consumption, CPU consumption, and network consumption.  
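
To give a feel for that kind of bucketing, here is an illustrative sketch.  The thresholds are invented for the example - I do not remember CDAT's actual rules - but the idea is to sort servers into a keep-dedicated class, a moderate consolidation class, and a heavy consolidation class based on their observed peaks.

```python
def consolidation_tier(peak_cpu_pct: float, peak_mem_mb: float,
                       host_mem_mb: float = 4096) -> str:
    """Illustrative bucketing only - the thresholds are made up, not CDAT's."""
    mem_ratio = peak_mem_mb / host_mem_mb
    if peak_cpu_pct > 50 or mem_ratio > 0.75:
        return "keep dedicated"
    if peak_cpu_pct > 10 or mem_ratio > 0.25:
        return "consolidate 2:1 to 8:1"
    return "consolidate 20:1 or higher"

# Example: a near-idle file server vs. a busy database server.
print(consolidation_tier(peak_cpu_pct=1.5, peak_mem_mb=300))    # 20:1 or higher
print(consolidation_tier(peak_cpu_pct=65.0, peak_mem_mb=3800))  # keep dedicated
```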

As a special bonus, I could look at the data and give them the names of apps that were memory leakers and CPU runaways.  I was playing the odds, but any Windows app that consumed all 4 GB of memory was either a database server or had a memory leak.  If the admin checked memory usage after the next scheduled reboot, they could tell quickly (low memory right after boot on a high-memory-consuming server meant a memory leak).  The CPU-runaway call was more speculative, but very few x86 servers actually needed their CPU capacity - most apps ran at (literally) 1% utilization or less.
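
The two heuristics boil down to something like this - again a sketch with made-up thresholds, not the tool's real logic.

```python
def looks_like_memory_leak(mem_mb_after_boot: float,
                           mem_mb_steady_state: float,
                           host_mem_mb: float = 4096) -> bool:
    """A server that sits near its full memory most of the time, yet drops
    to a small footprint right after a reboot, is probably leaking rather
    than doing real work.  The 90%/25% cutoffs are guesses for the sketch."""
    near_full = mem_mb_steady_state > 0.9 * host_mem_mb
    low_after_boot = mem_mb_after_boot < 0.25 * host_mem_mb
    return near_full and low_after_boot

def looks_like_cpu_runaway(avg_cpu_pct: float,
                           typical_app_cpu_pct: float = 1.0) -> bool:
    """Most apps of that era idled at roughly 1% CPU; one pinned far above
    that around the clock deserved a closer look."""
    return avg_cpu_pct > 25 * typical_app_cpu_pct
```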

As a result, the IBM salespeople presented the customer with highly detailed recommendations that could be supported with data - recommendations tailored to the customer's actual needs, far better than the one-size-fits-all proposals from the other vendors.

Customers loved CDAT.  They would often quickly agree to a prototype - buy 5-10 IBM xSeries servers and apply part of the consolidation plan using virtualization.  If that worked, the customer was positioned to buy more servers from IBM, ultimately replacing their server hardware.  

In the first year of CDAT use, IBM xSeries used it to secure about $10M in business.  The second year, $20M, and the third year, $40M.  Pretty good.  So good that CDAT was eventually expanded to cover pSeries AIX systems for server consolidation.

I have two regrets coming out of the experience.  I mentioned my poor choice of names above, and that is big.  Names like "Watson" or "BlueSphere" resonate with people.  The other regret is that I did not push and advocate for CDAT as much as I should have.  My attitude was, "yeah, this is cool, but it was pretty easy."  Bad answer.  The customers loved it, and I should have pushed hard to increase the exposure (within IBM) and add features.  My team could have been seen as the core of a $50M/year business.  Instead, we were neglected.  In retrospect, a missed opportunity on a large scale.  Not smart.

So that is the story of CDAT.

Here is a Computerworld article from 2007 that describes CDAT for public consumption.  Since I left IBM in 2005, CDAT outlived my time there by years.

https://www2.computerworld.co.nz/article/497013/ibm_takes_server_consolidation_tool_smbs/ 

Dated November 2007, the article reports in part:

IBM takes server consolidation tool to SMBs

IBM has just launched an analysis tool that it believes will help businesses find under-utilised x86 servers that could profitably be consolidated.

The vendor believes that server consolidation using virtualisation technology can save up to 60% in IT costs while quadrupling computing utilisation.

Big Blue has expanded its Consolidation Discovery and Analysis Tool (CDAT) to allow IBM resellers and integrators to evaluate smaller environments of 50 servers or fewer. IBM says it is also providing an additional end-user service, Server Consolidation Factory, at US$500 (NZ$655) per 50 servers, which it claims is "up to five times less expensive than competitive services".

As an example, IBM integrator Mainline Information Systems performed a successful CDAT evaluation for a 575-employee client, Frankenmuth Mutual Insurance Company, a US-based property and casualty insurer, which has since virtualised its server farm.




