Sunday, March 20, 2005

Improving Operational Efficiency

Most computers sit pretty idle most of the time. (In fact, one could well ask if they ever do anything useful!) This is true both for desktops and servers.

Normally, this is because you have to overspecify the kit - relative to the average load - in order to be able to handle the peaks. That might be to give good response on a desktop machine once the user actually does something, to handle sudden unpredictable surges in demand, or simply because load is cyclical.

(In our case, one scenario we have to cater for is an instructor standing in front of a class of 20 students and inviting them all to run some interesting application on our servers - at the same time.)

The well-known downside to this is that you end up with a system that's horribly inefficient. Not only do you have to spend much more up front than you really need, you end up burning electricity and running your air conditioning plant round the clock, so your costs are driven by the peak load - which is usually highly atypical.

There is a range of solutions that can drive up efficiency and equipment utilization - or, more to the point, deliver the same (or better) service to customers and users at lower cost.

On the desktop front, thin-client solutions can lead to considerable savings. Not so much in up-front costs any more, but in terms of power and cost of ownership the thin client starts to make much more sense. In many ways, though, it's the more subtle issues - like the lower noise and smaller desktop footprint - that ought to make this a no-brainer. We've used SunRays (and the hot desking is really useful) with some success, although we've avoided them for developer desktops in the past because developers tend to want to run things like NetBeans, Tomcat, Apache, and MySQL, and you can't really have more than one such developer sharing a machine. But with zones in Solaris 10 we could consolidate developers onto a SunRay server as well - something like the sketch below.
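
For what it's worth, here's a minimal sketch of carving out one zone per developer on Solaris 10. The zone name, path, address and interface are all hypothetical - adjust to taste:

    # zonecfg -z devzone1
    zonecfg:devzone1> create
    zonecfg:devzone1> set zonepath=/zones/devzone1
    zonecfg:devzone1> add net
    zonecfg:devzone1:net> set address=192.168.1.101
    zonecfg:devzone1:net> set physical=bge0
    zonecfg:devzone1:net> end
    zonecfg:devzone1> commit
    zonecfg:devzone1> exit
    # zoneadm -z devzone1 install
    # zoneadm -z devzone1 boot

Each developer then gets what looks like their own machine - their own Tomcat on port 8080, their own MySQL, their own IP address - without stepping on anyone else.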

Many servers sit pretty idle simply because it's been traditional to allocate one server per service. I know we've done this in the past - simply to keep services separate and manageable. Then server sprawl can become a serious problem. Enter Solaris zones again - it looks like you're running one service per machine, but they're all consolidated. Provided you have some means of resource management, so that one service can't monopolize all the resources on the box, you can consolidate a lot of services onto a single piece of hardware. Not only that, you can afford a better system - more RAS, more power, more memory - so that the services can run faster and more reliably.
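
The Fair Share Scheduler in Solaris 10 is one way to handle that resource management piece. A sketch, assuming a hypothetical zone called webzone that should get 20 CPU shares (the dispadmin step makes FSS the default scheduling class from the next boot):

    # dispadmin -d FSS
    # zonecfg -z webzone
    zonecfg:webzone> add rctl
    zonecfg:webzone:rctl> set name=zone.cpu-shares
    zonecfg:webzone:rctl> add value (priv=privileged,limit=20,action=none)
    zonecfg:webzone:rctl> end
    zonecfg:webzone> commit

The nice property is that shares only bite under contention: when the box is quiet, any zone can soak up the idle cycles, which is exactly the efficiency we're after.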

Handling cyclical load is another matter. If you have to do something once a month, or once a night, then you normally have a given window in which to do it, and almost by definition the systems used will sit idle the rest of the time. Sure, there's some opportunity to steal CPU and other resources from other machines on your network, but if you want to consolidate then the only option is to find someone else with the same needs at different times (it's no use if you both want to do the same analysis at the same time!) and share systems with them. (Or, for certain workloads, you have datacenters in different timezones and move the load around the planet as the earth rotates.)

I'm guessing that this is the sort of workload that Sun's recent grid offerings (and grid is one of those words I'll return to in a future blog, no doubt) are designed to address. The business model has to be that if Sun can keep the machines busy then they can make money, and that for customers it's cheaper to buy CPU power when you need it than to pay for it up front and have it sitting idle.

So the grid provision model isn't going to be of any use to customers who already have high utilization - or, which amounts to the same thing, customers with constant workloads. I worked this out for our compute systems: once you get better than roughly 50% utilization (it's only approximate), it's cheaper to do it yourself. But with utilization much lower than that, it's cheaper to buy the capacity off someone else as you need it.
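
The back-of-the-envelope version, with purely illustrative cost figures (Sun's advertised grid price at the time was $1 per CPU-hour):

    own cost per CPU-hour = fixed cost per CPU-hour / utilization

    e.g. if hardware, power, cooling and admin work out at $0.50
    per CPU-hour with the box fully busy, then:

        100% utilization:  $0.50 / 1.00 = $0.50   (owning wins)
         50% utilization:  $0.50 / 0.50 = $1.00   (break-even)
         25% utilization:  $0.50 / 0.25 = $2.00   (the grid wins)

The break-even point obviously moves with your real fixed costs, which is why it's only approximate.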

I wonder who the likely customers are - or what market segment they might be in. In particular, is it going to be large companies or small? If small, there's an opportunity for resellers - brokers, if you like - to act as middlemen, buying large chunks and doing the tasks on behalf of the smaller customers. And while, as I've understood it, Sun are offering capacity in a fairly raw form, smaller end users might be interested in more focused services rather than counting individual cycles.

This could go down to individual consumers once you get into storage. Now, I don't suppose Sun are going to deal with individual consumers, but I know that I would be interested in a gigabyte at a dollar a month for my own critical data. After all, I have a PC and it isn't backed up - and never will be. So I need somewhere to keep this stuff that isn't vulnerable to hardware failure, theft, or user stupidity.

None of this is new, of course. The problem of systems operating inefficiently - at very low utilization levels - has been around for years. It remains to be seen if current initiatives are any more successful than past approaches in getting rid of the horrible inefficiencies we currently put up with.
