Thursday, October 16, 2008

Greening The Data Centers

Recently Google published the Power Usage Efficiency (PUE) numbers of their data centers. PUE is defined as a ratio of the total power consumed by a data center to the power consumed by the IT equipments of the facility. Google's data centers' PUE ranges from 1.1 to 1.3 which is quite impressive. Though it is unclear why all the data centers have slightly different PUE. Are they designed differently or are they all not tuned to improve for the energy efficiency? In any case I am glad to see that Google is committed to the Green Grid initiative and is making the measurement data and method publicly available. This should encourage other organizations to improve the energy performance of their data centers.

The energy efficiency of a data center can be classified into three main categories:

1. Efficiency of the facility: The PUE is designed to measure this kind of efficiency that is based on how a facility that hosts a data center is designed such as its physical location, layout, sizing, cooling systems etc. Some organizations have gotten quite creative to improve this kind of efficiency by setting up an underground data center to achieve consistent temperature or setting up data centers near a power generation facility or even setting up their own captive power plant to reduce the distribution loss from the grid and meet the peak load demand.

2. Efficiency of the servers: This efficiency is based on the efficiency of the hardware components of the servers such as CPU, cooling fans, drive motors etc. HP's green business technology initiative has made significant progress in this area to provide energy-efficient solutions. Sun has backed up the organization OpenEco that helps participants assess, track, and compare energy performance. Sun has also published their carbon footprint.

3. Efficiency of the software architecture: To achieve this kind of efficiency the software architecture is optimized to consume less energy to provide the same functionality. The optimization techniques have by far focused on the performance, storage, and manageability ignoring the software architecture tuning that brings in energy efficiency.

Round Robbin is a popular load balancing algorithm to optimize the load on servers but this algorithm is proven to be energy in-efficient. Another example is about the compression. If data is compressed on a disk it would require CPU cycles to uncompress it versus requiring more I/O calls if it is stored uncompressed. Given everything else being the same, which approach would require less power? These are not trivial questions.

I do not favor an approach where the majority of the programmers are required to change their behavior and learn new way of writing code. One of the ways to optimize the energy performance of the software architecture is to adopt an 80/20 rule. The 80% of the applications use 20% of the code and in most of the cases it is an infrastructure or middleware code. It is relatively easy to educate and train these small subset of the programmers to optimize the code and the architecture for energy-efficiency. Virtualization could also help a lot in this area since the execution layers can be abstracted into something that can be rapidly changed and tuned without affecting the underlying code to provide consistent functionality and behavior.

The energy efficiency cannot be achieved by tuning things in separation. It requires a holistic approach. PUE ratios identify the energy loss prior to it reaches a server, the energy-efficient server requires less power to execute the same software compared to other servers, and the energy-efficient software architecture actually lowers the consumption of energy for the same functionality that the software is providing. We need to invest into all the three categories.

Power consumption is just one aspect of being green. There are many other factors such as how a data center handles the e-waste, the building material used, the green house gases out of the captive power plant (if any) and the cooling plants etc. However tackling energy efficiency is a great first step in greening the data centers.

1 comment:

Ken Oestreich said...

There's a fourth aspect to efficiency that I would add, which is particularly relevant to this Blog... "Efficient Operation."

For example, in a utility computing environment, resources (usually servers) that aren't needed are retired back to "bare metal", and frequently powered-down. For every Watt a server doesn't consume, there's a multiplier effect on facilities/cooling power you don't need. So for example, during off-peak periods, the savings could be enormous.

The way data centers are operated today, PUE can be misleading since one could have a high PUE, but still have low overall utilization. It's a good start, but we really need a metric indicating utilization *and* powered-on resources.