Expert Blog: 40 KPIs To Improve Data Center Efficiency and Health

Author: Herman Chan, President of Sunbird Software

Modern data center managers are under constant pressure to do more with less while simultaneously being tasked with maximizing uptime and optimizing for efficiency and capacity utilization. In today’s ever-changing data center environment, insights from data provide a critical competitive advantage to help tackle these challenges.

To gauge success and ensure business objectives are met, data center managers are increasingly turning to big data analytics. But given the massive volume and variety of data generated by data center devices, they don’t always have the time or training to collect that data, analyze it, and ultimately derive value from it. Plus, with legacy tools like Excel and Visio, it’s simply not possible to see and analyze this data holistically.

So how do you know where to begin, what to track, and what your goals should be? Based on our conversations with hundreds of customers in our global user groups for our Data Center Infrastructure Management (DCIM) solution, we’ve consolidated feedback on what data matters the most and compiled a list of the top 40 Key Performance Indicators (KPIs) that all data center managers should monitor to improve the overall health and efficiency of their data centers.

The following list contains the top 10 KPIs. For the complete list, download the free eBook Top 40 Data Center KPIs.

  1. Capacity by Key Data Center Resource (Space, Power, Cooling, and Power/Network Port Connections) – Having accurate, reliable, real-time information on the physical space, power, cooling, and network connectivity capacity in your data center is essential for making the most informed, data-driven decisions when you need to reserve space and deploy new IT equipment, use power resources more efficiently, save on operating expenses, or convince management you need more capacity. Being able to monitor real-time capacity at the site, floor, and cabinet levels greatly simplifies how you can find and reserve resources.
  2. Data Center Energy Cost – IDC reports that energy consumption per server is growing by 9% per year globally as increases in performance drive energy demand. The cost of energy consumed can account for up to 50% of total data center operating expenses, and as such, needs to be monitored and intelligently reduced. Track your energy consumption and costs by site, department, or applications/services, and set targets to reduce consumption, bill back users, meet corporate sustainability and green initiatives, and collect energy rebates and carbon credits.
  3. Change Requests by User, Stage, and Type – In a typical data center environment, up to 30% of servers get replaced annually because servers older than five years fail three times more often and cost 200% more to support than new servers. To maintain SLAs while improving efficiency and productivity of data center staff, it is important to simplify the management of moves, adds, and changes. Track the number of change requests, tickets, and work orders, who is making them, what progress is being made, and what types of changes are being requested. Monitor and manage your requests from creation to approval to ensure work order quality and transparency while improving staff efficiency through improved collaboration.
  4. Available Cabinet and Floor Space Remaining – Intelligent space capacity planning is key to navigating data center expansion and optimization. Track available cabinet space by open rack units, including contiguous rack units, to know how efficient your use of space is and to correlate how much space vs. power capacity you have to deploy new devices. You should also track available floor space by open cabinet positions to know how much white space is available to deploy new cabinets on the data center floor. Include planned decommissions and future planned deployments in your reporting for the most accurate view of actual remaining space capacity.
  5. Cabinets with Most Free Data Ports and Power Ports – When provisioning new equipment, you should know the best place to reserve cabinet space to achieve optimal utilization of resources. This requires knowing which cabinets have available data and power port capacity. By tracking physical port capacity at the cabinet level, you can intelligently provision new equipment, make more informed capacity planning decisions, use power and network resources more efficiently, and reduce operating expenses.
  6. Power Trends by Cabinet with Peak Load Thresholds and Alerts – Maximizing uptime and improving data center health are key concerns for all data center managers. Many organizations take only weekly or monthly measurements of their cabinet power consumption, leaving them vulnerable to short-term peaks and potential overloads that go undetected. Monitor your power consumption per rack in real time, trend that data continuously, and set thresholds and alerts to ensure that you are notified and able to react before there is a major issue or users are impacted.
  7. Cabinet Power Failover Redundancy Compliance – Cabinets in modern data centers are densely packed with power-hungry hardware, and data center teams are under pressure to deliver increasing amounts of power to these devices. It is more important than ever to have a power redundancy solution to ensure that power is always available to IT equipment to minimize downtime. Track your cabinet power failover redundancy with the goal of achieving 100% compliance in your data center.
  8. Power Usage Effectiveness (PUE) – PUE, a metric developed by The Green Grid Association, is the most commonly used KPI for reporting data center energy efficiency. It is a ratio of the total amount of energy used by a facility to the energy delivered to IT devices. You should target a PUE of less than 1.5 and even 1.2 if you have a newer data center or are moving to a newer colocation facility. If you have a very high PUE, you have a large opportunity for cost savings by implementing energy efficiency best practices in your data center. Track PUE over time to see the impact of your efficiency optimizations.
  9. Percentage of Cabinets Compliant with ASHRAE Standards – Maximize energy efficiency and ensure optimal environmental conditions for your IT equipment by maintaining your temperature and humidity within the ranges provided by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). Use environmental sensors to identify hot spots, overcooling, and extreme humidity levels by visualizing all sensor points in thermal envelopes within ASHRAE’s psychrometric charts. Then, track the percentage of cabinets in your data center that are compliant with ASHRAE standards with the goal of maintaining 100% compliance.
  10. Hot Spots Occurrence and Duration – Hot spots are locations at the intake of IT equipment where insufficient cooling causes the temperature to exceed the recommended range, and they pose a threat to equipment and increase the risk of outages. Proactively monitor and trend rack inlet temperatures in your environment with the aim of minimizing the occurrence, size, and duration of all service-impacting hot spots. To mitigate hot spots, ensure raised floor tiles are placed properly, use appropriate tile perforation, implement hot- and cold-aisle containment, position racks and CRAC units correctly, and spread high-density servers throughout the data center.
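
Several of the KPIs above reduce to simple arithmetic once the underlying readings are available. The following Python sketch is one possible way to compute four of them: PUE (KPI 8), the largest block of contiguous free rack units in a cabinet (KPI 4), the percentage of cabinets within a temperature range (KPI 9), and hot spot occurrence and duration (KPI 10). It is illustrative only and not taken from any DCIM product. All function names, data shapes, and sample values are hypothetical, and the 18 to 27 °C inlet range is an assumption based on the commonly cited ASHRAE recommended envelope, so check the current ASHRAE guideline for your equipment class. For brevity, the compliance check looks at temperature only and ignores humidity.

```python
# Assumed inlet temperature limits in degrees Celsius (see the note above).
INLET_MIN_C = 18.0
INLET_MAX_C = 27.0


def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def largest_contiguous_free_ru(occupied: list[bool]) -> int:
    """Longest run of free rack units; occupied[i] is True if unit i is in use."""
    best = run = 0
    for used in occupied:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def percent_compliant(inlet_temps_c: dict[str, float]) -> float:
    """Percentage of cabinets whose inlet temperature is inside the assumed range."""
    if not inlet_temps_c:
        return 0.0
    ok = sum(INLET_MIN_C <= t <= INLET_MAX_C for t in inlet_temps_c.values())
    return 100.0 * ok / len(inlet_temps_c)


def hot_spot_events(readings: list[tuple[int, float]]) -> list[tuple[int, int]]:
    """Return (start_minute, duration_minutes) for each run above INLET_MAX_C.

    readings holds (minute, inlet_temp_c) samples sorted by time. An event
    ends at the first sample that is back at or below the limit.
    """
    events = []
    start = None
    for minute, temp in readings:
        if temp > INLET_MAX_C and start is None:
            start = minute
        elif temp <= INLET_MAX_C and start is not None:
            events.append((start, minute - start))
            start = None
    if start is not None:  # still above the limit at the last sample
        events.append((start, readings[-1][0] - start))
    return events


if __name__ == "__main__":
    # Hypothetical sample data for one site.
    print(pue(1500.0, 1000.0))
    print(largest_contiguous_free_ru([True, False, False, True, False, False, False, True]))
    print(percent_compliant({"A1": 22.0, "A2": 29.5, "A3": 26.0, "A4": 17.0}))
    print(hot_spot_events([(0, 25.0), (5, 28.0), (10, 29.0), (15, 26.0), (20, 27.5)]))
```

In practice the inputs would come from metered PDUs, asset records, and environmental sensors rather than hard-coded values, and the results would be trended over time rather than computed once.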

For the complete list of the Top 40 Data Center KPIs, download the free eBook.

It’s more critical than ever to integrate, analyze, and act on the KPIs that have the most impact on your environment, but how do you begin to monitor so many metrics? With a comprehensive DCIM solution, it’s easy.

A modern DCIM tool provides all your most important KPIs right out of the box with zero-configuration dashboard widgets, reports, and visual analytics. An enterprise-class data and health poller gathers data directly from facility equipment to ensure accurate, high-quality information that leads to deeper, more reliable insights. Second-generation DCIM makes it simple for data center professionals to make smarter, more informed decisions to improve data center health and efficiency while dramatically simplifying capacity management.

This article was also published to the HostingJournalist WhatsApp channel.

Sunbird Software