Expert Blog: Top 12 KPIs to Remotely Manage Your Data Center

Herman Chan, President of Sunbird Software

One of the biggest challenges facing today’s data center professionals is managing their data centers effectively from a distance. Growing edge data center deployments, the shift to colocation facilities and the need to work from home during the COVID-19 pandemic all limit on-site management and heighten the need for remote monitoring and management of mission-critical infrastructure.

Data center managers are now left asking questions such as: How much capacity do I have, and when will I run out? How do I manage moves, adds and changes with remote hands? Where can I deploy equipment if I can’t walk the data center floor? How do I identify and manage hot spots? How do I keep power loads from exceeding capacity and causing downtime while I’m away from the site? Without easy access to the right information, decisions are delayed, problems mount and service delivery suffers.

The answer lies in identifying and monitoring Key Performance Indicators (KPIs) and using the resulting insights to optimize your data center: increase uptime, improve efficiency, make better use of capacity and boost staff productivity.

The top 12 KPIs you need to monitor to remotely manage your data centers are:

  • Power utilization and capacity per cabinet – Data center power resources are increasingly constrained, and maximizing uptime competes with using power efficiently. By monitoring power utilization and capacity at the cabinet level, you improve uptime by making sure no cabinet exceeds its capacity, and you save money by discovering stranded power capacity.
  • Real-time power trends per cabinet – Many data center managers measure their power consumption weekly or monthly, which leaves short-term peaks and potential overloads undetected. Monitor power consumption per rack in real time, trend that data continuously, and set thresholds and alerts so that you are notified and can react before a problem becomes serious or users are affected.
  • Stranded power capacity per cabinet – Data center managers often plan, budget and allocate more power to each server than the IT equipment actually draws. The difference is stranded power: capacity that is provisioned but unused and could still be deployed in those racks. For a single rack, a couple of kilowatts of stranded power may seem unremarkable, but across hundreds or thousands of racks, stranded power can account for as much as 50% of all available power. Monitor power consumption across your data center to identify stranded capacity, then deploy that power with confidence and delay spending millions to build your next data center.
  • Available rack units trend – This KPI shows when you may run out of space and how many devices can be installed in your data center over time, based on their height in rack units (RU). It also reveals trends in how efficiently you use space and lets you compare how much space versus power capacity you have for deploying new devices.
  • Available floor space remaining – In addition to tracking available cabinet space, track available floor space by the number of open cabinet positions to know how much white space is available to deploy new cabinets on the data center floor.
  • Data and power ports capacity and usage trends – How effectively you plan and manage data center capacity depends on how closely you manage capacity at the port level. Tracking capacity down to individual data and power ports provides granular data that tells you how many ports remain available. Monitor usage and capacity by connector type to make sure you never run out of free data or power ports. By tracking physical port capacity at the cabinet level, you can provision new equipment intelligently, make better-informed capacity planning decisions, use power and network resources more efficiently and reduce operating expenses.
  • Requests by requester, stage, type and location – To maintain SLAs while improving the efficiency and productivity of data center staff, you must properly monitor and manage moves, adds and changes. Track the status and number of change requests, tickets and work orders: who is making them and where, what progress is being made and what types of changes are being requested. Follow each request from creation to approval to ensure work order quality and transparency, and raise staff efficiency through better collaboration.
  • Completed requests over time – It’s important to know how much work is being done in the data center. One way to measure it is to monitor the number of completed requests, by request type, over time. Tracking data center activity and productivity this way helps you manage staff capacity, utilization and productivity more effectively and find opportunities to improve processes.
  • Asset audit trail – Complete visibility into the information and history of every asset in your data center drives efficiency and makes compliance easier. For the most effective remote data center management, maintain a real-time audit log of all changes in your data center that records what action was taken, by whom and when.
  • Inlet temperature per cabinet – A common mistake in data center monitoring is to measure temperature at the room level rather than at the rack inlet, which can leave you blind to cabinets operating at unsafe temperatures. Instead, monitor each cabinet’s inlet temperature in real time to ensure that your equipment operates safely within ASHRAE thermal guidelines, to identify hot spots easily and to save money by avoiding overcooling.
  • Average max temperature trends – Beyond tracking the latest temperature per cabinet, add a level of sophistication to your monitoring by trending that data over time to identify spikes and irregularities. By monitoring the average maximum temperature per cabinet over time, you can ensure that your equipment operates within safe guidelines not just now, but all the time. If you see temperature spikes, you’ll have the data to identify the cause and prevent it from recurring.
  • Energy consumption per location – Energy consumption per server grows each year as higher performance drives energy demand, and the cost of the energy consumed can account for up to 50% of total data center operating expenses. Energy consumption therefore needs to be monitored and intelligently reduced. Track energy consumption per location and set targets to reduce it, bill back users, meet corporate sustainability and green initiatives, and collect energy rebates and carbon credits.
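
The three cabinet-level power KPIs (utilization, real-time trends, stranded capacity) come down to simple arithmetic on capacity and measured load. The Python sketch below assumes each cabinet record holds the power budgeted to it and a series of recent real-time readings in kW; the `Cabinet` type, its field names and the 80% alert threshold are illustrative assumptions, not part of any particular DCIM product. Stranded power is taken here as budgeted capacity minus the observed peak draw.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cabinet:
    # Hypothetical cabinet record; field names are illustrative.
    name: str
    capacity_kw: float        # power budgeted/provisioned for the cabinet
    readings_kw: List[float]  # recent real-time power readings, oldest first


def utilization(cab: Cabinet) -> float:
    """Latest reading as a fraction of the cabinet's capacity."""
    return cab.readings_kw[-1] / cab.capacity_kw


def stranded_kw(cab: Cabinet) -> float:
    """Capacity above the observed peak: provisioned but never drawn."""
    return max(cab.capacity_kw - max(cab.readings_kw), 0.0)


def over_threshold(cab: Cabinet, threshold: float = 0.8) -> bool:
    """True if any reading reached the alert threshold (80% is an assumed value)."""
    return any(r / cab.capacity_kw >= threshold for r in cab.readings_kw)


cabinets = [
    Cabinet("A01", capacity_kw=8.0, readings_kw=[3.1, 3.4, 3.2]),
    Cabinet("A02", capacity_kw=8.0, readings_kw=[5.9, 6.7, 6.1]),
]
for cab in cabinets:
    status = "ALERT" if over_threshold(cab) else "ok"
    print(f"{cab.name}: {utilization(cab):.0%} used, "
          f"{stranded_kw(cab):.1f} kW stranded, {status}")

total_stranded = sum(stranded_kw(c) for c in cabinets)
```

Here A02 raises an alert even though its latest reading is only about 76% of capacity, because a short-term peak of 6.7 kW crossed the threshold; a weekly or monthly spot measurement could easily miss that kind of peak.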
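
For the available rack units trend, one simple way to estimate when space runs out is to fit a straight line to periodic free-RU counts and extrapolate it to zero. This Python sketch assumes monthly samples and a linear trend, which is a simplifying assumption (real deployments are often lumpier); the function name is hypothetical.

```python
from typing import List, Optional


def months_until_full(free_ru: List[float]) -> Optional[float]:
    """Fit a least-squares line to monthly free-RU samples (oldest first) and
    return how many months after the last sample the line reaches zero.
    Returns None if there is too little data or free space is not shrinking."""
    n = len(free_ru)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(free_ru) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, free_ru))
    slope = sxy / sxx
    if slope >= 0:
        return None  # free space is flat or growing: no run-out date
    intercept = mean_y - slope * mean_x
    zero_at = -intercept / slope  # sample index where the fitted line hits 0 RU
    return max(zero_at - (n - 1), 0.0)


# Free RU counted at the start of four consecutive months (example data)
print(months_until_full([400, 380, 360, 340]))  # 17.0
```

The same projection can be applied to open cabinet positions on the floor or to free ports of a given connector type; only the input series changes.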
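
The request-tracking KPIs are mostly counting exercises once each request is recorded with its requester, stage, type and location. The `Request` record and the stage and type values in this Python sketch are hypothetical; in practice this data would come from a ticketing or DCIM workflow system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Request:
    # Hypothetical change-request record; field values are illustrative.
    requester: str
    stage: str                          # e.g. "submitted", "approved", "complete"
    kind: str                           # e.g. "move", "add", "change"
    location: str
    completed_on: Optional[date] = None


tickets = [
    Request("alice", "complete", "add", "site-east", date(2020, 4, 3)),
    Request("bob", "approved", "move", "site-east"),
    Request("alice", "complete", "change", "site-west", date(2020, 5, 12)),
    Request("carol", "submitted", "add", "site-west"),
    Request("bob", "complete", "add", "site-east", date(2020, 5, 20)),
]

# Requests broken down by stage and type (swap in requester or location as needed)
by_stage_type = Counter((t.stage, t.kind) for t in tickets)

# Completed requests per (year, month) and type, for a trend over time
completed_per_month = Counter(
    ((t.completed_on.year, t.completed_on.month), t.kind)
    for t in tickets
    if t.completed_on is not None
)
print(sorted(completed_per_month.items()))
```

Keying the counters on tuples keeps the breakdown flexible: grouping by requester or location instead is a one-field change.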
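
The inlet temperature KPIs can be sketched as a range check on the latest reading per cabinet plus a trend of daily maxima. This Python sketch uses the ASHRAE recommended inlet envelope of 18 to 27 °C; confirm the envelope that applies to your equipment class in the current ASHRAE thermal guidelines. Interpreting "average max temperature" as the mean of each day's maximum reading is an assumption, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# ASHRAE recommended inlet envelope in °C; verify for your equipment class.
RECOMMENDED_MIN_C = 18.0
RECOMMENDED_MAX_C = 27.0


def out_of_range(latest_inlet_c: Dict[str, float]) -> List[str]:
    """Cabinets whose latest inlet reading falls outside the recommended envelope."""
    return [cab for cab, t in latest_inlet_c.items()
            if not RECOMMENDED_MIN_C <= t <= RECOMMENDED_MAX_C]


def avg_daily_max(days: List[List[float]]) -> float:
    """Mean of each day's maximum inlet reading for one cabinet."""
    return mean(max(day) for day in days)


def spike_days(days: List[List[float]]) -> List[int]:
    """Indexes of days whose maximum exceeded the recommended ceiling."""
    return [i for i, day in enumerate(days) if max(day) > RECOMMENDED_MAX_C]


latest = {"A01": 22.5, "A02": 28.1, "B07": 16.9}
print(out_of_range(latest))   # ['A02', 'B07']

week = [[21.0, 23.5, 22.0], [21.5, 24.0, 22.5], [22.0, 29.0, 23.0]]
print(avg_daily_max(week))    # 25.5
print(spike_days(week))       # [2]
```

The weekly average of 25.5 °C sits inside the envelope while the third day peaked at 29 °C, so reading the average alongside per-day spikes keeps a single hot afternoon from being smoothed away.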
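
Energy consumption per location is power integrated over time. This Python sketch converts evenly spaced power readings (kW) into energy (kWh) with the trapezoidal rule and totals them by site; the site names, the hourly sampling interval and the $0.12/kWh bill-back tariff are example assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def energy_kwh(power_kw: List[float], interval_h: float) -> float:
    """Trapezoidal integration of evenly spaced power samples (kW) into kWh."""
    return sum((a + b) / 2 * interval_h for a, b in zip(power_kw, power_kw[1:]))


def energy_by_location(series: List[Tuple[str, List[float]]],
                       interval_h: float) -> Dict[str, float]:
    """Sum the energy of every metered circuit, grouped by location."""
    totals: Dict[str, float] = defaultdict(float)
    for location, power_kw in series:
        totals[location] += energy_kwh(power_kw, interval_h)
    return dict(totals)


# Hypothetical hourly readings from metered circuits at two sites
series = [
    ("site-east", [10.0, 12.0, 14.0]),
    ("site-east", [4.0, 4.0, 4.0]),
    ("site-west", [20.0, 20.0, 20.0]),
]
totals = energy_by_location(series, interval_h=1.0)
print(totals)  # {'site-east': 32.0, 'site-west': 40.0}

TARIFF_PER_KWH = 0.12  # assumed bill-back rate in dollars
bill_back = {loc: kwh * TARIFF_PER_KWH for loc, kwh in totals.items()}
```

Comparing these per-site totals against a target for each period is then a simple subtraction, which is the basis for the reduction targets and bill-back described above.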

It’s more critical than ever to integrate, analyze and act on the KPIs that have the most impact on your daily IT operations, but how do you begin to remotely monitor these metrics? With a comprehensive remote Data Center Infrastructure Management (DCIM) solution, it’s easy.

A modern DCIM tool provides all your most important KPIs right out of the box with zero-configuration dashboard widgets, reports and visual analytics. An enterprise-class data and health poller gathers data directly from facility equipment to ensure accurate, high-quality information that leads to deeper, more reliable insights. Second-generation DCIM makes it simple for data center professionals to make smarter, more informed remote data center management decisions to improve data center health and efficiency while dramatically simplifying capacity management.