Safety Margin

©2005 Steven M Smith

Jake tossed and turned. He looked at the bedside clock. 3 AM. “I need sleep,” he thought to himself. But sleep would not come. Only worry about tomorrow’s meeting.

Edmund, Jake’s manager’s manager, enjoyed probing managers in his organization to discover how well they were doing. He expected his managers to lead people so they wanted to follow. And he expected them to have complete command of their business. “There are no bad soldiers,” the retired Army Captain loved to say, “only bad officers.”

“Am I a bad officer?” Jake wondered. “How would Edmund even know? He doesn’t understand Storage Resource Management (SRM), and that’s my organization’s total responsibility — everything from provisioning to backup to disaster recovery.”

His eyes were wide open as he stared at the ceiling thinking about what questions Edmund would ask him. His mind raced through the standard list, contemplating his responses.

“Where do you stand versus budget?” (I’m square on that, with the paperwork to back me up.)

“What are our biggest risks?” (My risk spreadsheets should take care of this one.)

“What are we doing to mitigate those risks?” (Yep, the spreadsheet will handle this, too.)

“What’s the morale of your staff?” (Hmm, that’s harder, but I’ll ask my most vocal people this morning.)

“So why am I tossing and turning?” Suddenly he sat up in bed, wide awake. “It’s storage utilization. He’s going to ask me why utilization is only 50%.”

“He’s going to ask about that, first thing, because it looks like a glaring management problem. Especially since just last week I made a procurement request for more storage — which he hasn’t signed yet. Sure as heck, he’s going to ask me why.”

The alarm sounded. 3 hours had passed. Jake hadn’t slept a wink. 4 hours until the meeting with Edmund.

Jake slid out of bed and stumbled into the bathroom. He looked in the mirror and saw the big bags under his eyes. He fired up the shower, tested the water, which was just right, and jumped in. He did his best thinking in the shower. He needed a revelation.

“I must be managing better than the data shows,” he muttered to himself as he washed his hair. “The utilization number doesn’t tell the whole story: I need all the storage I have and more.”

The shower water began to cool, and he hadn’t finished rinsing. “Darn,” he thought. “I should have installed a bigger hot water heater.” As he shivered and rinsed away the suds, the idea snapped into his head. “Of course. Data storage is just like the hot water.” He scrambled out of the shower and grabbed a towel.

He wrapped the towel tightly around his freezing body, but his mind was running a mile a minute. “Data storage is like the hot water — it requires a safety margin to prevent the business from being impacted by a shortage.”

Clarify the Story

In my experience, Jake isn’t alone. Managers of storage organizations everywhere face the difficult task of providing a simple explanation to upper management about resource utilization.

I have faced this problem many times throughout my career. I have never forgotten the words of the CIO of Consolidated Freightways, “Steve, we never send our trucks out half full. I expect us to fully utilize all of our IT resources in the same way we fully utilize our trucks.”

His comparison is the first clue to explaining resource utilization to upper management. They compare the utilization you report with that of other resources, which may have very different utilization characteristics. 100% utilization may appear possible in the packing of a truck, but it isn’t possible with storage. Show me a fully utilized disk array with dynamic data and I’ll show you a broken system.

Compounding the explanation problem, upper management wants to understand resource utilization and its relationship to cost with minimal understanding of the underlying technology. They pay you to understand, manage and measure.

Storage managers typically answer the following questions:

  • How much storage do we have?
  • What percentage is allocated?
  • What percentage is unallocated?
  • What percentage is allocated but not being used (free space)?

Note, it’s easier for everyone if the percentages all refer to the same quantity, namely the total amount of storage.

A manager who answers as follows will trap themselves:

1,000 Terabytes
95% Allocated
5% Unallocated
50% Free

Why a trap? Because upper management will sum Unallocated and Free and ask the inevitable question, “Are you telling me that 55% of our storage isn’t being used (my costs are more than twice the amount truly required)?”

Caught flat-footed, the typical storage manager answers, “No, uh-uh…. I mean yes,” followed by a long, complicated explanation that doesn’t satisfactorily answer upper management’s question and, most importantly, wasn’t desired.

Upper management will certainly doubt whether the manager of a storage management group is effectively managing his or her costs if they hear unused space is greater than 40%. Don’t trap yourself.

Save yourself a lot of grief by adding answers to the following questions:

  • What percentage is a safety margin?
  • What percentage is now free after factoring in the safety margin?

What does Safety Margin mean? It’s space reserved for growth. A dynamic file system or database must have free space for growth. Let’s use a conservative 20% rule of thumb for the Safety Margin of each file system. For instance, a 100 GB file system would have a 20 GB reserve as a safety margin for growth.

Upper management needs to know that without an adequate safety margin they risk an unplanned system outage. A series of requests for more storage than is available will cause an outage, which may impact a critical business function.
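The rule of thumb is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names `safety_margin_gb` and `has_adequate_margin` are hypothetical, the 20% default is the starting figure used in this article, and treating "free space at least as large as the reserve" as the test for an adequate margin is my own simplification.

```python
def safety_margin_gb(file_system_gb: float, margin_fraction: float = 0.20) -> float:
    """Return the space (in GB) to hold in reserve for growth.

    margin_fraction is the rule-of-thumb reserve (20% here); it is a
    starting point to be calibrated, not a fixed constant.
    """
    if file_system_gb < 0 or not 0 <= margin_fraction <= 1:
        raise ValueError("size must be >= 0 and margin_fraction within [0, 1]")
    return file_system_gb * margin_fraction


def has_adequate_margin(file_system_gb: float, used_gb: float,
                        margin_fraction: float = 0.20) -> bool:
    """Assumed check: True if the free space still covers the reserve."""
    free_gb = file_system_gb - used_gb
    return free_gb >= safety_margin_gb(file_system_gb, margin_fraction)


if __name__ == "__main__":
    # The 100 GB file system from the text.
    print(f"Reserve for 100 GB: {safety_margin_gb(100):.0f} GB")
    print("85 GB used, margin intact?", has_adequate_margin(100, 85))
    print("70 GB used, margin intact?", has_adequate_margin(100, 70))
```

A file system that fails this check has started eating into its reserve, which is the situation that puts the business at risk of an unplanned outage.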

And, finally, upper management always likes numbers to add up, so transform the percentages so that they sum to 100%. That means rather than reporting Allocated, you report all its elements: Utilized, Safety Margin and Free.

With answers to the fifth and sixth questions and transformation of the percentages, the storage manager can say:

1,000 Terabytes
50% Utilized
19% Safety Margin
26% Free
5% Unallocated

Calculate Utilized as 100% minus what we previously called Free (50%), which equals 50%. Calculate Safety Margin as 20% of Allocated (100% minus Unallocated), which equals 19%. Calculate Free as Allocated (100% minus Unallocated) minus Utilized minus Safety Margin, which equals 26%.
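The three calculations above can be sketched as a short Python function. Treat it as an illustration rather than a reporting tool: the name `storage_breakdown` and its parameters are hypothetical, and the code follows the formulas exactly as stated, including taking Utilized as 100% minus the previously reported Free figure.

```python
def storage_breakdown(unallocated_pct: float, reported_free_pct: float,
                      margin_fraction: float = 0.20) -> dict:
    """Transform the original report into categories that sum to 100%.

    All values are percentages of total storage.
    """
    allocated = 100 - unallocated_pct               # e.g. 100 - 5 = 95
    utilized = 100 - reported_free_pct              # formula as stated in the text
    safety_margin = margin_fraction * allocated     # 20% of Allocated
    free = allocated - utilized - safety_margin     # what remains of Allocated
    return {
        "Utilized": utilized,
        "Safety Margin": safety_margin,
        "Free": free,
        "Unallocated": unallocated_pct,
    }


if __name__ == "__main__":
    report = storage_breakdown(unallocated_pct=5, reported_free_pct=50)
    for label, pct in report.items():
        print(f"{pct:.0f}% {label}")
    print(f"Total: {sum(report.values()):.0f}%")
```

Because the safety margin is a parameter, the same function can be rerun with a different `margin_fraction` as experience shows whether 20% is too generous or too tight.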

When upper management now sums Free and Unallocated, they arrive at 31% rather than 55%. They will ask for better results, but they won’t think you are an ineffective manager.

And everything you are sharing with them is a fact. You have simplified the story so they don’t become frustrated by unneeded and undesired explanation.

Tell the Story

Jake arrived at the office at 7:30 AM. The meeting with Edmund loomed over him, but he was determined to put to use what he discovered in the shower.

Without the revelation, Jake would have shared with Edmund the following state information:

2,500 Terabytes
95% Allocated
5% Unallocated
50% Free

Jake used the safety margin concept to transform the state information as follows:

2,500 Terabytes
50% Utilized
19% Safety Margin
26% Free
5% Unallocated

He discussed morale with two of his most vocal people. They reported that they felt good about things and they thought others did too. He walked into Edmund’s office at 10:00 AM feeling fully prepared.

Edmund asked him the usual questions. And Jake felt good about his answers. Edmund was pleased to hear that the morale of Jake’s people was high.

Edmund asked Jake, “What is the utilization of the storage?” Jake shared with him the five numbers and explained the safety margin concept. Edmund nodded and said, “You are telling me that we need a certain percentage in every file system as a reserve to prevent application outages.”

Jake smiled and replied, “Exactly. It’s just like with the disk on your PC. If you want to put more files on it, you must have available space.”

Edmund smiled back and asked, “Why 20%? Why not 10%?”

“20% is a starting point. I will monitor and calibrate it as we get more experience. And I’ll report what my team thinks is an appropriate safety margin each time we talk.”

Edmund wasn’t finished. “Okay,” he said. Then he frowned and asked, “Tell me why you have asked for more storage when I see that 31% of the storage isn’t used?”

Jake thought to himself, “That’s the question I was worried about, but not anymore.” He paused and then replied, “We have a new application that will consume all of the unallocated space and more.”

“We are monitoring storage more closely than last year and now know which clients are wildly exceeding the expected safety margin for their file systems. But going back now and releasing that free space requires downtime to reallocate and move the data, which puts the business at a bigger risk than if we wait until the next technology refresh on the storage array that contains the file systems. With the storage technology we use, the technology refresh is the safest time to reduce allocation sizes and thus return the free space to the pool of unallocated storage.”

Edmund smiled and said, “Jake, I have enjoyed talking with you. You have a strong command of your business. I was thinking of raising your salary…”

“Oh, boy,” Jake thought.

“…but I’m not sure I have enough safety margin in my payroll budget.”
