
Data Storage for VDI – Part 2 – Disk Latencies

Over the years, I've found that there is broad misunderstanding about how many IOPS a disk can do, so it's not surprising to hear and see things like the following, which I've taken from Ruben's blog.

An IDE or SATA disk rotating at 5,400 or 7,200 RPM. At that rate it can deliver about 40 to 50 IOPS.

This is reasonable enough, but incomplete. I'm fortunate to work for one of the world's best technology companies, because I get access to all kinds of interesting and, for the most part, unpublished research, some of which shows that a 7,200 RPM SATA drive has the following I/O characteristics:

IOPS   Latency (ms)
  10   13 (minimum)
  25   15
  60   20
  85   30
 100   40
 130   100
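To make the shape of that curve concrete, here's a small sketch (purely illustrative) that linearly interpolates between the measured points in the table above; anything past the last measured point is treated as effectively unbounded:

```python
# Measured (IOPS, latency-in-ms) points for a 7,200 RPM SATA drive,
# taken from the table above. The interpolation between them is an
# illustration only -- real drives vary.
POINTS = [(10, 13), (25, 15), (60, 20), (85, 30), (100, 40), (130, 100)]

def latency_ms(iops):
    """Estimate latency at a given IOPS load by linear interpolation."""
    if iops <= POINTS[0][0]:
        return POINTS[0][1]  # below the first point: minimum latency
    for (x0, y0), (x1, y1) in zip(POINTS, POINTS[1:]):
        if iops <= x1:
            return y0 + (y1 - y0) * (iops - x0) / (x1 - x0)
    return float("inf")  # past the knee: latency effectively unbounded

print(latency_ms(60))   # 20.0 ms -- the common 20 ms design point
print(latency_ms(115))  # 70.0 ms -- well past the knee
```

Plot those points and you can see the knee for yourself: roughly linear up to 100 IOPS, then the curve takes off.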

If you were to plot the complete set of IOPS vs. latency on a graph, you'd see fairly linear response times up to around 100 IOPS at the 40 ms point, but after that latencies increase exponentially. The same is true of all forms of spinning rust: the latency curve spikes up at a different point depending on the kind of drive and the associated hardware and drive settings (e.g. FC-AL vs. SAS interconnects, native command queuing, skip-mask writes, etc.). I'm not trying to be clever here (well, maybe just a little bit), but as a storage designer you really don't want to drive your disks past the point where latency begins to spike. It also means that when specifying your IOPS requirement, you need to be careful to also specify the latency level that will keep your end users happy.

For example, some DBAs consider an average latency of more than 5 ms to be unacceptable, Microsoft specifies 20 ms as the acceptable latency for an Exchange I/O, and the SPC-1 benchmark uses 30 ms. From field experience, most people are happy with the performance of their file-sharing and VDI environments if request latency remains below 20 ms. As mentioned earlier, a 7,200 RPM SATA drive using native command queuing should be able to achieve 60 random 4K IOPS at a 20 ms response time, and a little less for 8K IOPS. A 15K RPM FC drive should be able to achieve about 230-240 random 4K IOPS at a 20 ms response time, with pretty much identical figures for 8K IOPS. (If you're interested, the maximum for a 15K drive is about 320 IOPS at 70 ms, but you really don't want to go there.)
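As a worked example of designing to a latency target rather than to raw drive throughput, the sketch below sizes a spindle count from the per-drive figures quoted above at the 20 ms point. The 6,000 IOPS workload figure is a made-up number for illustration, not something from this post:

```python
# Per-drive random 4K IOPS at a 20 ms latency target, per the figures
# quoted in the text (60 for 7,200 RPM SATA, ~230 for 15K RPM FC).
import math

IOPS_AT_20MS = {"sata_7200": 60, "fc_15k": 230}

def drives_needed(required_iops, drive_type):
    """Minimum spindle count to serve a workload within the 20 ms target."""
    return math.ceil(required_iops / IOPS_AT_20MS[drive_type])

print(drives_needed(6000, "sata_7200"))  # 100 drives
print(drives_needed(6000, "fc_15k"))     # 27 drives
```

The point of the exercise: the same workload needs nearly four times the SATA spindles, and if you'd sized the FC drives at the 180 IOPS rule of thumb instead, you'd have specified 34 drives for a latency level the workload may not even need.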

This agrees reasonably well with Ruben Spruijt's figure of 50 IOPS for SATA, but while 180 IOPS for 15K FC is a fairly common rule of thumb, I feel it's too low for VDI deployments. 180 IOPS represents a 10 ms response time for a 15K drive, and 10 ms is lower than a SATA drive can sustainably achieve, so it seems a little unfair to equate a 15K FC drive running at 180 IOPS with a SATA drive running at 50 IOPS.

Ruben also says

These are gross figures and the number of IOPS that are available to the hosts depend very much on the way they are configured together and on the overhead of the storage system. In an average SAN, the net IOPS from 15,000 RPM disks is 30 percent less than the gross IOPS

Again, fair enough and mostly true; however, in my experience this tax or inefficiency is almost entirely due to the IOPS penalty on writes. Before I go on, I'll clarify some terminology. Ruben uses the term "Net IOPS" for the IOPS served to the hosts, and "Gross IOPS" for the IOPS provided by the disks at the back end. While I like these terms, I'm more used to saying "front-end IOPS" and "back-end IOPS". I'd also like to introduce another term, "IOPS Efficiency Factor" or "IEF", which is front-end IOPS / back-end IOPS * 100; e.g. if 700 front-end IOPS generated 1,000 back-end IOPS at the array, this gives an IEF of 70%.
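The IEF definition above is simple enough to state in one line of code; the 700/1,000 example is the one from the text:

```python
# "IOPS Efficiency Factor" as defined above: front-end IOPS served to
# hosts divided by back-end IOPS hitting the disks, as a percentage.
def ief(front_end_iops, back_end_iops):
    return front_end_iops / back_end_iops * 100

print(ief(700, 1000))  # 70.0 -- i.e. a 30% write/overhead tax
```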

Ruben then goes on to talk about the various RAID levels, and from my perspective does a fairly good job. However, there are some places where I think there are inaccuracies, such as Ruben's blanket statement that

In an average SAN, the net IOPS from 15,000 RPM disks is 30 percent less than the gross IOPS.

This might be another handy rule of thumb, but as a storage designer working for an array vendor I can state pretty confidently that it's wrong far more often than it's right, and the next blog entry, Data Storage for VDI – Part 3 – Read and Write Caching, will explain why.

  1. July 21, 2010 at 12:53 am

    Great information! I’ve been looking for something like this for a while now. Thanks!

  2. Colin
    April 21, 2011 at 3:53 am

    Hi Scott,

    Your blogs are very insightful. Could you elaborate a bit more on why there are different latencies on the same disk? Is it due to command queuing?


  3. April 21, 2011 at 12:05 pm

    Not sure who Scott is; most people who know me call me John or Ricky :-). The exact reasons for the latency curve are something of a mystery to me, though command queuing, elevator algorithms and other smart software embedded in the drives themselves are certainly part of it. If I dug around sufficiently I could probably find the exact reasons why, but the chances are they would also include information that I wouldn't be able to disclose under my current NDA obligations.

    What I can say is that these latency curves are something that’s been observed in our labs with many different kinds of disk drives. Having said that, as disk arrays implement increasing levels of fine grained virtualisation including automated data movement and caching, accurately extrapolating the performance of an array from its constituent disks becomes much harder to do, and rules of thumb become far less valuable.

  4. Colin
    April 22, 2011 at 2:00 am

    Sorry about the mix-up, John. I was reading another blog on IOPS by Scott Lowe, and confused you with him. I enjoy your blogs.

    Thanks for the elaboration. I guess the only way to tell the performance of a storage system is to test it with real workloads, for as you say, rules of thumb don't work.

    I’m very interested in the performance of data storage for relational databases instead of VDI. Do you write on that topic as well?


    • May 10, 2011 at 12:28 pm

      Thanks Colin,
      I've got a whole stack of things I'd like to blog about, but my time management has been a little sucky recently.

      Having said that, the state of the art of performance management and optimisation on arrays for different workloads and configurations is something I spend a fair amount of time on. The trick for me is sorting out the stuff I can talk about vs. the stuff that has to stay under non-disclosure. I'll try to pull my finger out and get blogging again with more regularity, and I'll make sure that database workloads get good coverage.

