Data Storage for VDI – Part 8 – Misalignment
If you follow NetApp’s best practice documentation, all of the stuff I talked about works as well as, if not better than, I outlined at the end of my previous post. Having said that, it’s worth repeating that there are some workloads that are very difficult to optimize, and some configurations that don’t allow the optimization algorithms to work at all, the most prevalent of which is misaligned I/O.
If you follow best practice guidelines (and we all do that now, don’t we …) then you’ll be intimately familiar with NetApp’s Best Practices for File System Alignment in Virtual Environments. If, on the other hand, you’re like pretty much everyone that went to the VMware course I attended, then you may be of the opinion that it doesn’t make that much of a difference. I suspect that if I asked your opinion on whether you should go to the effort of ensuring that your guest O/S partitions are aligned, your response would probably fall into one of the following categories:
- It’s not recommended by VMware (they do recommend it, but I’ve heard people say otherwise in the past)
- Something I should do when I can arrange some downtime during the Christmas holidays
- What you talking about, Willis?
If there is one thing I’d like you to take away from this post, it is the incredible importance of aligning your guest operating systems. After the impact of old school backups and virus scans, it’s probably the leading cause of poor performance at the storage layer. This is particularly true if you have minimized the number of spindles in your environment by using single instancing technologies such as FAS deduplication.
Of course, this being my blog, I will now go into painful detail to show why it’s so important. If you’re not interested, or have already ensured that everything is perfectly aligned, stop reading and wait until I post my next blog entry 🙂
Every disk reads and writes its data in fixed-size sectors, usually either 512 or 520 bytes; a 520-byte sector effectively stores 512 bytes of user data and 8 bytes of checksum data. Furthermore, the storage arrays I’ve worked with that get a decent number of IOPS/spindle all use some multiple of these 512 bytes of user data as the smallest chunk that is stored in cache, usually 4KiB or some multiple thereof. The arrays then perform reads and writes to and from the disks using these chunks, along with the appropriate checksum information. This works well because most applications and filesystems on LUNs / VMDKs / VHDs etc. also write in 4K chunks. In a well configured environment, the only time you’ll have a read, or more importantly a write, request that is not some multiple of 4K is in NAS workloads, where overwrite requests can happen across a range of bytes rather than a range of blocks, but even then it’s a rare occurrence.
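To make that concrete, here’s a quick Python sketch of the arithmetic involved (my own illustration, assuming the 4KiB chunk size described above; real arrays vary): a request is aligned when it starts and ends on a chunk boundary, and an aligned 4K write touches exactly one chunk while a misaligned one straddles two.

```python
# A minimal sketch, assuming a 4KiB array chunk size.
CHUNK = 4096  # bytes; 8 x 512-byte sectors of user data

def is_aligned(offset: int, length: int, chunk: int = CHUNK) -> bool:
    """A request is aligned if it starts and ends on a chunk boundary."""
    return offset % chunk == 0 and length % chunk == 0

def chunks_touched(offset: int, length: int, chunk: int = CHUNK) -> int:
    """How many chunks a request of `length` bytes at `offset` overlaps."""
    return (offset + length - 1) // chunk - offset // chunk + 1

print(is_aligned(8192, 4096), chunks_touched(8192, 4096))  # True 1
print(is_aligned(7680, 4096), chunks_touched(7680, 4096))  # False 2
```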
Misalignment of I/O causes a write from a guest to partially write to two different blocks. This is explained with pretty diagrams in Best Practices for File System Alignment in Virtual Environments; however, that document doesn’t quite stress how much of a performance impact this can have when compared to nicely aligned workloads, so I’ll spend a bit of time on it here.
When you overwrite a block in its entirety, an array’s job is trivially easy:
- Accept the block from the client and put it in one of the write cache’s block buffers
- Seek to the block you’re going to write to
- Write the block
Net result = 1 seek + 1 logical write operation (plus any RAID overheads)
However, when you send an unaligned block, things get much harder for the array:
- Accept a block’s worth of data from the client; put some of it in one of the block buffers in the array’s write cache, and the rest into the adjacent block buffer. Neither of these block buffers will be completely full, however, which is bad.
- If you didn’t already have the blocks that are about to be overwritten in the read cache, then
- Seek to where the two blocks start
- Read the two blocks from disk to get the parts you don’t know about
- Merge the information you just read from disk / read cache with the block’s worth of data you received from the client
- Overwrite the two blocks with the data you just merged together
Net result = 1 seek + some additional CPU + double the write cache consumption + 2 additional 4K reads and 1 additional 4K write (plus any RAID overheads) + inefficient space consumption.
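If you like, the two lists above boil down to a toy cost model. The sketch below is purely illustrative (it ignores RAID overheads, coalescing, and everything genuinely clever an array actually does); it simply tallies the operations described above for an aligned versus a misaligned 4K write.

```python
# A back-of-the-envelope sketch, not NetApp's actual write path: it just
# tallies the per-write operations from the two lists above.
CHUNK = 4096  # assumed array chunk size in bytes

def write_cost(offset: int, length: int = CHUNK, in_read_cache: bool = False) -> dict:
    """Approximate back-end cost of a single guest write."""
    if offset % CHUNK == 0 and length % CHUNK == 0:
        # Aligned full-chunk overwrite: nothing to read back first.
        chunks = length // CHUNK
        return {"seeks": 1, "reads": 0, "writes": chunks, "cache_buffers": chunks}
    # Misaligned: a read-modify-write across every chunk the write straddles,
    # unless the surrounding data is already in the read cache.
    chunks = (offset + length - 1) // CHUNK - offset // CHUNK + 1
    return {"seeks": 1,
            "reads": 0 if in_read_cache else chunks,
            "writes": chunks,
            "cache_buffers": chunks}

print(write_cost(8192))                      # aligned:    1 seek, 0 reads, 1 write
print(write_cost(7680))                      # misaligned: 1 seek, 2 reads, 2 writes
print(write_cost(7680, in_read_cache=True))  # a cache hit avoids the extra reads
```

Note how the misaligned case doubles the physical reads and writes for the same single 4K guest write – that’s the multiplication of back-end I/O I’ll come back to below.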
The problem, as you’ll see, isn’t so much the misaligned write itself, but the partial block writes it generates. In well configured “Block” environments (FC / iSCSI), you simply won’t ever see a partial write; in “File” environments (CIFS / NFS), however, partial writes are a relatively small but expected part of many workloads. Because FAS arrays are truly unified for both block and file, Data ONTAP has some sophisticated methods of detecting partial writes, holding them in cache, combining them where possible, and committing them to disk as efficiently as possible. Even so, partial writes are really hard to optimize well.
There are many clever ways of optimizing caching algorithms to mitigate the impact of partial writes, and NetApp combines a number of these in ways that I’m not at liberty to disclose outside of NetApp. We developed these options because a certain amount of bad partial write behavior is expected from workloads targeted at a FAS controller, and, much like it is with our kids at home, tolerating a certain amount of “less than wonderful” behavior without making a fuss allows the household to run harmoniously. But this tolerance has its limits, and after a point it needs to be pulled into line. While Data ONTAP can’t tell a badly behaved application to sit quietly in the corner and consider how its behavior is affecting others, it can mitigate the impact of partial writes on well behaved applications.
Unfortunately, environments that do wholesale P2V migrations of WinXP desktops without going through an alignment exercise will almost certainly generate a large number of misaligned writes. While Data ONTAP does what it can to maintain the highest performance possible under those circumstances, these misaligned writes are much harder to optimise, which in turn will probably have a non-trivial impact on overall performance by multiplying the number of I/Os required to meet the workload requirements.
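The reason those XP guests misbehave is simple arithmetic: the default WinXP (and Windows 2003) partition starts at LBA 63, which is 63 × 512 = 32,256 bytes into the disk and not a multiple of 4K, so every 4K write inside the guest straddles two 4K chunks on the array. A two-line check makes this obvious (Vista and later default to a 1MiB starting offset, which is why the problem fades with newer guests):

```python
# Why a default WinXP guest is misaligned: its first partition starts at
# LBA 63 (63 sectors of 512 bytes), which is not a multiple of 4KiB.
SECTOR = 512
CHUNK = 4096

for name, start_lba in [("WinXP / Win2003 (LBA 63)", 63),
                        ("Vista and later (LBA 2048)", 2048)]:
    offset = start_lba * SECTOR
    status = "aligned" if offset % CHUNK == 0 else "MISALIGNED"
    print(f"{name}: starts {offset} bytes in -> {status}")

# WinXP / Win2003 (LBA 63): starts 32256 bytes in -> MISALIGNED
# Vista and later (LBA 2048): starts 1048576 bytes in -> aligned
```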
If you do have lots of unaligned I/O in your environment, you’re faced with one of four options.
- Use the tools provided by NetApp and others like VisionCore to help you bring things back into alignment
- Put in larger caches. Larger caches, especially megacaches such as FlashCache, mean the data needed to complete the partial write will already be in memory, or at least on a medium that allows sub-millisecond reads.
- Put in more disks; if you distribute the load amongst more spindles, the read latency imposed by partial writes will be reduced
- Live with the reduced performance and unhappy users until your next major VDI refresh
Of course the best option is to avoid misaligned I/O in the first place by following Best Practices for File System Alignment in Virtual Environments. This really is one friendly manual that is worth following regardless of whether you use NetApp storage or something else.
To summarise – misaligned I/O and partial writes are evil, and they must be stopped.