If you take the three-step process for creating a “Software Defined” infrastructure that I outlined in my previous post, you could reasonably say that storage has been “software defined” since about 1982 (arguably as early as 1958 when the first disk drive made its appearance).
- Step 1 – identify and then formally define a set of common functions or primitives performed by existing infrastructure that are optimally run in purpose built devices (e.g. hardware filled with interfaces and ASICs) – This becomes the "Data Plane".
- From a data storage perspective I have broken down what I see as the common storage primitives into four main categories. I’ll probably use these categories as a tool for functional comparisons of various Software Defined Storage implementations going into the future.
- placement management – e.g. given a logical address and some data by a requestor, write that data to an underlying storage medium so that it can be subsequently retrieved using that address without the requestor needing to be aware of the physical characteristics of that underlying storage medium
- access management – e.g. given an address by a requestor, read data from an underlying storage medium and make it available to the requestor. Additionally, in the case where multiple requestors may make simultaneous requests to place or access the same data, provide a mechanism to arbitrate that access.
- copy management – e.g. given a set of source addresses and a range of target addresses, copy the data from the source to the target on behalf of the requestor
- persistence management – in most storage systems this is an implied function, though increasingly with the rise of protocols such as CDMI and XAM, data persistence SLOs are being explicitly defined at placement time. In most cases, however, data must be stored until the device itself fails, and the device is generally expected to have a lifetime of multiple years.
- Step 2 – Create a protocol that manages those functions
- The great thing about standards is there’s so many of them … and the storage industry just LOVES forming standards bodies to create new protocols to manage the functions I described above. Many of them have been around for a while: SCSI was standardised in 1982, NFS in 1989, SMB in 1992 (kind of), OSD in 2004 and in more recent times we have seen implementations like XAM in 2010, and most recently CDMI which became an ISO standard in 2012.
Some of us get religious about these standards and which one should be used for what purposes. What I find interesting is that they all seem to be converging around a common set of functionality, so it’s possible that we will eventually see one storage protocol to rule them all, though I doubt it will happen any time soon. In the near term, whether we need to create another new protocol is debatable, but as of this moment I’m pretty impressed with the work being done at SNIA with CDMI, not as a “new replacement” but as something which leverages the work that’s already been done with the other protocols and fills in their gaps. But I’m getting WAY ahead of myself here.
- Step 3 – Create a standards compliant controller that runs on general purpose hardware (e.g. an intel server, virtualised or otherwise) that takes higher order service requests from applications and translates those into the primitives codified in step 1, over the protocol devised in step 2. – This becomes the "Control Plane"
- Well, if you accept that the existing storage protocol standards are functionally equivalent to the OpenFlow protocol in Software Defined Networking, then pretty much any modern operating system could function as a controller. Any modern hypervisor also acts as a controller, as does any storage array which uses SCSI protocols to talk to the disks at the back end, and in my view this is an accurate description.
- Each of these constructs acts as a standards compliant controller in a software defined storage infrastructure, with multiple levels of encapsulation and the consequent challenge that there is significant functional control overlap between these controllers. Over the next few posts I’ll go through what this encapsulation looks like, where the challenges and opportunities are in each level, the design choices we face, and build that up so we can see how close we are to achieving something that matches some of the hype around software defined storage.
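To make the four primitives from Step 1 a little more concrete, here is a minimal Python sketch of a toy "data plane". Everything in it is an assumption made for illustration: the class name `ToyDataPlane`, the method names, and the `retention_days` parameter are hypothetical and do not come from SCSI, CDMI, XAM or any other real protocol. A dictionary stands in for the physical medium, and a re-entrant lock stands in for whatever arbitration a real device would provide.

```python
import threading
from typing import Dict, List, Optional


class ToyDataPlane:
    """Hypothetical in-memory stand-in for a storage device (not a real API)."""

    def __init__(self) -> None:
        self._blocks: Dict[int, bytes] = {}                   # logical address -> data
        self._retention_days: Dict[int, Optional[int]] = {}   # persistence SLO per address
        self._lock = threading.RLock()                        # arbitrates simultaneous requestors

    def place(self, address: int, data: bytes,
              retention_days: Optional[int] = None) -> None:
        """Placement management: store data under a logical address.

        The optional retention_days models a persistence SLO that is
        declared explicitly at placement time; None means "implied".
        """
        with self._lock:
            self._blocks[address] = data
            self._retention_days[address] = retention_days

    def access(self, address: int) -> bytes:
        """Access management: return the data stored at a logical address."""
        with self._lock:
            return self._blocks[address]

    def copy(self, sources: List[int], target_start: int) -> None:
        """Copy management: copy source addresses to a contiguous target range."""
        with self._lock:
            for offset, source in enumerate(sources):
                self.place(target_start + offset,
                           self._blocks[source],
                           self._retention_days[source])

    def retention(self, address: int) -> Optional[int]:
        """Persistence management: report the SLO recorded at placement time."""
        with self._lock:
            return self._retention_days[address]


store = ToyDataPlane()
store.place(0, b"hello", retention_days=365)
store.place(1, b"world")
store.copy([0, 1], target_start=100)
print(store.access(100), store.access(101), store.retention(100))
```

In this picture, an operating system, hypervisor or array controller would be the thing that translates higher order requests into calls like these, which is where the overlap between controllers starts to show up.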
It’s also worth noting that until I’ve reached my conclusion, much of what I’ve written and will write will not neatly match up with the analyst definitions of software defined storage. If you bear with me we’ll get there, and probably then some. My hope is that if you follow this journey you’ll be in a better position to take advantage of something that I’ll be referring to as “SLO Defined” storage (simply because I really don’t think that “Software Defined” is particularly useful as a label).
If you want to jump there now and get the analysts’ views, check out what IDC and Gartner have to say. For example, IDC’s definition of software defined storage from http://www.idc.com/getdoc.jsp?containerId=prUS24068713 says in part
software-based storage stacks should offer a full suite of storage services and federation between the underlying persistent data placement resources to enable data mobility of its tenants between these resources
The Gartner definition, which isn’t public, takes a slightly different approach and can be found in their document “2013 Planning Guide: Data Center, Infrastructure, Operations, Private Cloud and Desktop Transformation”, where it talks about higher level functionality including the ability for upper level applications to define what storage objects they need with pre-defined SLOs and then have that automatically provisioned to them (or at least that is my take after a quick read of the document).
IMHO, both of these definitions have merit, and both go way beyond merely running array software in a VSA, or bundling software management functions into a hypervisor, or pretty much anything else that seems to pass for Software Defined Storage today, which is why I think it’s worth writing about …. In …. Painful …. Detail
As always, Comments and Criticisms are welcome.
-Update 11/4/13 – Darius approved the comment in question and said sorry, so I’m all happy now. Given how long and verbose my comments are, I suspect he simply didn’t have time to read through it and figure out if it was safe to publish on an Oracle hosted website. While I disagree with Darius on a lot of points, I also like a lot of what he writes, and I suspect he’d be an interesting guy to have a beer with 🙂
Forgive me, social media, for I have sinned: it’s been four months since my last blog post. There are a bunch of reasons for this, mostly that I’ve had some real-world stuff that’s been way more important than blogging, and I’ve limited my technical writing to posting comments on other blogs or answering questions on LinkedIn. Before I start writing again, there’s something I’d like to get off my chest. It _really_ bugs me when people edit out relevant comments. As a case in point, I was having what I believed to be a reasonably constructive conversation with Darius Zanganeh of Oracle on his blog, but for some reason he never approved my final comment, which I submitted on December 7th 2012, the text of which follows. If you’re interested, head over to his blog and read the entire post. I think it’s pretty good, and it showcases some impressive benchmark numbers Oracle has been able to achieve with scale-up commodity hardware. From my perspective this is a great example of how a deeper analysis of good benchmarks demonstrates far more than top line numbers and $/IOPS … and if you know me, then you know I just LOVE a good debate over benchmark results, so I couldn’t resist commenting even though I really had better/more important things to do at the time.
Thanks Darius, it’s nice to know exactly what we’re comparing this to. I didn’t read the press releases, nor was I replying to that release; I was replying to your post, which was primarily a comparison to the FAS6240.
If you do want to compare the 7420 to the 3270, then I’ll amend the figures once again .. to get a 240% better result you used a box with
- More than eleven times as many CPU cores
- More than one hundred and sixty times as much memory
I really wish you’d published power consumption figures too 🙂
Regarding disk efficiency, we’ve already demonstrated our cache effectiveness. On a previous 3160 benchmark, we used a modest amount of extended cache and reduced the number of drives required by 75%. By way of comparison, to give about 1080 IOPS per 15K spindle we implemented a cache that was 7.6% of the capacity of the fileset. The Oracle benchmark got about 956 IOPS/drive with a cache size about 22% of the fileset size.
The 3250 benchmark, on the other hand, wasn’t done to demonstrate cache efficiency; it was done to allow a comparison to the old 3270. It’s also worth noting that the 3250 is not a replacement for the 3270, it’s a replacement for the 3240 with around 70% more performance. Every benchmark we do is generally done to create a fairly specific proof point; in the case of the 3250 benchmark, it shows almost identical performance to the 3270 from a controller that sells at a much lower price point.
We might pick one of our controllers and do a “here’s a set config and here’s the performance across every known benchmark” the way Oracle seems to have done with the 7420. It might be kind of interesting, but I’m not sure what it would prove. Personally I’d like to see all the vendors, including NetApp, do way more benchmarking of all their models, but it’s a time-consuming and expensive thing to do, and as you’ve already demonstrated, it’s easy to draw some pretty odd conclusions from them. We’ll do more benchmarking in the future, you’ll just have to wait to see the results 🙂
Going forward, I think non-scale out benchmark configs will still be valid to demonstrate stuff like model replacement equivalency and cache efficiency, but I’ll say it again: if you’re after “my number is the biggest” hero number bragging rights, scale out is the only game in town. But scale-out isn’t just about hero numbers. For customers who need to scale rapidly without disruption as needs change, scale-out is an elegant and efficient solution, and they need to know they can do that predictably and reliably. That’s why you see benchmark series like the ones done by NetApp and Isilon. Even though scale-out NFS is a relatively small market, and Clustered-ONTAP has a good presence in that market, scale-out Unified storage has much broader appeal and is doing really well for us. I can’t disclose numbers, but based on the information I have, I wouldn’t be surprised if the number of new clusters sold since March exceeds the number of Oracle 7420s sold in the same period. Either way, I’m happy with the sales of Clustered-ONTAP.
As a technology blogger, it’s probably worth pointing out that stock charts are a REALLY poor proxy for technology comparisons, but if you want to go there, you should also look at stuff like P/E multiples (an indication of how fast the market expects you to grow) and market share numbers. If you’ve got Oracle’s storage revenue and profitability figures handy for us to do a side by side comparison to the NetApp published financial reports, post them on up; personally I would LOVE to see a comparison. Then again, maybe your readers would prefer us to stick to talking about the merits of our technology and how that can help them solve the problems they’ve got.
In closing, while this has been fun, I don’t have a lot more time to spend on this. I have expressed my concerns at the amount of hardware you had to throw at the solution to achieve your benchmark results, and the leeway that gives you to be competitive with street pricing, but as I said initially your benchmark shows you can get a great scale-up number, and you’re willing to do that at a keen list price, nobody can take that away from you, kudos to you and your team.
Other than giving me an opportunity to have my final say, my comments also underline some major shifts in the industry that I’ll be blogging about over the next few months.
1. If you’re after “my number is the biggest” hero number bragging rights, scale out is the only game in town
2. Scale out Unified and Clustered ONTAP is going really well. I can’t publish numbers, but the uptake has surprised me, and the level of interest I’ve seen from the briefings I’ve been doing has been really good.
3. Efficiency matters. Getting good results by throwing boatloads of commodity hardware at a problem is one way of solving it, but it usually causes problems and shifts costs elsewhere in the infrastructure (power, cooling, rackspace, labour, software, compliance etc etc)
I’ll also be writing a fair amount about Flash and Storage Class Memory, and why some of the Flash benchmarks and claimed performance are silly enough, in my opinion, to be bordering on deceptive … until then, be prepared to dig deeper when people start to claim IOPS measured in the millions, and have fun 🙂
John Martin (life_no_borders)
–This has been revised based on some comments I’ve received since the original posting, check the comment thread if you’re interested in what/why–
I came in this morning with an unusually clear diary, and took the liberty to check the newsfeeds for NetApp and EMC, this is when I came across an EMC press release entitled “EMC VNX SETS PERFORMANCE DENSITY RECORD WITH LUSTRE —SHOWCASES “NO COMPROMISE” HPC STORAGE“.
I’ve been doing some research on Lustre and HPC recently, and that claim surprised me more than a little, so I checked it out; maybe there’s a VNX sweetspot for HPC that I wasn’t expecting. The one thing that stood out straight away was: “EMC® is announcing that the EMC® VNX7500 has set a performance density record with Lustre—delivering read performance density of 2GB/sec per rack” (highlight mine)
In the first revision of this I had some fun pointing out the lameness of that particular figure (e.g. “From my perspective, measured on a GB/sec per rack, 2GB/sec/rack is pretty lackluster”), but EMC aren’t stupid (or at least their engineers aren’t, though I’m not so sure about their PR agency at this point). It turns out that this was one of those things where EMC’s PR people didn’t actually listen to what the engineers were saying, and it looks like they’ve missed out a small but important word, and that word is “unit”. This becomes apparent if you take a look at the other stuff in that press release: “8 GB/s read and 5.3 GB/s write sustained performance, as measured by XDD benchmark performed on a 4U dual storage processor”. This gives us 2GB/sec/rack unit, which actually sounds kind of impressive. So let’s dig a little deeper. What we’ve got is a 4U dual storage processor that gets some very good raw throughput numbers, about 1.5x on a “per controller” basis compared with the figures used in the E5400 press release I referenced earlier, so on that basis I think EMC has done a good job. But this is where the PR department starts stretching the truth again by leaving out some fairly crucial pieces of information. Notably, the 4U controller behind the 2GB/sec/rack unit figure is a 2U VNX7500 SPE with a 2U standby power supply, which is required when the 60 drive dense shelves are used exclusively (which is the case for the VNX Lustre Proof of Concept information shown in their brochures), and this configuration doesn’t include any of the rack units required for the actual storage. Either that, or it’s a 2U VNX7500 SPE with a 2U shelf and no standby power supply, which seems to be a mandatory component of a VNX solution, and I can’t quite believe that EMC would do that.
If we compare the VNX to the E5400, you’ll notice that the VNX controllers and standby power supplies alone consume 4U of rack space without adding any capacity, whereas the E5400 controllers are much, much smaller and fit directly into a 2U or 4U disk shelf (or DAEs in EMC terminology), which means a 4U E5400 based solution is something you can actually use, as the required disk capacity is already there in the 4U enclosure.
Let’s go through some worked calculations to show how this works. In order to add capacity in the densest possible EMC configuration, you’d need to add an additional 4RU shelf with 60 drives in it. Net result: 8RU, 60 drives, and up to 8 GB/s read and 5.3 GB/s write (the press release doesn’t make it clear whether a VNX7500 can actually drive that much performance from only 60 drives; my suspicion is that it cannot, otherwise we would have seen something like that in the benchmark). Measured on a GB/s per RU basis this ends up as only 1 GB/sec per Rack Unit, not the 2 GB/sec per Rack Unit which I believe was the point of the “record setting” configuration. And just for kicks, as you add more storage to the solution that number goes down, as shown for the “dual VNX7500/single rack solution that can deliver up to 16GB/s sustained read performance”, to about 0.4 GB/sec per Rack Unit. Using the configurations mentioned in EMC’s proof of concept configuration you end up with around 0.666 GB/sec per Rack Unit. All of these are a lot less than the 2 GB/sec/RU claimed in the press release.
If you wanted to have the highest performing configuration using a “DenseStak” solution within those 8RU with an E5400 based Lustre solution, you’d put in another E5400 unit with an additional 60 drives. Net result: 8RU, 120 drives, and 10 GB/sec read and 7 GB/sec write (and yes, we can prove that we can get this kind of performance from 120 drives). Measured on a GB/s per RU basis this ends up as 1.25 GB/sec per Rack Unit. That’s good, but it’s still not the magic number mentioned in the EMC press release. However, if you were to use a “FastStak” solution, those numbers would pretty much double (thanks to using 2RU disk shelves instead of 4RU disk shelves), which would give you a controller performance density of around 2.5 GB/sec per Rack Unit.
Bottom line, for actual usable configurations a NetApp solution has much better performance density using the same measurements EMC used for their so called “Record Setting” benchmark result.
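If you’d rather check the arithmetic than take my word for it, the rough sketch below recomputes the GB/sec per rack unit figures from the read throughput and rack unit counts quoted above. The function name and the configuration labels are mine, purely for illustration. The one real assumption is that the “dual VNX7500/single rack” solution occupies a 40U rack, which is what the roughly 0.4 GB/sec/RU figure implies; the FastStak row uses the 100GB/sec per 40U rack figure from the NetApp whitepaper.

```python
def gb_per_sec_per_ru(read_gb_per_sec: float, rack_units: int) -> float:
    """Performance density: sustained read throughput divided by rack units used."""
    return read_gb_per_sec / rack_units


# (sustained read GB/sec, rack units), as quoted in the text above.
# The 40U rack size for the dual VNX7500 row is an assumption.
configs = {
    "VNX7500 controller + standby power only": (8.0, 4),
    "VNX7500 + one 60-drive shelf": (8.0, 8),
    "Dual VNX7500, single rack (assumed 40U)": (16.0, 40),
    "2x E5400 DenseStak, 120 drives": (10.0, 8),
    "E5400 FastStak, full 40U rack": (100.0, 40),
}

densities = {
    name: gb_per_sec_per_ru(read, units)
    for name, (read, units) in configs.items()
}

for name, density in densities.items():
    print(f"{name:42s} {density:5.2f} GB/sec/RU")
```

The only row that reaches 2 GB/sec/RU on the EMC side is the one with no disks in it, which is the point of the whole exercise.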
In case you think I’m making these numbers up, they are confirmed in the NetApp whitepaper wp-7142 which says
The FastStak reference configuration uses the NetApp E5400 scalable
storage system as a building block. The NetApp E5400 system is designed
to support up to 24 2.5-inch SAS drives, in a 2U form factor.
Up to 20 of these building blocks can be contained in an industry-standard
40U rack. A fully loaded rack delivers performance of up to 100GB/sec
sustained disk read throughput, 70GB/sec sustained disk write throughput,
and 1,500,000 sustained IOPS.
According to IDC, the average supercomputer produces 44GB/sec,
so a single FastStak rack is more than fast enough to meet the I/O
throughput needs of many installations.
While I’ll grant that this result is achieved with more hardware, it should be remembered that the key to good HPC performance is in part about the ability to efficiently throw hardware at a problem. From a storage point of view this means having the ability to scale performance with capacity. In this area the DenseStak and FastStak solutions are brilliantly matched to the requirements of, and the prevailing technology used in, High Performance Computing. Rather than measuring on a GB/sec/rack unit basis, I think a better measure would be “additional sequential performance per additional gigabyte”. Measured on a full rack basis, the NetApp E5400 based solution ends up at around 27MB/sec/GB for the DenseStak, or 54MB/sec/GB for the FastStak. In comparison, the fastest EMC solution as referenced in the “record setting” press release comes in at about 10MB/sec of performance for every GB of provisioned capacity, or about 22MB/sec/GB for the configuration in the proof of concept brochure. Any way you slice this, the VNX just doesn’t end up looking like a particularly viable or competitive option.
The key here is that Lustre is designed as a scale out architecture. The E5400 solution is built as a scale out solution by using Lustre to aggregate the performance of multiple carefully matched E5400 controllers, whereas the VNX7500 used in the press release is a relatively poorly matched scale-up configuration which is being shoe-horned into a use case it wasn’t designed for.
In terms of performance per rack unit, or performance per GB, there simply isn’t much out there that comes close to an E5400 based Lustre solution, certainly not from EMC, as even Isilon, their best Big Data offering, falls way behind. The only other significant questions that remain are how much do they cost to buy, and how much power do they consume?
I’ve seen the pricing for EMC’s top of the range VNX 7500, and it’s not cheap, it’s not even a little bit cheap, and the ultra-dense stuff shown in the proof of concept documents is even less not cheap than their normal stuff. Now I’m not at liberty to discuss our pricing strategy in any detail on this blog, but I can say that in terms of “bang per buck”, the E5400 solutions are very very competitive, and the power impact of the E5400 controller inside a 60 drive dense shelf is pretty much negligible. I don’t have the specs for the power draw on a VNX7500 and its associated external power units, but I’m guessing it adds around as much as a shelf of disks, the power costs of which add up over the three year lifecycle typically seen in these kinds of environments.
From my perspective the VNX7500 is a good general purpose box, and EMC’s engineers have every right to be proud of the work they’ve done on it, but positioning this as a “record setting” controller for performance dense HPC workloads on Lustre is stretching the truth just a little too far for my liking. While the 10GB/sec/rack mentioned in the press release might sound like a lot for those of us who’ve spent our lives around transaction processing systems, for HPC, 10GB/sec/rack simply doesn’t cut it. I know this, the HPC community knows this, and I suspect most of the reputable HPC focussed engineers in EMC also know this.
It’s a pity though that EMC’s PR department is spinning this for all they’re worth; I struggle with how they can possibly assert that they’ve set any kind of performance density record for any kind of realistic Lustre implementation, when the truth is that they are so very very far behind. Maybe their PR dept has been reading 1984, because claiming record setting performance in this context requires some of the most bizarre Orwellian doublespeak I’ve seen in some time.
I’m in the middle of digesting what was actually released in EMC’s recent launch. For the most part there isn’t anything really that new: lots of unsupported hype like, “3 times simpler, 3 times faster.” Faster than what, exactly? From a technical perspective the only thing that’s really interesting or surprising is the VNXe and that was less interesting than I expected because I thought they were going to refresh their entire range using that technology. So it looks like they’ve given up trying to make that scale for the moment.
So much of what they’ve done copies or validates what we’ve already done at NetApp:
- Simplified software packaging
- Launching a lot of stuff at the same time
- New denser shelves with small form-factor drives
- An emphasis on storage efficiency
- An emphasis on flash as a caching layer
- The ideal match between unified storage and virtualized environments
The biggest change that I see is that they now appear to be shipping all their controllers with unified capability from the start, enabled via a software upgrade which is something EMC has criticised us for in the past. Now they acknowledge that the only way to compete with NetApp effectively is to try to be as much like us as they possibly can. This might explain why EMC in Australia isn’t going to sell the “Block only” VNX 5100. SearchStorage.com.au had this report:
EMC’s new VNX 5100 (pictured), a block-only storage device, won’t go on sale in Australia because “We did not see great enough demand to see that particular system,” according to Mark Oakey, the company’s Marketing Manager for Storage Platforms in Australia and New Zealand. “We’ll continue with the Clariion CX4 120,” he told SearchStorage ANZ. “It has more or less the same capabilities.”
Most of the interesting capabilities they’re touting came last year with FLARE 30 and DART 6.0 (two of their operating systems). Even the VMax stuff they’re pushing during the launch came out via a software upgrade without a lot of fanfare in December, so as far as I can see their “record breaking announcement” consists of announcing a whole bunch of things they’d already done along with some new tin.
Things they didn’t announce:
- Multistore equivalency
- V-Series equivalency
- Unified replication capabilities
- A commercial grade VMware based “Virtual Storage Array” – the new low end box is based on Linux
- A scale out roadmap for their “Unified” platform
- Any significant change in their management software strategy or offering
- Block level deduplication for their unified arrays
- Clarification on where their newly acquired scale out Isilon systems fit within their new “Unified” ecosystem.
Overall EMC did a catch up release to try and maintain pace with NetApp innovation, and nothing they’ve done or released represents a significant new threat. If this is
“the most significant midrange announcement in EMC’s 30-year history”
according to Rich Napolitano, President, Unified Storage Division at EMC, then EMC will continue to play catch up as NetApp redefines Unified Storage and its role in shared infrastructure.
This is an edited copy of a comment I posted on storagezilla’s blog (the second of two which said more or less the same thing; the comment moderation there doesn’t seem to indicate whether the comment is posted or queued). There was a secondary request to EMC which dilutes the essence of the apology, and on reflection and feedback from others I’ve decided to remove it.
You’re right, it was 62 vBlock accounts, not 62 vBlocks sold as I tweeted
“1 Year in and just over 60 V-Blocks sold. I’ll wager that there will be many more FlexPod deployments in 12 months time”
“@DanMoz 63 in production or deployment according to the figures I saw. But you are right, the concept has been sold well.”
Thanks for pointing out the inaccuracy of these statements (though implying that I’m dumb or a liar is a little harsh). I fully recant/withdraw the comment and apologise for the dumb error, both here, on twitter where I made the statement, and on my own blog. I will be more careful in the future.
<Request removed .. JM >
Let the truth prevail.
Regards John Martin
On a third reading of this, even my apology was inaccurate, I claimed 63 vBlocks had been sold, not 62 … d’oh ! Time to get more sleep and up my game.