This is the first in a series of blog posts aimed at helping you establish a consistent storage benchmarking methodology.
Storage is an area I have focused on for much of my career, and I’ve been fortunate to be involved in a lot of very challenging and fun storage projects over the years. I spent a good period of time in the trenches, working either for storage manufacturers or a reseller, performing professional services for custom storage implementations. Apprehension about any change to a storage system is commonplace, whether that change is the vendor, disk types, RAID type and disk grouping, or more. Storage is a complicated (and expensive) beast; it is still one of the most expensive (and profitable) components within the entire datacenter.
Virtualization changed the world for IT; it is one of the single most disruptive concepts to truly change how IT does business. I know I don’t have to give history lessons on the impact this has had on hardware vendors, but it is apparent that it was mostly negative for server and network vendors and extremely positive for storage vendors. Before virtualization became commonplace, storage area networks (SANs) and other shared storage were not all that common; where they existed, they served niche applications that represented a very small part of the services an IT department managed. With virtualization, and really all of the great things that came with VMware’s virtualization (e.g. high availability, vMotion, etc.), shared storage became a fundamental component in every datacenter. Shared storage, whether Fibre Channel, iSCSI, or NFS, went from representing a fraction of the data stored in a company to addressing essentially all of it.
Storage is more critical today than it has ever been. We have optimized and streamlined every other aspect of the IT platform, yet most storage vendors are still building products based on ancient principles, where changes are complex and high risk. Storage performance directly impacts the success of the business, so it is critical that you have a methodology to actually assess any change to your storage environment and determine whether it is a net gain or loss. The challenge is that storage benchmarking is really hard; it is a monumental task to take on and actually do well. Various tools have been used for point performance tests, but they are not adequate for assessing the replacement of an entire storage platform or a change to the configuration of an existing one.
Now that the long-winded introduction is out of the way, let’s start looking at what it takes to be successful in storage benchmarking. Like any initiative, it needs to be an actual project with a process. Sure, you can download Iometer and run a test within minutes, but does that test actually mean anything? Does it correlate to anything other than another similar run of Iometer? Likely not.
I happen to focus on storage for a growing public cloud provider, and I have spent a significant portion of the past 3+ years benchmarking storage platforms. I have tried various tools to assess a rating for the storage systems under evaluation, and what is important to me may not be important to you. You need to determine the critical aspects for your business; here are the primary areas that I score systems on, in no particular order:
- Reliability
- Availability
- Durability & Security
- Usability
- Scalability
- Performance
That is a lot of abilities, so I will break down what I am referring to in each one. You will need to determine what the specific requirements within each scoring area are for your use case, as nothing is valid if it isn’t within the context of your use case.
Reliability

Reliability is pretty critical for my use case, and it extends to how predictable all of the other areas we are evaluating remain. When a component fails, do you get a consistent outcome? Does performance become unpredictable? Does the “usable” capacity that is reported change after a failure? There are a lot of gotchas, and the key is to know them and be able to predict them, as you must factor them into your implementation plan for the solution. This area isn’t entirely objective, as you simply can’t test all conditions, and context is everything.
Availability

This is a measure of resiliency, fault tolerance, and survivability; in other words, how consistently can you access your data, even during failure conditions. This is where storage vendors fit their dog-and-pony show into the datacenter: they pull this disk and that cable to “prove” that the data is available even after these failures. I won’t go into the specific tests that I use for this; you may trust your vendor or you may not. How this is assessed must be within the context of each specific storage system. How you test and prove availability for legacy, scale-out, or hyperconverged storage architectures can vary greatly.
Durability & Security
How confident are you that the data returned on a read is exactly the data that was written? Security is grouped with durability because its two aspects are closely related: confidence that an unauthorized entity cannot modify my data, and confidence that it cannot otherwise access it. Durability primarily relates to checksums on written data (to set a confident point of reference), scrubbing of stored data (to compare against that reference), and repair of data (creating a new copy from parity or a mirror). This is nearly impossible to actually test for; it is something you must query your vendor about. There are many ways faults can be introduced that cause loss of data integrity: bit rot, medium read errors, controller or cable failures, and more. Most vendors address this through checksums; in fact, it wasn’t long ago that many vendors used this as the primary differentiator between “enterprise” storage systems and the rest. It is often assumed to be present in modern enterprise storage systems, but you may be surprised to find that it doesn’t exist in many popular solutions.
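The checksum/scrub/repair cycle described above can be sketched in a few lines. This is a toy model, not any vendor’s implementation: the two-way mirror, the `MirroredStore` class, and its method names are all hypothetical, chosen only to illustrate how a write-time checksum gives the scrubber a point of reference for detecting and repairing silent corruption.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Compute the reference checksum recorded at write time."""
    return hashlib.sha256(block).hexdigest()

class MirroredStore:
    """Toy store keeping two mirrored copies of every block."""

    def __init__(self) -> None:
        self.copies = [{}, {}]   # two mirrored copies, block_id -> bytes
        self.checksums = {}      # block_id -> checksum set at write time

    def write(self, block_id: int, data: bytes) -> None:
        self.checksums[block_id] = checksum(data)  # point of reference
        for copy in self.copies:
            copy[block_id] = data

    def scrub(self):
        """Compare stored data to the write-time checksum; repair mismatches."""
        repaired = []
        for block_id, expected in self.checksums.items():
            for i, copy in enumerate(self.copies):
                if checksum(copy[block_id]) != expected:
                    # Repair from the surviving mirror copy.
                    good = self.copies[1 - i][block_id]
                    assert checksum(good) == expected, "both copies corrupt"
                    copy[block_id] = good
                    repaired.append((block_id, i))
        return repaired

store = MirroredStore()
store.write(0, b"important data")
store.copies[1][0] = b"important dat\x00"  # simulate silent bit rot
print(store.scrub())                       # -> [(0, 1)]
print(store.copies[1][0])                  # -> b'important data'
```

Note that without the write-time checksum, a two-way mirror alone cannot tell which copy rotted; the external point of reference is what makes repair unambiguous.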
Security is often addressed through encryption. Some vendors may claim that their self-encrypting disks (SEDs) are all that is necessary; however, if the keys are stored on the disk, then the encryption provides nothing more than rapid erase (if the disk is still operational), such as before you send it out for replacement or otherwise decommission it.
With proprietary systems you really have to trust your vendor. Ideally the vendor has third-party validation of their offering, so that you aren’t just taking the word of an individual who may be more interested in closing the deal than in you keeping your job, or your company surviving the “what if” when it does happen.
Usability

This is another area that may be more subjective than not, as it is difficult to assess without a great amount of actual experience using the particular system. If you are looking at a new product, you have to be subjective based on your exposure during the evaluation, or try to find IDC or other rankings comparing the offering.

Ultimately: is it operational within your environmental boundaries, physical, staffing (expertise), and so on? Does it integrate seamlessly with the existing processes and tools that you depend on (e.g. monitoring and alerting systems)? Can you or your staff adequately manage the system during a time of crisis (you know, 6am on a holiday when something horrific happens)? Having logical and intuitive interfaces is a big differentiator here; do you have to refer to documentation every time you try to manage the system?
Scalability

Scalability is absolutely critical for my use case, and it is a comprehensive topic that touches all of the other abilities as well as performance. Does the reliability of the system decrease, hold steady, or increase with scale? Does the risk of data loss increase or decrease with scale? Can you sustainably operate 100s or 1000s of these systems? Do you need 100 operators to manage 1,000 systems?
Performance

This is the area that I will devote other posts to, as this is where things get more complex. Vendors typically have comparisons between their solution and others covering the other topics; however, reference performance benchmarks are just that, a reference. Any benchmark is only valid as a comparison to another benchmark executed the same way using the same workload, whether that is a synthetic benchmark or not, and only if all instances being compared were set up consistently according to the respective vendors’ best practices.
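The comparability rule above can be made mechanical: record the workload definition alongside every result, and refuse to compare runs whose workloads differ. This is an illustrative sketch; the workload fields shown (tool, block size, queue depth, read mix) are hypothetical examples of what to pin down, not an exhaustive definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Workload:
    """The parameters that define a benchmark run; all must match to compare."""
    tool: str            # e.g. "fio" or "Iometer"
    block_size_kb: int
    queue_depth: int
    read_pct: int        # percentage of reads vs. writes

@dataclass
class BenchmarkRun:
    system: str
    workload: Workload
    iops: float

def compare(a: BenchmarkRun, b: BenchmarkRun) -> float:
    """Return the IOPS ratio a/b, but only for identical workloads."""
    if a.workload != b.workload:
        raise ValueError("runs used different workloads; not comparable")
    return a.iops / b.iops

w = Workload(tool="fio", block_size_kb=8, queue_depth=32, read_pct=70)
old = BenchmarkRun("array-A", w, 90_000.0)
new = BenchmarkRun("array-B", w, 120_000.0)
print(round(compare(new, old), 2))   # -> 1.33
```

The frozen dataclass gives value equality for free, so any drift in even one workload parameter makes the comparison fail loudly instead of producing a misleading number.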
To be continued…
2 thoughts on “Storage Benchmarking (part 1 of…)”
Indeed, beyond the function(s) that a storage array needs to offer, there are many non-functional aspects, which I like to refer to as ‘qualities’, that storage arrays need to deliver as well.
Your list of qualities is quite complete, but I feel you’re mixing and matching some of the terms.
Availability indicates the ability of a technology to achieve highly available operation. There is a characteristic close to availability that one should understand as well: reliability, which is a measure of percentage uptime counting only downtime due to faults. It should also be factored in, to help differentiate between planned and unplanned downtime, for instance. Key metric: % uptime.
Recoverability, which is not in your list, indicates the ability to recover from an unexpected incident that affects the availability of a storage array. There is another characteristic close to recoverability that one should understand as well: resiliency, which is the characteristic of being able to adapt under stress or faults in order to avoid failure, either by maintaining full storage array service(s) and performance, or by failing over gracefully and thereby reducing storage array service(s) or performance. One way to achieve higher resiliency of a service is by making it redundant, e.g. N+1 power supplies. Key metrics are usually RTO/RPO.
Thanks for the comment; honestly, I was surprised someone had read it 😉
Perhaps I will circle back and go into more detail on some of those items, but so many of those aspects are very specific to a particular storage offering. In many cases documentation alone is not adequate to create a valid design for all use cases; you really have to test, and test, and test until you actually know how a specific product works. Unlike the old legacy storage systems, the test/design that works for one may not work for another, as truly software-oriented storage has a lot more variability. An EMC VNX and a NetApp FAS have far more in common when designing for availability/resiliency/recoverability than do VSAN and ScaleIO (as an example), so on those legacy platforms you can leverage slightly modified implementation designs to achieve the same outcome.
I agree I am oversimplifying, but it’s because I wanted to get to the performance-assessment part. Perhaps in the future I can write up how I would do it, hypothetically, for two of the “new architecture” offerings (e.g. VSAN or ScaleIO). Of course, the challenge there is that my design may be driven by knowledge based on NDA information.