Category Archives: Storage

Storage Benchmarking (part 1 of…)

This will be a series of blog posts that will try to help you establish a consistent storage benchmarking methodology.

Storage is an area that I have focused on for much of my career, and I’ve been fortunate to be involved in a lot of very challenging and fun storage projects over the years. I spent a good period of time in the trenches, working either for storage manufacturers or for a reseller, performing professional services for custom storage implementations. Apprehension about any change to a storage system is commonplace, whether the change is in vendor, disk type, RAID type and grouping of disks, or something else. Storage is a complicated (and expensive) beast, and it is still one of the most expensive (and profitable) components in the entire datacenter.

Virtualization changed the world for IT; it is one of the most disruptive single concepts and truly changed how IT does business. I know I don’t have to give history lessons on the impact this has had on the hardware vendors, but it is apparent that it was mostly negative for server and network vendors and extremely positive for storage vendors. Before virtualization became commonplace, storage area networks (SANs) and other shared storage were not overly common. Where shared storage did exist, it was for niche applications that represented a very small part of the services an IT department managed. With virtualization, and really all of the great things that came with VMware’s virtualization (e.g. high availability, vMotion, etc.), shared storage became a fundamental component in all datacenters. Shared storage, whether Fibre Channel, iSCSI, or NFS, went from representing a fraction of the data storage in a company to addressing essentially all data stored.

Storage is more critical today than it ever was. We have optimized and streamlined all other aspects of the IT platform, yet most storage vendors are still building products based on ancient principles, and changes are complex and high risk. Storage performance directly impacts the success of the business, so it is critical that you have a methodology to assess whether any change to your storage environment is a net gain or loss. The challenge is that storage benchmarking is really hard; it is a monumental task to take on and actually do well. Various tools have been used for point performance tests, but they are not adequate for assessing the replacement of an entire storage platform or a change to the configuration of an existing one.

Now that I have a long-winded introduction out of the way, let’s start looking at what it takes to be successful in storage benchmarking. Like any initiative, it needs to be an actual project with a process. You can download Iometer and run a test within minutes, but does that test actually mean anything? Does it correlate to anything other than another similar run of Iometer? Likely not.

I happen to focus on storage for a growing public cloud provider, and I have spent a significant portion of the past 3+ years benchmarking storage platforms. I have tried various tools to arrive at a rating for the storage systems under evaluation, and what is important to me may not be important to you. You need to determine which aspects are critical for your business. Here are the primary areas that I score systems on, in no particular order:

  • Reliability
  • Availability
  • Durability & Security
  • Sustainability
  • Scalability
  • Performance

That is a lot of abilities, so I will break down what I mean by each one. You will need to determine the specific requirements within each scoring area for your use case, as nothing is valid if it isn’t within the context of your use case.


Reliability

Reliability is pretty critical for my use case, and it extends to how predictable all of the other areas we are evaluating are. When a component fails, do you have a consistent outcome? Does performance become unpredictable? Does the “usable” capacity that is reported change after a failure? There are a lot of gotchas, and the key is to know and be able to predict them, as you must factor them into your implementation plan for the solution. This is an area that isn’t entirely objective, as you just can’t test all conditions and context is everything.


Availability

This is a measure of resiliency, fault tolerance, and survivability; in other words, how consistently can you access your data, even during failure conditions? This is where storage vendors fit in their datacenter dog-and-pony show: they pull this disk and that cable to “prove” that the data is available even after these failures. I won’t go into the specific tests that I use for this. You may trust your vendor or you may not, and how this is assessed must be within the context of each specific storage system. How you test or prove availability can vary greatly between legacy, scale-out, and hyper-converged storage architectures.

Durability & Security

How confident are you that the data you wrote is the data that was stored and, perhaps even more critical, that the data returned is the actual data that was written? Security is grouped with durability because both are about confidence in the data: in the case of security, confidence that an unauthorized entity cannot modify or otherwise access my data. Durability primarily relates to checksums on written data (to set a trusted point of reference), scrubbing of stored data (to compare against that point of reference), and repair of data (creating a new copy from parity or mirroring). This is nearly impossible to actually test for; it is something you must query your vendor about. There are many ways faults can be introduced that cause loss of data integrity: bit rot, medium read errors, controller or cable failures, and more. Most vendors address this through checksums. In fact, it wasn’t long ago that many vendors used this as the primary differentiator between “enterprise” storage systems and everything else. It is often assumed to be present in modern enterprise storage systems, but you may be surprised to find that it doesn’t exist in many popular solutions.
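To make the checksum, scrub, and repair cycle concrete, here is a minimal Python sketch. It is a toy model only: the `MirroredBlock` class, its two-copy layout, and the use of SHA-256 are assumptions chosen for illustration, not how any particular storage product implements integrity checking.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Reference checksum recorded at write time."""
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Toy two-copy mirror holding one block plus its write-time checksum."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.reference = checksum(data)

    def scrub(self) -> list:
        """Return the indexes of copies that no longer match the reference."""
        return [
            i for i, copy in enumerate(self.copies)
            if checksum(bytes(copy)) != self.reference
        ]

    def repair(self) -> int:
        """Overwrite bad copies from an intact one; return how many were fixed."""
        bad = self.scrub()
        good = [i for i in range(len(self.copies)) if i not in bad]
        if not good:
            raise IOError("no intact copy left to repair from")
        for i in bad:
            self.copies[i] = bytearray(self.copies[good[0]])
        return len(bad)


block = MirroredBlock(b"customer record 42")
block.copies[1][0] ^= 0x01   # simulate bit rot: flip one bit in the second copy
print(block.scrub())         # [1]  -> the second copy is flagged
print(block.repair())        # 1    -> one copy rebuilt from the intact mirror
print(block.scrub())         # []   -> nothing left to flag
```

Without the stored reference checksum, the system would only know that the two copies differ, not which one is correct. That is why checksums matter more than mirroring alone.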

Security is often addressed through encryption. Some vendors may claim that their self-encrypting disks (SEDs) are all that is necessary. However, if the keys are stored on the disk, the encryption does nothing more than provide rapid erase (if the disk is still operational), such as before you send it for replacement or otherwise decommission it.

In proprietary systems you really have to trust your vendor. Ideally the vendor has third-party validation of their offering, so that you aren’t just taking the word of an individual who may be more interested in closing the deal than in you keeping your job or your company surviving the “what if” when it does happen.


Sustainability

This is another area that may be more subjective than not, as it is difficult to assess without a great amount of actual experience using the particular system. If you are looking at a new product, you have to form a subjective view based on your exposure during the evaluation, or try to find rankings from IDC or others that compare the offering.

Ultimately, is it operational within your environmental boundaries, whether physical, staffing (expertise), or otherwise? Does it integrate seamlessly with any existing processes or tools that you depend on (e.g. monitoring and alerting systems)? Can you or your staff adequately manage the system during a time of crisis (you know, 6am on a holiday when something horrific happens)? Logical and intuitive interfaces are a big differentiator here; or do you have to refer to the documentation any time you try to manage the system?


Scalability

Scalability is absolutely critical for my use case, and it is a comprehensive topic that touches all of the other abilities as well as performance. Does the reliability of the system decrease, hold steady, or increase with scale? Does the risk of data loss increase or decrease with scale? Can you sustainably operate 100s or 1000s of these systems? Do you need 100 operators to manage 1000 systems?


Performance

This is the area that I will devote other posts to, as this is where things get more complex. Vendors typically have comparisons between their solution and others that cover the other topics, but reference performance benchmarks are just that, a reference. Any benchmark, synthetic or not, is only valid as a comparison to another benchmark executed the same way using the same workload, and only if all systems being compared were set up consistently according to the respective vendor’s “best” practices.

To be continued…


I have been testing various hyper-converged storage platforms that can coexist with ESX, along with some bare-metal software storage platforms. In all cases I am using the embedded RAID controllers in the servers, and in some cases I am using add-on cards. I have two controllers in use currently: one is an Intel-flashed LSI card and the other is the Supermicro LSI 2208 that is embedded in the FatTwin. While in all of these cases you can use single-disk RAID0 logical volumes, doing so adds a lot of extra steps, and in many of my systems it offers no gain.

WARNING: Proceed at your own risk. I recommend verifying that no data will be impacted by this task. I also encourage you to confirm that the JBOD (aka pass-through mode) configuration is supported by your hardware and your storage platform.

You may be able to do some of these steps by getting into the boot BIOS, but in the case of the Intel-flashed LSI cards the boot BIOS is really horrible. I spent an hour trying to navigate the BIOS over the remote console via the Intel Remote Management Module, but it was absolutely painful, and the only thing that worked was the wizard, which created undesirable configurations. I ended up working around this with the following steps:

  1. Download a live boot CD Linux image
  2. Connect ISO to server through virtual media insertion of remote console
  3. Boot Linux image
  4. Configure networking on Linux
  5. Download MegaCLI to local workstation, then SCP it to the Linux machine
  6. Install MegaCLI
  7. Run MegaCLI commands

In more detail:

I downloaded MegaCLI and placed it in my Dropbox folder, which made it easy to just use wget on the Linux server after it booted. Once Linux was booted, I configured an IP address on the appropriate network interface using ifconfig, added DNS servers to resolv.conf, and added a default gateway. I could then SSH in, where I had copy and paste, and quickly run the same commands across my dozen hosts. In my case I selected the CentOS 6.5 LiveCD from a nearby mirror, but you should be able to use any reasonably recent bootable Linux CD.

I will warn that doing these steps with any data in place will absolutely lead to data destruction. I am not liable for how quickly the -CfgLdDel command obliterates any existing logical volume configuration. Proceed at your own risk.


Switching LSI SAS 2208 and similar chipsets to JBOD mode


Home Lab – Storage Performance Test (Part 1)

This is a continuation of my Home Lab Build – Overview

In order to help out my fellow vGeeks, I thought I should keep up my “comparison” to the hardware storage appliances. While I personally won’t be running my system as a 4-disk configuration, I realize that some of them may. I ran some tests using Bonnie++ and dd from /dev/zero to provide some benchmarks. I realize that these will not be 100% representative of the performance that would be experienced with protocols in place, but they should provide a relative comparison between the disk configuration options.

I have chosen to use Bonnie++ for my benchmarks because it is far faster to set up and it runs from within the Nexenta management console. If you are not familiar with Bonnie++ and how it performs testing you can find more info here:

I will run three tests using Bonnie++, only varying the block size between 4KB, 8KB, and 32KB.

  • run benchmark bonnie-benchmark -p2 -b 4k
  • run benchmark bonnie-benchmark -p2 -b 8k
  • run benchmark bonnie-benchmark -p2 -b 32k

Each test will be performed against each of the following storage pool configurations:

  • RAID0 (no disk failure protection)
  • RAIDZ1 (single disk failure protection, similar to RAID5)
  • RAIDZ2 (double disk failure protection, similar to RAID6) *this configuration will only be tested with 4K block sizes to display the parity tax*

I will run a few “hardware” variations: my target configuration with 2 vCPU and 4-6GB RAM, as well as a reduced configuration with 2 vCPU and 2GB of RAM. I expect the decrease in RAM to mostly decrease read performance, as it will reduce the working cache size.

I intended to create some lovely graphs to simplify comparing the results of each test, but I could either wait another week or two before finding the time or share the results in the output format from Bonnie++. In order to get this info to my fellow vGeeks, I have decided to publish the less-than-pretty format. After all, any real geek prefers unformatted text to PowerPoint and glossy sales docs. In each block below the columns are write, CPU, re-write, CPU, read, CPU, and random seeks. The first two rows are the two Bonnie++ processes (-p2), and the last row is the combined result.

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAID0

================== 4k Blocks ==================
157MB/s 42% 84MB/s 32% 211MB/s 28% 1417/sec
156MB/s 41% 83MB/s 32% 208MB/s 28% 1579/sec
——— —- ——— —- ——— —- ———
314MB/s 41% 168MB/s 32% 420MB/s 28% 1498/sec

================== 8k Blocks ==================
148MB/s 22% 92MB/s 20% 212MB/s 20% 685/sec
147MB/s 21% 90MB/s 20% 212MB/s 21% 690/sec
——— —- ——— —- ——— —- ———
295MB/s 21% 182MB/s 20% 424MB/s 20% 688/sec

================== 32k Blocks ==================
144MB/s 12% 90MB/s 11% 210MB/s 14% 297/sec
153MB/s 12% 92MB/s 12% 210MB/s 15% 295/sec
——— —- ——— —- ——— —- ———
298MB/s 12% 183MB/s 11% 420MB/s 14% 296/sec

Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAID0

================== 4k Blocks ==================
113MB/s 21% 75MB/s 22% 216MB/s 31% 980/sec
113MB/s 21% 74MB/s 22% 217MB/s 31% 936/sec
——— —- ——— —- ——— —- ———
227MB/s 21% 150MB/s 22% 434MB/s 31% 958/sec

================== 8k Blocks ==================
110MB/s 13% 80MB/s 15% 209MB/s 22% 521/sec
110MB/s 13% 80MB/s 15% 210MB/s 23% 524/sec
——— —- ——— —- ——— —- ———
220MB/s 13% 161MB/s 15% 420MB/s 22% 523/sec

================== 32k Blocks ==================
114MB/s 8% 81MB/s 9% 218MB/s 13% 297/sec
113MB/s 8% 79MB/s 9% 218MB/s 12% 294/sec
——— —- ——— —- ——— —- ———
228MB/s 8% 161MB/s 9% 436MB/s 12% 296/sec

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAID1+0 (2 x 1+1 mirrors)

================== 4k Blocks ==================
89MB/s 27% 53MB/s 19% 143MB/s 22% 1657/sec
89MB/s 27% 53MB/s 19% 144MB/s 22% 1423/sec
——— —- ——— —- ——— —- ———
178MB/s 27% 106MB/s 19% 288MB/s 22% 1540/sec

================== 8k Blocks ==================
83MB/s 13% 53MB/s 12% 147MB/s 16% 800/sec
83MB/s 12% 54MB/s 12% 147MB/s 16% 752/sec
——— —- ——— —- ——— —- ———
167MB/s 12% 107MB/s 12% 294MB/s 16% 776/sec

================== 32k Blocks ==================
85MB/s 7% 55MB/s 7% 141MB/s 9% 277/sec
82MB/s 7% 53MB/s 7% 135MB/s 9% 266/sec
——— —- ——— —- ——— —- ———
167MB/s 7% 109MB/s 7% 276MB/s 9% 271/sec

Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAID1+0 (2 x 1+1 mirrors)

================== 4k Blocks ==================
65MB/s 11% 48MB/s 14% 154MB/s 22% 892/sec
65MB/s 11% 48MB/s 13% 152MB/s 22% 786/sec
——— —- ——— —- ——— —- ———
130MB/s 11% 97MB/s 13% 306MB/s 22% 839/sec

================== 8k Blocks ==================
67MB/s 7% 47MB/s 9% 157MB/s 14% 669/sec
67MB/s 7% 47MB/s 9% 155MB/s 14% 637/sec
——— —- ——— —- ——— —- ———
135MB/s 7% 94MB/s 9% 313MB/s 14% 653/sec

================== 32k Blocks ==================
68MB/s 5% 31MB/s 3% 153MB/s 8% 338/sec
68MB/s 5% 31MB/s 3% 151MB/s 8% 342/sec
——— —- ——— —- ——— —- ———
136MB/s 5% 62MB/s 3% 304MB/s 8% 340/sec

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAIDZ1 (RAID5)

================== 4k Blocks ==================
109MB/s 30% 54MB/s 22% 133MB/s 21% 813/sec
108MB/s 32% 54MB/s 22% 131MB/s 20% 708/sec
——— —- ——— —- ——— —- ———
218MB/s 31% 108MB/s 22% 265MB/s 20% 761/sec

================== 8k Blocks ==================
114MB/s 25% 60MB/s 17% 131MB/s 14% 525/sec
118MB/s 24% 60MB/s 18% 133MB/s 14% 517/sec
——— —- ——— —- ——— —- ———
232MB/s 24% 121MB/s 17% 265MB/s 14% 521/sec

================== 32k Blocks ==================
107MB/s 12% 60MB/s 8% 138MB/s 9% 163/sec
111MB/s 11% 60MB/s 8% 138MB/s 9% 172/sec
——— —- ——— —- ——— —- ———
218MB/s 11% 121MB/s 8% 276MB/s 9% 167/sec

Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAIDZ1 (RAID5)

================== 4k Blocks ==================
74MB/s 15% 40MB/s 12% 160MB/s 18% 715/sec
76MB/s 15% 41MB/s 13% 165MB/s 19% 651/sec
——— —- ——— —- ——— —- ———
151MB/s 15% 82MB/s 12% 325MB/s 18% 683/sec

================== 8k Blocks ==================
75MB/s 9% 42MB/s 8% 167MB/s 21% 384/sec
73MB/s 8% 42MB/s 8% 166MB/s 20% 387/sec
——— —- ——— —- ——— —- ———
149MB/s 8% 85MB/s 8% 333MB/s 20% 386/sec

================== 32k Blocks ==================
73MB/s 5% 41MB/s 4% 168MB/s 11% 182/sec
71MB/s 5% 40MB/s 4% 168MB/s 11% 183/sec
——— —- ——— —- ——— —- ———
144MB/s 5% 82MB/s 4% 337MB/s 11% 182/sec

Hardware Variation 3 (2vCPU/8GB RAM) / 4-disk RAIDZ1 (RAID5)

================== 4k Blocks ==================
114MB/s 34% 58MB/s 22% 146MB/s 22% 872/sec
114MB/s 34% 59MB/s 23% 147MB/s 21% 693/sec
——— —- ——— —- ——— —- ———
228MB/s 34% 118MB/s 22% 293MB/s 21% 783/sec

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAIDZ2 (RAID6)

================== 4k Blocks ==================
71MB/s 20% 43MB/s 16% 111MB/s 20% 716/sec
71MB/s 20% 43MB/s 16% 110MB/s 20% 677/sec
——— —- ——— —- ——— —- ———
143MB/s 20% 86MB/s 16% 221MB/s 20% 696/sec

================== 8k Blocks ==================
75MB/s 13% 42MB/s 10% 110MB/s 12% 540/sec
74MB/s 16% 42MB/s 11% 104MB/s 11% 491/sec
——— —- ——— —- ——— —- ———
149MB/s 14% 84MB/s 10% 215MB/s 11% 515/sec

================== 32k Blocks ==================
70MB/s 7% 43MB/s 6% 109MB/s 8% 202/sec
70MB/s 7% 42MB/s 6% 109MB/s 8% 203/sec
——— —- ——— —- ——— —- ———
140MB/s 7% 85MB/s 6% 218MB/s 8% 203/sec
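For anyone who wants to work with the raw rows above rather than retype them, here is a minimal Python sketch that parses one block and recombines the two per-process rows. It assumes the columns are write, CPU, re-write, CPU, read, CPU, and random seeks, which is consistent with the figures quoted in the analysis below. It also assumes that throughput is summed across the two processes while CPU and seeks are averaged. The function and key names are made up for illustration.

```python
import re

COLUMNS = ("write", "write_cpu", "rewrite", "rewrite_cpu",
           "read", "read_cpu", "seeks")
SUMMED = ("write", "rewrite", "read")   # MB/s adds up across processes


def parse_row(row: str) -> dict:
    """Turn '157MB/s 42% ... 1417/sec' into a dict of plain numbers."""
    values = [float(re.sub(r"[^\d.]", "", field)) for field in row.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def combine(rows: list) -> dict:
    """Sum throughput columns; average the CPU and seek columns."""
    parsed = [parse_row(r) for r in rows]
    total = {}
    for key in COLUMNS:
        values = [p[key] for p in parsed]
        total[key] = sum(values) if key in SUMMED else sum(values) / len(values)
    return total


# Hardware Variation 1, 4-disk RAID0, 4k blocks (the first block above)
rows = [
    "157MB/s 42% 84MB/s 32% 211MB/s 28% 1417/sec",
    "156MB/s 41% 83MB/s 32% 208MB/s 28% 1579/sec",
]
total = combine(rows)
print(total["write"], total["read"], total["seeks"])   # 313.0 419.0 1498.0
```

The recomputed throughput can differ by 1MB/s from the reported total (313 versus 314 here) because the per-process rows are already rounded.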

I have to admit, I was incorrect in my prediction that the RAM size would more directly correlate to read performance. It actually seems that increasing the RAM somehow leads to a slight decrease in read performance, while improving write performance. I am going to speculate that this has to do with poor caching algorithms, or at least poor for this workload, as well as writes being staged in RAM before they are flushed to disk. The larger RAM leads to an increased ARC (in-memory cache) size; this does improve random seeks significantly but decreases maximum read throughput (large block), possibly because the larger cache leads to inaccurate predictive reads (speculation).
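To put numbers on that observation, the short Python sketch below compares the 4-disk RAID0, 4k-block totals for the 2GB and 6GB configurations. The values are copied from the results above; the `pct_change` helper and the dictionary names are just illustrative.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


# 4-disk RAID0, 4k blocks, combined rows from the results above
ram_2gb = {"write": 227, "read": 434, "seeks": 958}
ram_6gb = {"write": 314, "read": 420, "seeks": 1498}

for metric in ram_2gb:
    change = pct_change(ram_2gb[metric], ram_6gb[metric])
    print(f"{metric}: {change:+.1f}%")
# write: +38.3%
# read: -3.2%
# seeks: +56.4%
```

For this one configuration, the extra RAM shows up mostly in writes and random seeks, while sequential reads move by only a few percent.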

Much like a NetApp storage system, ZFS always attempts to write in large chunks. If you were to watch the iostat output for the physical devices, you would see peaks and valleys for writes to the physical media, even though the incoming workload is steady state. NetApp and ZFS both attempt to play a form of Tetris in order to make the most efficient write possible, and the more RAM is available, the better they can stage these writes to complete efficiently.

One key measure is the actual throughput per disk, and RAID0 is a good way to determine this. As expected, re-writes always suffer, as they do in any file system, so we will focus on writes, reads, and random seeks. I will use the numbers from the lowest memory configuration for RAID0 (4k blocks):

  • Writes: 227MB/s
  • Reads: 434MB/s
  • Random Seeks: 958/sec

Now we need to break this into per-disk statistics, which simply means dividing each value above by the number of physical disks (four).

  • Writes: 56.75MB/sec/disk
  • Reads: 108.5MB/sec/disk
  • Random Seeks: 239.5 IOPS/disk
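The division above, written out as a tiny Python sketch. The function and key names are illustrative, and the totals are the 2GB RAID0 figures listed earlier.

```python
def per_disk(totals: dict, disks: int) -> dict:
    """Divide each pool-wide figure evenly across the physical disks."""
    return {name: value / disks for name, value in totals.items()}


raid0_totals = {"write_mb_s": 227, "read_mb_s": 434, "random_seeks_s": 958}
baseline = per_disk(raid0_totals, disks=4)
print(baseline)
# {'write_mb_s': 56.75, 'read_mb_s': 108.5, 'random_seeks_s': 239.5}
```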

Of course, the one flaw in Bonnie++ is that we do not get latency statistics. We normally expect a 7200 RPM SATA disk to offer 40-60 IOPS with sub-20 millisecond response, but I have no measure of the response time experienced during this test or of how many of the random seeks were served from cache. The 239.5 seeks per disk seen here is well above that range, which suggests a good share of them were. I selected the lowest RAM (cache) configuration to try to minimize that factor in our equation.

We can then use this as a baseline to measure the degradation of each protection scheme on a per-data-disk basis. In a RAID1+0 configuration we have 2 disks supporting writes and 4 disks supporting reads, which leads to reasonable read performance. My lab operates in a RAID1+0 configuration because my environment is heavily read oriented, and with the low number of physical disks I did not want the parity write tax. In addition, with six 1TB SATA drives I am not capacity restricted.
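Here is one way that comparison could be sketched in Python, using the 2GB, 4k-block totals from the results above. The disk counts for RAID1+0 come from the paragraph above. The counts for RAIDZ1 (three data disks for both reads and writes in a 4-disk group) are an assumption made for this example, as are all of the names.

```python
# Per-disk baseline from 4-disk RAID0, 2GB RAM, 4k blocks
BASELINE = {"write": 227 / 4, "read": 434 / 4}

# (total MB/s, number of disks assumed to serve that operation)
SCHEMES = {
    "RAID1+0": {"write": (130, 2), "read": (306, 4)},
    "RAIDZ1": {"write": (151, 3), "read": (325, 3)},  # 3 data disks is assumed
}


def relative_to_baseline(scheme: str) -> dict:
    """Per-data-disk throughput as a fraction of the RAID0 per-disk baseline."""
    result = {}
    for op, (total, disks) in SCHEMES[scheme].items():
        result[op] = (total / disks) / BASELINE[op]
    return result


for name in SCHEMES:
    ratios = relative_to_baseline(name)
    print(name, {op: f"{ratio:.0%}" for op, ratio in ratios.items()})
```

A ratio below 100% means each data disk is delivering less than it did in the plain stripe; a ratio above 100% usually means the assumed disk count or caching is flattering the result.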

I almost went into a full interpretation of my results, but I stumbled upon this site in my research: You will find a detailed description of the performance expectations of each RAID configuration. The telling portion is this:

Configuration   Blocks Available   Random FS Blocks / sec
RAID-Z          (N - 1) * dev      1 * dev
Mirror          (N / 2) * dev      N * dev
Stripe          N * dev            N * dev

The key item to interpret is that with RAID-Z, random IOPS are limited to those of a single device. You will see in the referenced blog posting that a configuration of multiple small RAID-Z groups performs better than one large RAID-Z group, as each group contributes one device’s worth of random workload. This may not correlate 100% with RAID5, or whatever RAID scheme your storage platform uses, as they are not all created equal.
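The formulas in that table are easy to play with. The Python sketch below applies them to a hypothetical 12-disk shelf, with capacity expressed in units of one device and an assumed 50 random IOPS per device (inside the 40-60 range mentioned earlier). It also assumes that a pool striped across several RAID-Z groups simply adds up each group’s contribution. The function names are made up for this example.

```python
def raidz_pool(groups: int, disks_per_group: int, dev_iops: int):
    """(capacity in devices, random IOPS) for a pool of RAID-Z groups."""
    capacity = groups * (disks_per_group - 1)   # (N - 1) * dev per group
    random_iops = groups * 1 * dev_iops         # 1 * dev per group
    return capacity, random_iops


def mirror_pool(disks: int, dev_iops: int):
    """(capacity in devices, random IOPS) for two-way mirrors."""
    return disks // 2, disks * dev_iops         # (N / 2) * dev, N * dev


def stripe_pool(disks: int, dev_iops: int):
    """(capacity in devices, random IOPS) for a plain stripe."""
    return disks, disks * dev_iops              # N * dev, N * dev


DEV_IOPS = 50  # assumed per-device random IOPS, for illustration only

print("1 x 12-disk RAID-Z:", raidz_pool(1, 12, DEV_IOPS))   # (11, 50)
print("4 x 3-disk RAID-Z: ", raidz_pool(4, 3, DEV_IOPS))    # (8, 200)
print("6 x 2-way mirrors: ", mirror_pool(12, DEV_IOPS))     # (6, 600)
print("12-disk stripe:    ", stripe_pool(12, DEV_IOPS))     # (12, 600)
```

Splitting the same twelve disks into four small groups gives up three devices of capacity but quadruples the random IOPS, which is the trade-off the referenced post describes.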


Home Lab Build – NexentaStor Setup

NOTE: This installation was performed with NexentaStor 3.0.4; later versions may have slight differences in the installation process and the GUI.

I’m going to skip insulting your intelligence with screenshots of the installation process for Nexenta, or of the configuration of the VM if you go that route. I will start with the assumption that you have NexentaStor (Community Edition) installed on either a physical system or a VM. If you have gone the physical route, your network interface names will obviously be different from the ones I show. Since I am passing an actual SAS controller card through to the VM with VT-d, the rest should be similar.

  1. Proceed and start the configuration wizard


  2. Select which detected network interface you wish to be your primary (management) – we get more advanced control after the wizard is complete


  3. Select your configuration option (static)


  4. Input your IP Address you wish to use


  5. Proceed through the network configuration defining your subnet mask, DNS servers and gateway


  6. Review your configuration settings. If your configuration is correct, select N(o). If you need to make a correction, select Y(es)


  7. Select whether you wish to use HTTP or HTTPS for management access. As the system warns, SSL adds CPU overhead and may make the interface less responsive.


  8. Make note of your configured TCP port and change it if desired (default = 2000). This is the port the web management GUI listens on.


  9. Make note of the provided URL and access it in order to continue configuration.


  10. Open the management GUI in a web browser (Flash enabled) to proceed with the configuration wizard (Wizard 1).


  11. Populate the fields to meet your configuration goals and proceed to the next step.


  12. Configure your passwords for the two default management contexts and proceed.


  13. Define your notification preferences and continue to the next step.


  14. Review your configuration settings and save your configuration.
  15. We are now in the “Wizard 2” stage, which is where we configure the actual storage options.
  16. Review your current interface settings; you can edit the existing configuration or add a new one. If you wish to aggregate multiple links into a single logical interface, you must add a new interface to get that option. I will leave these as they are and edit them at another time.
  17. Next we are prompted to configure the iSCSI initiator service, which would be used to access another storage device for resources (e.g. to add NFS to an iSCSI-only system such as a Dell/EQ). I am not using any other iSCSI systems, so this is irrelevant here.

  18. This next screen shows the list of detected disk devices. If you had configured iSCSI on the previous screen and had mapped storage to this initiator, those resources would also be visible. I currently have two 1TB Seagate drives attached to the SATA controller I assigned through VT-d.

  19. In this next section we are asked to create volumes (storage pools). The process is to select the physical resources and assign them to the volume. You can select multiple devices and change the “Redundancy Type” to configure RAID protection (None = stripe, Mirror, RAIDZ1 = ~RAID5, RAIDZ2 = ~RAID6, and RAIDZ3 = paranoid?).

I am starting with “none” as I will perform some testing comparing different options in a later post.

  20. In the lower section we configure the properties of the pool, including name, deduplication, and Sync settings (which we will discuss more later). I will leave all settings as default at this time.

  21. Verify that your volume was created. If it was not, a red error description will flash briefly across the upper section of the screen.

  22. In this next portion we can create “folders”; each folder can have its own access type (NFS, CIFS, FTP, RSYNC, etc.). I will add a single folder, which I will configure for NFS. I am selecting a block size of 4KB to match that of most of my guest operating systems. I am also setting the file system to be case sensitive and enabling Unicode.
  23. This is the final step of the guided wizard; we can make any additional changes through the actual management interface. Set the checkboxes to meet your comfort level. I will attempt to compare the performance impact of some of these options in a later post.
  24. This completes the basic configuration; the rest will be done through the standard management interface.

NexentaStor Storage Concepts

Within Nexenta (or perhaps Solaris ZFS) storage management, there are:

  • Datasets (ZFS pools), which are made up of physical disks (or logical disks presented from an external array)
  • Shares – logical units presented as file services (CIFS, NFS, RSYNC, FTP, etc)
  • ZVols – logical units presented as block storage (iSCSI)

With that being said, there are really just 2 different processes for allocating storage, depending on whether it is file-based or block-based storage.

Again, I hope this helps someone. I will cover configuring storage and accessing it from ESX in a later post.
