This is a continuation of my Home Lab Build – Overview post.
In order to help out my fellow vGeeks, I thought I should keep with my “comparison” to the hardware storage appliances. While I personally won’t be running my system in a 4-disk configuration, I realize that some of you may. I ran some tests using Bonnie++ and dd from /dev/zero to provide benchmarks. These will not be 100% representative of the performance that would be experienced with storage protocols in place, but they should provide a relative comparison between the disk configuration options.
I have chosen to use Bonnie++ for my benchmarks as it is far faster to set up; it operates from within the Nexenta management console. If you are not familiar with Bonnie++ and how it performs its testing, you can find more info here: http://www.coker.com.au/bonnie++/readme.html
I will run three tests using Bonnie++, varying only the block size between 4KB, 8KB, and 32KB:
- run benchmark bonnie-benchmark -p2 -b 4k
- run benchmark bonnie-benchmark -p2 -b 8k
- run benchmark bonnie-benchmark -p2 -b 32k
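For the dd portion of the testing mentioned earlier, a simple sequential write from /dev/zero is enough; a minimal sketch, where the target path, block size, and count are illustrative placeholders rather than my exact parameters:
- dd if=/dev/zero of=/volumes/mypool/ddtest bs=1M count=8192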
Each test will be performed against each of the following storage pool configurations:
- RAID0 (no disk failure protection)
- RAID1+0 (striped mirrors, similar to RAID10)
- RAIDZ1 (single disk failure protection, similar to RAID5)
- RAIDZ2 (double disk failure protection, similar to RAID6) *this configuration will only be tested with 4K block sizes to display the parity tax*
I will run a few “hardware” variations: my target configuration with 2 vCPU and 6GB RAM, as well as a reduced configuration with 2 vCPU and 2GB of RAM. I expect the decrease in RAM to mostly reduce read performance, as it shrinks the working cache size.
I intended to find the time to create some lovely graphs to simplify comparing the results of each test; however, I could either wait another week or two for that time to appear, or share the results in Bonnie++’s raw output format. In order to get this info to my fellow vGeeks sooner, I have decided to publish the less-than-pretty format. After all, any real geek prefers unformatted text to PowerPoint and glossy sales docs.
Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAID0
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
157MB/s 42% 84MB/s 32% 211MB/s 28% 1417/sec
156MB/s 41% 83MB/s 32% 208MB/s 28% 1579/sec
--------- ---- --------- ---- --------- ---- ---------
314MB/s 41% 168MB/s 32% 420MB/s 28% 1498/sec
================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
148MB/s 22% 92MB/s 20% 212MB/s 20% 685/sec
147MB/s 21% 90MB/s 20% 212MB/s 21% 690/sec
--------- ---- --------- ---- --------- ---- ---------
295MB/s 21% 182MB/s 20% 424MB/s 20% 688/sec
================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
144MB/s 12% 90MB/s 11% 210MB/s 14% 297/sec
153MB/s 12% 92MB/s 12% 210MB/s 15% 295/sec
--------- ---- --------- ---- --------- ---- ---------
298MB/s 12% 183MB/s 11% 420MB/s 14% 296/sec
Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAID0
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
113MB/s 21% 75MB/s 22% 216MB/s 31% 980/sec
113MB/s 21% 74MB/s 22% 217MB/s 31% 936/sec
--------- ---- --------- ---- --------- ---- ---------
227MB/s 21% 150MB/s 22% 434MB/s 31% 958/sec
================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
110MB/s 13% 80MB/s 15% 209MB/s 22% 521/sec
110MB/s 13% 80MB/s 15% 210MB/s 23% 524/sec
--------- ---- --------- ---- --------- ---- ---------
220MB/s 13% 161MB/s 15% 420MB/s 22% 523/sec
================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
114MB/s 8% 81MB/s 9% 218MB/s 13% 297/sec
113MB/s 8% 79MB/s 9% 218MB/s 12% 294/sec
--------- ---- --------- ---- --------- ---- ---------
228MB/s 8% 161MB/s 9% 436MB/s 12% 296/sec
Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAID1+0 (2 x 1+1 mirrors)
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
89MB/s 27% 53MB/s 19% 143MB/s 22% 1657/sec
89MB/s 27% 53MB/s 19% 144MB/s 22% 1423/sec
--------- ---- --------- ---- --------- ---- ---------
178MB/s 27% 106MB/s 19% 288MB/s 22% 1540/sec
================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
83MB/s 13% 53MB/s 12% 147MB/s 16% 800/sec
83MB/s 12% 54MB/s 12% 147MB/s 16% 752/sec
--------- ---- --------- ---- --------- ---- ---------
167MB/s 12% 107MB/s 12% 294MB/s 16% 776/sec
================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
85MB/s 7% 55MB/s 7% 141MB/s 9% 277/sec
82MB/s 7% 53MB/s 7% 135MB/s 9% 266/sec
--------- ---- --------- ---- --------- ---- ---------
167MB/s 7% 109MB/s 7% 276MB/s 9% 271/sec
Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAID1+0 (2 x 1+1 mirrors)
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
65MB/s 11% 48MB/s 14% 154MB/s 22% 892/sec
65MB/s 11% 48MB/s 13% 152MB/s 22% 786/sec
--------- ---- --------- ---- --------- ---- ---------
130MB/s 11% 97MB/s 13% 306MB/s 22% 839/sec
================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
67MB/s 7% 47MB/s 9% 157MB/s 14% 669/sec
67MB/s 7% 47MB/s 9% 155MB/s 14% 637/sec
--------- ---- --------- ---- --------- ---- ---------
135MB/s 7% 94MB/s 9% 313MB/s 14% 653/sec
================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
68MB/s 5% 31MB/s 3% 153MB/s 8% 338/sec
68MB/s 5% 31MB/s 3% 151MB/s 8% 342/sec
--------- ---- --------- ---- --------- ---- ---------
136MB/s 5% 62MB/s 3% 304MB/s 8% 340/sec
Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAIDZ1 (RAID5)
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
109MB/s 30% 54MB/s 22% 133MB/s 21% 813/sec
108MB/s 32% 54MB/s 22% 131MB/s 20% 708/sec
--------- ---- --------- ---- --------- ---- ---------
218MB/s 31% 108MB/s 22% 265MB/s 20% 761/sec
================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
114MB/s 25% 60MB/s 17% 131MB/s 14% 525/sec
118MB/s 24% 60MB/s 18% 133MB/s 14% 517/sec
--------- ---- --------- ---- --------- ---- ---------
232MB/s 24% 121MB/s 17% 265MB/s 14% 521/sec
================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
107MB/s 12% 60MB/s 8% 138MB/s 9% 163/sec
111MB/s 11% 60MB/s 8% 138MB/s 9% 172/sec
--------- ---- --------- ---- --------- ---- ---------
218MB/s 11% 121MB/s 8% 276MB/s 9% 167/sec
Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAIDZ1 (RAID5)
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
74MB/s 15% 40MB/s 12% 160MB/s 18% 715/sec
76MB/s 15% 41MB/s 13% 165MB/s 19% 651/sec
--------- ---- --------- ---- --------- ---- ---------
151MB/s 15% 82MB/s 12% 325MB/s 18% 683/sec
================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
75MB/s 9% 42MB/s 8% 167MB/s 21% 384/sec
73MB/s 8% 42MB/s 8% 166MB/s 20% 387/sec
--------- ---- --------- ---- --------- ---- ---------
149MB/s 8% 85MB/s 8% 333MB/s 20% 386/sec
================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
73MB/s 5% 41MB/s 4% 168MB/s 11% 182/sec
71MB/s 5% 40MB/s 4% 168MB/s 11% 183/sec
--------- ---- --------- ---- --------- ---- ---------
144MB/s 5% 82MB/s 4% 337MB/s 11% 182/sec
Hardware Variation 3 (2 vCPU/8GB RAM) / 4-disk RAIDZ1 (RAID5)
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
114MB/s 34% 58MB/s 22% 146MB/s 22% 872/sec
114MB/s 34% 59MB/s 23% 147MB/s 21% 693/sec
--------- ---- --------- ---- --------- ---- ---------
228MB/s 34% 118MB/s 22% 293MB/s 21% 783/sec
Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAIDZ2 (RAID6)
================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
71MB/s 20% 43MB/s 16% 111MB/s 20% 716/sec
71MB/s 20% 43MB/s 16% 110MB/s 20% 677/sec
--------- ---- --------- ---- --------- ---- ---------
143MB/s 20% 86MB/s 16% 221MB/s 20% 696/sec
================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
75MB/s 13% 42MB/s 10% 110MB/s 12% 540/sec
74MB/s 16% 42MB/s 11% 104MB/s 11% 491/sec
--------- ---- --------- ---- --------- ---- ---------
149MB/s 14% 84MB/s 10% 215MB/s 11% 515/sec
================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
70MB/s 7% 43MB/s 6% 109MB/s 8% 202/sec
70MB/s 7% 42MB/s 6% 109MB/s 8% 203/sec
--------- ---- --------- ---- --------- ---- ---------
140MB/s 7% 85MB/s 6% 218MB/s 8% 203/sec
Summary
I have to admit, I was incorrect in my prediction that the RAM size would correlate most directly with read performance…it actually seems that increasing the RAM leads to a slight decrease in sequential read performance while improving write performance. I am going to speculate this has to do with caching algorithms that are poor, or at least poor for this workload, as well as writes being staged in RAM before being flushed to disk. The larger RAM allotment increases the size of the ARC (ZFS’s read cache); this improves random seeks significantly but decreases maximum large-block read throughput, perhaps because the larger cache leads to inaccurate predictive reads (speculation).
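If you are curious how much RAM the ARC is actually consuming, OpenSolaris-based systems such as Nexenta expose the counter through kstat; this assumes shell access to the appliance rather than the management console:
- kstat -p zfs:0:arcstats:size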
Much like a NetApp storage system, ZFS always attempts to perform writes in large chunks…if you were to watch the iostat output for the physical devices, you would see very pronounced peaks and valleys in writes to the physical media, even though the incoming workload is steady-state. NetApp and ZFS both attempt to play a form of Tetris in order to make the most efficient write possible; the more RAM available, the better they can stage these writes to complete efficiently.
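To watch this behavior yourself, the Solaris-derived iostat on Nexenta can poll extended per-device statistics at a fixed interval; the 5-second interval here is an arbitrary but reasonable choice:
- iostat -xn 5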
One key measure is the actual throughput per disk, and RAID0 is a good way to determine it. Looking at the results, we have the following metrics; as expected, re-writes always suffer, as they do in any file system. We will focus on writes, reads, and random seeks, using the numbers from the lowest-memory RAID0 configuration:
- Writes: 227MB/s
- Reads: 434MB/s
- Random Seeks: 958/sec
Now we need to break this into per-disk statistics, which simply means dividing each value above by the number of physical disks (four):
- Writes: 56.75MB/sec/disk
- Reads: 108.5MB/sec/disk
- Random Seeks: 239.5 IOPS/disk
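If you want to repeat this math for your own disk count, a quick one-liner from any Unix-like shell does it (assuming bc is installed):
- echo "scale=2; 227/4; 434/4; 958/4" | bc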
Of course, one flaw in Bonnie++ is that we do not get latency statistics. We normally expect a 7200 RPM SATA disk to offer 40-60 IOPS with sub-20-millisecond response times; I have no measure of the response time experienced during this test, or of how many of the random seeks were served from cache. I selected the lowest-RAM (smallest cache) configuration to try to minimize that factor in our equation.
We can then use this as a baseline to measure the degradation of each protection scheme on a per-data-disk basis. In a RAID1+0 configuration we have two disks supporting writes and all four disks supporting reads, which explains the reasonable read performance. The reason my lab operates in a RAID1+0 configuration is that my environment is heavily read-oriented, and with the low number of physical disks I did not want the parity write-tax; in addition, with six 1TB SATA drives I am not capacity-constrained.
I almost went into a full interpretation of my results; however, I stumbled upon this site in my research: http://blogs.sun.com/roch/entry/when_to_and_not_to There you will find a detailed description of the performance expectations of each RAID configuration. The telling portion is this table:
Config    Blocks Available     Random FS Blocks / sec
RAID-Z    (N - 1) * dev        1 * dev
Mirror    (N / 2) * dev        N * dev
Stripe    N * dev              N * dev
The key item to take away is that with RAID-Z, random IOPS are limited to those of a single device. You will see in the referenced blog post that a configuration of multiple small RAID-Z groups performs better than one large RAID-Z group, as each group contributes one device’s worth of random IOPS to the workload. This may not correlate 100% with RAID5, or with whatever RAID scheme your storage platform uses, as they are not all created equal.
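To put rough numbers on the table above, here is a quick worked example using the ~240 random IOPS per disk derived from the RAID0 baseline (treat these as ballpark figures, since some of those seeks were undoubtedly served from cache):
- One 8-disk RAIDZ1 group: 1 * dev ≈ 240 random FS blocks/sec
- Two 4-disk RAIDZ1 groups (same 8 disks, one extra parity disk): 2 * dev ≈ 480 random FS blocks/sec
- 4-disk RAID1+0: N * dev = 4 * 240 ≈ 960 random FS blocks/sec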