Functional Home Gigabit with Century Link

TL;DR (skip to the part you care about, not my bored rambling)

I’ve been using Comcast (Xfinity) for my home Internet service since 2003; prior to that I lived in a house that had multiple T1s (back when megabits of home Internet was very rare). It is somewhat hard to imagine that in such a short period of time we went from hardwired home Internet being measured in kilobits to almost every mobile device we own being capable of sustaining tens of megabits while roaming about.

I had been holding onto my Comcast Teleworker discounted ‘business’ Internet after leaving VMware, waiting for Google Fiber to come to town, as Portland was supposed to be on the relatively near-future roadmap, and I was trying to avoid adding more unsightly aerial cabling to the exterior of my 110-year-old house. As neat as modern technology is, it doesn’t really go well with the architectural detail of an old Craftsman home. Since Google Fiber is now dead, I decided to proceed with the next best option, Century Link.

I never thought I’d suggest that Century Link (formerly Qwest, formerly US West, aka US Worst) was a “best” option for anything. I worked for large national ISPs early in my career, and US Worst was always one of the most problematic carriers to deal with. I still have flashbacks about the escalations and yelling customers, but the best was the time their tech and manager didn’t realize they were connected to voicemail while planning how they were going to lie to explain away their fault in a prolonged outage impacting several of our customers.

Fast forward to today: I ordered Century Link Gigabit to be delivered to my house. I had read many nightmare stories about this on Nextdoor, but figured I’d go the lower-risk route and order it online, where I would have a paper trail; I tend not to sign up for a contract sold by a solicitor who knocks on my door. The order went smoothly online, and amazingly they were able to install less than a week later. The tech arrived at the beginning of the install window and spent much of the day running the fiber around our house to the only possible entry point.

What didn’t go well is that Century Link forces you to either buy or lease a “modem”, which is their name for a really crappy router. The only special thing this “modem” does is support VLAN tagging on the WAN interface. This router offers WiFi, but it only supports 802.11n at the fastest…you are reading correctly, you are required to buy a router that has a max wireless rate of around 100 megabit in order to buy gigabit service.

I had found a few blog posts online hinting at how to bypass their router by putting it into “transparent bridge” mode, but I didn’t see any reason to even power this crappy device. The tech hadn’t even finished cleaning up outside before I had switched back to my Asus router; my 4-year-old Asus readily blows away this brand new required POS.

How did I do it? It’s not so bad, but you’d have to visit a few different blogs to collect all of the hints, and they all leave out how to get the whole thing working. I was able to get better service using my own router than using the one provided, especially when you include IPv6 in the comparison.

TL;DR start here

I’m not going to include screenshots of all of the steps, as I would like to believe that anyone tackling this can figure it out from the high-level steps (and I am too lazy to turn the CL router back on in order to document it). In my case the CenturyLink 2100T ZyXEL C1100Z was what was “sold” to me against my wishes.

I assume you know what cables to plug into where on your router and that you know you would need to move the WAN link that comes from the ONT from the Century Link router to your own, so I won’t include that detail here.  

I have Internet *only*; if you are also subscribing to Prism TV there may be additional settings required.

Collect PPPoE Details

  1. Log in to the web interface of your Century Link router
  2. Skip to the advanced configuration section
  3. Find the remote management portion, enable telnet (likely the only time you will ever hear/see me suggest using telnet), and set a password
  4. Telnet to your router IP (likely 192.168.0.1) and log in as admin with the password you set
  5. Type:
    sh
  6. Press enter; you are now in a BusyBox shell.
  7. Run the command:
    /usr/bin/pidstat -l -C pppd
  8. You will get an output string that includes the runtime values being used to configure PPPoE; the parts you care about will look something like this:
    pppd -u lastfirst@qwest.net -p TXlQYXNzd29yZAo= -f 0 -M 1492 -D 0 -n 1 -L 0 -e 1 -X 120
  9. You just need to capture the username and the encoded password: the username is the “lastnamefirstname@qwest.net” string and the password is the string after the -p, “TXlQYXNzd29yZAo=” in my example (be sure to include the entire string, including the equal sign)
  10. You can perform the next step natively on a Mac or on Linux; I use a Mac so it is easy. Open a terminal window (aka shell) and run the following command to decode the password:
    echo TXlQYXNzd29yZAo= | base64 --decode
  11. You should get a decoded password back, like this:
    ~# echo TXlQYXNzd29yZAo= | base64 --decode
    MyPassword
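If you would rather script steps 9 through 11, here is a minimal Python sketch that pulls the -u and -p values out of the pppd command line from step 8 and base64-decodes the password. It assumes the command line looks like the example above, and the function name is my own (hypothetical). Note that the example string actually decodes to the password plus a trailing newline, which the sketch strips.

```python
import base64
import shlex
from typing import Tuple


def extract_pppoe_credentials(pppd_cmdline: str) -> Tuple[str, str]:
    """Return (username, password) from a pppd command line.

    Assumes the layout seen in the pidstat output: the username follows
    -u and the base64-encoded password follows -p.
    """
    args = shlex.split(pppd_cmdline)
    username = args[args.index("-u") + 1]
    encoded = args[args.index("-p") + 1]
    # The encoded example includes a trailing newline, so strip it after decoding.
    password = base64.b64decode(encoded).decode("utf-8").rstrip("\n")
    return username, password


if __name__ == "__main__":
    line = ("pppd -u lastfirst@qwest.net -p TXlQYXNzd29yZAo= "
            "-f 0 -M 1492 -D 0 -n 1 -L 0 -e 1 -X 120")
    user, password = extract_pppoe_credentials(line)
    print(user, password)
```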

Congratulations, you now have the PPP info to configure your personal router. You can proceed to configuring PPPoE on your router’s WAN link; the only other thing you need to know is that you must tag the WAN with VLAN 201. On my router’s 3rd party firmware this is under the settings for IPTV.

Now you just need to configure your router; I will include screenshots to help you with this portion. Your settings may be called something different than what is shown, but there should be a functional equivalent. If you do not have the ability to configure VLANs on your router you have two options: install 3rd party firmware, or just accept using the Century Link router in “transparent bridge mode” (as set on the WAN configuration under protocol settings).

Configure Your Router

On my Asus this is what I configured (obviously without quotes):

  1. WAN Connection Type: “PPPoE”
  2. PPPoE & MAN access: “DHCP or Static”
  3. Get MAN IP Automatically: “Enabled”
  4. PPP VPN Client Settings (PPPoE settings):
    1. Username: “lastnamefirstname@qwest.net”
    2. Password:  “MyPassword”
    3. Authentication Algorithm: “Auto”
    4. MTU: “1492”
    5. MRU: “1492”
      (Screenshot: Asus PPPoE settings)
  5. Ports Isolation and VLAN Filtering:
    1. Choose IPTV STB Port: “No”
    2. VLAN Tagged Traffic Filter: “Enabled”
    3. VLAN CPU (Internet): VID “201”, PRIO “0”
    4. VLAN CPU (IPTV):  defaults
      (Screenshot: Asus VLAN settings)
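For anyone wondering where the 1492 comes from: PPPoE adds a 6-byte session header and a 2-byte PPP protocol field inside the standard 1500-byte Ethernet payload. The 802.1Q tag for VLAN 201 is added to the Ethernet frame header rather than taken out of the payload, so it does not shrink the MTU further. The Python sketch below just does that arithmetic, plus the resulting TCP MSS values, assuming a standard 1500-byte Ethernet MTU on the WAN side and option-free IP/TCP headers.

```python
ETHERNET_MTU = 1500      # standard Ethernet payload size
PPPOE_HEADER = 6         # PPPoE session header
PPP_PROTOCOL_FIELD = 2   # PPP protocol identifier
IPV4_HEADER = 20         # IPv4 header without options
IPV6_HEADER = 40         # fixed IPv6 header
TCP_HEADER = 20          # TCP header without options


def pppoe_mtu(ethernet_mtu: int = ETHERNET_MTU) -> int:
    """Largest IP packet that fits in one PPPoE frame.

    The 802.1Q VLAN tag lives in the Ethernet header, not the payload,
    so it does not change this number.
    """
    return ethernet_mtu - PPPOE_HEADER - PPP_PROTOCOL_FIELD


def tcp_mss(mtu: int, ip_header: int) -> int:
    """TCP maximum segment size for a given MTU and IP header size."""
    return mtu - ip_header - TCP_HEADER


if __name__ == "__main__":
    mtu = pppoe_mtu()
    print("MTU/MRU:", mtu)
    print("IPv4 MSS:", tcp_mss(mtu, IPV4_HEADER))
    print("IPv6 MSS:", tcp_mss(mtu, IPV6_HEADER))
```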

That should get you up and running on the Internet; however, I wanted IPv6 support as I use it for some work projects.

Configure IPv6

I tried to guess at this, but realized the best plan was to reconnect the Century Link router, go into the advanced settings, enable the IPv6 network features, and capture the details for re-use. I don’t know how generic these values are; some of them could be region specific, or they may use anycast addresses that allow them to be universal. Based on the Century Link support pages I assume these are universal.

(Screenshot: Asus IPv6 settings)
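If the IPv6 fields the Century Link router exposes turn out to be 6rd settings (an assumption; check what your router actually shows), the LAN prefix your router hands out is derived from your public IPv4 address as described in RFC 5969: the 6rd prefix followed by the IPv4 address bits left after dropping the first "IPv4 mask length" bits. The Python sketch below illustrates that derivation with documentation addresses (2001:db8::/32 and 192.0.2.1), not real Century Link values; the function name is hypothetical, so substitute the values you captured.

```python
import ipaddress


def sixrd_delegated_prefix(sixrd_prefix: str, ipv4_address: str,
                           ipv4_mask_len: int = 0) -> ipaddress.IPv6Network:
    """Derive a 6rd delegated prefix (RFC 5969).

    The result is the 6rd prefix followed by the low-order
    (32 - ipv4_mask_len) bits of the IPv4 address.
    """
    prefix = ipaddress.IPv6Network(sixrd_prefix)
    suffix_bits = 32 - ipv4_mask_len
    v4_bits = int(ipaddress.IPv4Address(ipv4_address)) & ((1 << suffix_bits) - 1)
    delegated_len = prefix.prefixlen + suffix_bits
    if delegated_len > 64:
        raise ValueError("delegated prefix would be longer than /64")
    # Place the IPv4 bits immediately after the 6rd prefix.
    value = int(prefix.network_address) | (v4_bits << (128 - delegated_len))
    return ipaddress.IPv6Network((value, delegated_len))


if __name__ == "__main__":
    # Documentation values only; substitute the ones captured from your router.
    print(sixrd_delegated_prefix("2001:db8::/32", "192.0.2.1"))
```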

You may need to reconnect your clients so that they get new DHCP info after making these changes. If you use static IPs on your workstations, you will need to do your own magic to get them to also work with IPv6. I use static IPv4 addresses on some devices, but just leave IPv6 configured for DHCP.

After making these changes I am able to score 19/20 on the IPv6 test, only lacking reverse DNS, which I can’t do much about. I also had to enable “Respond Ping Request from WAN” on the firewall pages, as IPv6 relies on ICMP control messages far more than IPv4 does.

(Screenshot: IPv6 test results)

If you hit a wall you can drop a comment and I’ll try to fill in any details I missed. If I end up swapping to a different router (e.g. something running pfSense) I will post an update, but the settings should be the same regardless; it is just a matter of translating them to a specific configuration nomenclature.


VMware VCNS/NSX Edge Gateway SSLVPN and El Capitan 10.11.1

It is sad that most of my blog posts are related to an SSLVPN client and not my area of expertise, storage. The previous SSLVPN client post is by far the most popular post on my blog, so here’s to another post that will hopefully help someone. My previous post on the SSLVPN client can be found here.

I had previously been running the NAclient from Edge version 6.1.3 on OS X 10.11 without issue; however, the installation will no longer work due to security controls put into place by Apple. There is a trend here: the previous problem with getting the NAclient installed onto OS X 10.10 (Yosemite) was also due to failure to meet Apple security guidelines, as they didn’t implement code signing (and also used deprecated methods of configuring/starting services).

If you attempt to install the NAclient on the current version of OS X (10.11.1) you will see a lovely error like this one: “This package is incompatible with this version of OS X and may fail to install.”
(Screenshot: VMware SSL VPN-Plus Client installer warning)

And if you choose Install Anyway, it will just fail with the error: “The installation failed. The Installer encountered an error that caused the installation to fail. Contact the software manufacturer for assistance.”
(Screenshot: VMware SSL VPN-Plus Client installer failure)

It took some digging to figure out what is going on, and I used one of my favorite tools (Pacifist) to open the actual installer package to see where the installation files are written.

(Screenshot: the installer's Archive.bom contents in Pacifist)

I also remember reading about Apple’s System Integrity Protection locking down access to secure system file locations, including /usr, /System, and others.

In researching this I found Apple’s developer guidelines for System Integrity Protection, and they clearly state that /usr is off limits.

(Screenshot: excerpt from Apple’s System Integrity Protection Guide)

Source: Apple System Integrity Protection Guide
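To spot this ahead of time, you can list the paths an installer writes (from Pacifist, or with lsbom against the package's Bom file) and flag anything under a SIP-protected location. The Python sketch below assumes the protected roots Apple documents for OS X 10.11 (/System, /usr, /bin, /sbin) with /usr/local as the writable exception; confirm that list against Apple's guide for your release. Apart from the naclient.conf path, the sample paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: SIP-protected locations on OS X 10.11.
PROTECTED_ROOTS = ("/System", "/usr", "/bin", "/sbin")
# Assumption: writable exception inside a protected location.
EXEMPT_ROOTS = ("/usr/local",)


def _normalize(raw: str) -> PurePosixPath:
    """Turn BOM-style paths such as './usr/bin/tool' into absolute paths."""
    if raw.startswith("./"):
        raw = raw[1:]
    return PurePosixPath("/" + raw.lstrip("/"))


def _is_under(path: PurePosixPath, root: str) -> bool:
    root_path = PurePosixPath(root)
    return path == root_path or root_path in path.parents


def sip_blocked_paths(paths: Iterable[str]) -> List[str]:
    """Return the install paths SIP would stop an installer from writing."""
    blocked = []
    for raw in paths:
        path = _normalize(raw)
        protected = any(_is_under(path, r) for r in PROTECTED_ROOTS)
        exempt = any(_is_under(path, r) for r in EXEMPT_ROOTS)
        if protected and not exempt:
            blocked.append(str(path))
    return blocked


if __name__ == "__main__":
    payload = [
        "./opt/sslvpn-plus/naclient/naclient.conf",
        "./usr/bin/example-tool",
        "./usr/local/bin/example-tool",
    ]
    print(sip_blocked_paths(payload))
```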

Now that the root problem is found, I sought a workaround. Previously we had to disable the check for kext signing, though that didn’t permanently solve the issue because the dependent services wouldn’t work after reboot due to the use of deprecated methods. This time I found that you can manage System Integrity Protection using the csrutil command:

(Screenshot: csrutil usage)

In order to manage System Integrity Protection you must reboot your system into Recovery Mode, which you can do by holding down Command+R at startup. You should boot to an OS X Utilities menu. Click Utilities and select Terminal.
(Screenshot: OS X Utilities menu with Terminal)

Within Terminal you simply need to run ‘csrutil disable’ and then reboot the system; the easiest way to do this is with the single command line ‘csrutil disable; reboot’.
(Screenshot: csrutil disable; reboot in Terminal)

Once the system has booted up you can successfully install the NAclient; then test that it works. At this point you can repeat the process and run ‘csrutil enable; reboot’ to turn System Integrity Protection back on…which I strongly encourage. It is disappointing that we have to disable security controls to install a security tool, but at least it only has to stay disabled during installation. Any actual package upgrades would require you to repeat this process, but let’s face it…VMware hasn’t released an update to this client since the last time I reported a bug in it 😉
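Because it is easy to forget that second reboot, a quick check that SIP is back on is worthwhile. On 10.11, running csrutil status from a normal (non-Recovery) Terminal prints a line such as "System Integrity Protection status: enabled."; the Python sketch below parses that line. It assumes that exact wording, and the function names are hypothetical.

```python
import subprocess
from typing import Optional

STATUS_PREFIX = "System Integrity Protection status:"


def parse_sip_status(output: str) -> Optional[bool]:
    """True if enabled, False if disabled, None if no status line is found."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(STATUS_PREFIX):
            state = line[len(STATUS_PREFIX):].strip().lower()
            if state.startswith("enabled"):
                return True
            if state.startswith("disabled"):
                return False
    return None


def sip_enabled() -> Optional[bool]:
    """Run `csrutil status` (macOS only) and parse the result."""
    result = subprocess.run(["csrutil", "status"], capture_output=True,
                            text=True, check=False)
    return parse_sip_status(result.stdout)


if __name__ == "__main__":
    print(parse_sip_status("System Integrity Protection status: enabled."))
```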

Happy VPN’ing.

----- Follow up 11/24/2015 ----

It is becoming obvious that VMware doesn’t take client support for SSLVPN seriously; I will be investigating options to migrate our environments off of SSLVPN within the Edge Gateway. It is entirely idiotic that you must deploy an entire new version (that isn’t even released yet) in order to fix a simple client installer bug. That means you have to wait for the new version of the NSX stack to be qualified and supported by GSS with all of your other software components (e.g. vCloud Director), which isn’t an immediate process. This leaves customers exposed for a long period of time, increases the operational burden for support teams, and in general is a horrible way of delivering client software.

Client software should be released and maintained independently of the entire NSX Manager/Edge Gateway bundle.  Requiring every customer to do a massive upgrade of their entire environment just to fix a bug (that they should have known about months ago and should have been included in a 6.1.x release!) in client software is not acceptable.

--- Edit 11/30/2015 ---

An additional challenge here is that if you undo the workaround (re-enable System Integrity Protection), as you should, the client will break if you run an installer again (say, to add another VPN profile). If you need to add another VPN destination, you really should just edit the config file /opt/sslvpn-plus/naclient/naclient.conf directly with the editor of your choice.

--- Update 12/16/2015 ---

The workaround allows the naclient to function on OS X 10.11.2 as well. If you have the client installed before applying the 10.11.2 update it will keep working, and the workaround also still lets you install the client after upgrading to 10.11.2.

Shifting Server Selection Criteria

Note:  I had written this post 2 years ago but somehow never noticed it in my drafts folder…

What was old is new. Long ago we used internal RAID on servers for most applications; in some cases we would go as far as using internal HBAs with external JBODs to allow 2 physical servers to share some logical volumes, or to get the most out of “high capacity” RAID enclosures (at the time they seemed high, but by today’s standards many phones offer more addressable capacity). Over time we moved all of this critical data to a shared storage system, perhaps a SAN (storage area network). The SAN vendors have continued to charge high prices for decreasing value, which has left the storage market ripe for disruption by distributed storage that leverages commodity hardware, delivered as software. No longer will we find it acceptable to pay $2500 for a disk drive in a SAN that we can buy on the street for $250.

This leads me to repeating the past: I find myself in desperate need of brushing up on managing the RAID controllers that are in my hosts. Perhaps this is for VSAN, or ScaleIO, or some other converged storage offering that can leverage my existing compute nodes and all of their formerly idle storage potential. As we make this transition we find that the selection criteria we had for our compute hosts are no longer valid, or at least not ideal for this converged deployment. Up until now the focus has been on compute density, either CPU cores per rack unit or physical RAM per rack unit…in fact many blade vendors found a nice market by focusing on maximizing just that.

What these siloed compute servers all had in common was minimal internal storage; we didn’t need it. We needed massive-density compute to make room for our really expensive SAN with all of its pretty lights. As we move down this path of converged compute and storage, we need to dig out some of our selection criteria from a decade ago. We now need to factor disk slots per rack unit into our figures. It turns out we can decrease our CPU+RAM density considerably and, by implementing converged storage offerings, still drastically reduce our cost to provide the entire package of compute and storage. We must look at the balance of compute to storage more closely as these resources become tightly coupled; there are new considerations we are not accustomed to that, if not accounted for, can lead to project failure.

When the hypervisor first started gaining ground there was a lot of debate over what consolidation ratio made sense. Some vendors/integrators argued that Big Iron made the most sense: a server with massive CPU and RAM density that allowed for ridiculous VM:host ratios. What we found is that this becomes a pretty massive failure domain, and the larger the failure domain, the more capacity we have to reserve. The cost of our HA (high availability) insurance is directly proportional to our host density. Likewise, the time to enter maintenance mode for each host correlates directly with the utilized RAM on that host. The more RAM that is used on a host, the longer every maintenance cycle for that host will take.
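To put rough numbers on this: with N+1 admission control, one host's worth of capacity is held in reserve, so the reserved share of the cluster is 1/N and the absolute amount reserved grows with each host's size. Evacuation time for maintenance mode scales roughly with the RAM in use divided by the migration throughput. The Python sketch below is a back-of-the-envelope model under those assumptions; the cluster sizes, 80% utilization and 10 Gbit/s effective migration rate are hypothetical, and real migrations also depend on page dirtying rates and concurrency limits.

```python
from typing import Tuple


def ha_reserve(hosts: int, ram_per_host_gb: float,
               failures_tolerated: int = 1) -> Tuple[float, float]:
    """Return (reserved_gb, reserved_fraction) for N+failures_tolerated."""
    if failures_tolerated >= hosts:
        raise ValueError("cannot tolerate losing every host")
    reserved_gb = failures_tolerated * ram_per_host_gb
    return reserved_gb, failures_tolerated / hosts


def evacuation_minutes(used_ram_gb: float, throughput_gbit_s: float) -> float:
    """Lower bound on the time to move used_ram_gb off one host."""
    seconds = used_ram_gb * 8 / throughput_gbit_s  # GB -> Gbit
    return seconds / 60


if __name__ == "__main__":
    # Two hypothetical clusters with the same 2 TB of total RAM.
    for hosts, ram_gb in ((4, 512), (16, 128)):
        reserved, fraction = ha_reserve(hosts, ram_gb)
        minutes = evacuation_minutes(ram_gb * 0.8, 10)
        print(f"{hosts} x {ram_gb} GB: reserve {reserved} GB ({fraction:.0%}), "
              f"about {minutes:.1f} min to evacuate one host")
```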

This is relevant because when we look at converged storage (or hyper converged, as some may refer to it) we have to consider the exact same thing. We still have the traditional compute items to account for, but we also need to factor in storage. Our host is now a failure domain for storage, so we must reserve 1 host (or more) of capacity…this also means that when hosts go into maintenance mode, in the worst case we have to move an entire host’s worth of stored data to ensure accessibility.
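The same arithmetic applies to storage. Assuming a simple mirrored layout where every object is kept on a fixed number of distinct hosts (two copies to survive one host failure) and one host of raw capacity is kept free so a failed host's data can be re-protected, usable capacity and re-protection time can be sketched as below. The layout, capacities and rebuild throughput are hypothetical; each product (VSAN, ScaleIO, etc.) has its own placement rules and overheads.

```python
def usable_capacity_tb(hosts: int, raw_tb_per_host: float,
                       copies: int = 2, reserved_hosts: int = 1) -> float:
    """Usable capacity with mirrored data and some hosts' raw space held free."""
    if hosts - reserved_hosts < copies:
        raise ValueError("not enough hosts to keep every copy on a distinct host")
    return (hosts - reserved_hosts) * raw_tb_per_host / copies


def reprotect_hours(data_tb_on_host: float, rebuild_gbit_s: float) -> float:
    """Lower bound on the time to re-create the copies that lived on one host."""
    gigabits = data_tb_on_host * 8000  # decimal TB -> Gbit
    return gigabits / rebuild_gbit_s / 3600


if __name__ == "__main__":
    # Hypothetical: 8 hosts with 12 TB raw each, two copies, one host kept free,
    # and 10 Gbit/s of aggregate rebuild throughput.
    usable = usable_capacity_tb(8, 12)
    per_host_data = usable * 2 / 8  # all copies spread evenly when full
    print(f"usable: {usable:.1f} TB")
    print(f"re-protecting one host: ~{reprotect_hours(per_host_data, 10):.1f} h")
```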

Storage Benchmarking (part 1 of…)

This will be a series of blog posts that will try to help you establish a consistent storage benchmarking methodology.

Storage is an area that I have focused on for much of my career, and I’ve been fortunate to be involved in a lot of very challenging and fun storage projects over the years. I spent a good period of time in the trenches, working for storage manufacturers or a reseller, performing professional services for custom storage implementations. Apprehension about any change to a storage system is commonplace, whether it is the vendor, disk types, RAID type and grouping of disks, or more. Storage is a complicated (and expensive) beast; it is still one of the most expensive (and profitable) components within the entire datacenter.

Virtualization changed the world for IT; it is one of the most disruptive single concepts, and it truly changed how IT does business. I know I don’t have to give history lessons on the impact this has had on the hardware vendors, but it is apparent that it was mostly negative for server and network vendors and extremely positive for storage vendors. Before virtualization became commonplace, storage area networks (SANs) and other shared storage were not overly common…when they did exist, it was for niche applications that represented a very small part of the services an IT department managed. With virtualization, and really all of the great things that came with VMware’s virtualization (e.g. high availability, vMotion, etc.), shared storage became a fundamental component in all datacenters. Shared storage, whether Fibre Channel, iSCSI, or NFS, went from representing a fraction of the data storage in a company to addressing essentially all data stored.

Storage has become more critical today than it ever was. We have optimized and streamlined all other aspects of the IT platform…however, most storage vendors are still building products based on ancient principles, and changes are complex and high risk. It is critical that you have a methodology to actually assess any change to your storage environment to determine whether it is a net gain or loss for your business, since storage performance directly impacts the success of the business. The challenge is that storage benchmarking is really hard; it is a monumental task to take on and actually do well. Various tools have been used to try to do point performance tests, but they are not adequate for assessing the replacement of an entire storage platform or a change to the configuration of an existing one.

Now that I have a long-winded introduction out of the way, let’s start looking at what it takes to be successful in storage benchmarking. Like any initiative, it needs to be an actual project with a process. While you can download and run Iometer and have a test running within minutes…does that test actually mean anything? Does it correlate to anything other than another similar run of Iometer? Likely not.

I happen to focus on storage for a growing public cloud provider, and I have spent a significant portion of time over the past 3+ years benchmarking storage platforms. I have tried various tools to assign a rating to storage systems under evaluation, and what is important to me may not be important to you. You need to determine the critical aspects for your business; here are the primary areas that I score systems on, in no particular order:

  • Reliability
  • Availability
  • Durability & Security
  • Sustainability
  • Scalability
  • Performance

That is a lot of abilities, so I will break down what I mean by each one. You will need to determine the specific requirements within each of the scoring areas for your use case, as nothing is valid outside the context of your use case.
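One way to keep these scores comparable across candidates is a weighted scoring matrix. The Python sketch below assumes each area is scored on a 1 to 5 scale and that you assign your own weights to reflect your use case; the weights, system names and scores shown are hypothetical placeholders, not results from any evaluation.

```python
from typing import Dict

AREAS = ("reliability", "availability", "durability_security",
         "sustainability", "scalability", "performance")


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-area scores using your own weights."""
    missing = [a for a in AREAS if a not in scores or a not in weights]
    if missing:
        raise ValueError(f"unscored or unweighted areas: {missing}")
    total_weight = sum(weights[a] for a in AREAS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[a] * weights[a] for a in AREAS) / total_weight


if __name__ == "__main__":
    # Hypothetical weights and scores for illustration only.
    weights = {"reliability": 3, "availability": 3, "durability_security": 3,
               "sustainability": 2, "scalability": 2, "performance": 1}
    candidates = {
        "system_a": {"reliability": 4, "availability": 5, "durability_security": 3,
                     "sustainability": 4, "scalability": 2, "performance": 5},
        "system_b": {"reliability": 5, "availability": 4, "durability_security": 4,
                     "sustainability": 3, "scalability": 4, "performance": 3},
    }
    ranked = sorted(candidates, key=lambda n: weighted_score(candidates[n], weights),
                    reverse=True)
    for name in ranked:
        print(f"{name}: {weighted_score(candidates[name], weights):.2f}")
```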

Reliability

Reliability is pretty critical for my use case, and this extends to how predictable all of the other areas that we are evaluating are. When a component fails, do you have a consistent outcome? Does performance become unpredictable? Does the “usable” capacity that is reported change after a failure? There are a lot of gotchas, and the key is to know and be able to predict them, as you must factor them into your implementation plan for the solution. This area isn’t entirely objective, as you just can’t test all conditions and context is everything.

Availability

This is a measure of resiliency, fault tolerance, and survivability, or in other words how consistently you can access your data…even during failure conditions. This is where storage vendors fit in their dog-and-pony show in the datacenter: they pull this disk and that cable to “prove” that the data is available even after these failures. I won’t go into the specific tests that I use for this; you may trust your vendor or you may not…how this is assessed must be within the context of each specific storage system. How you test/prove availability for legacy, scale-out, or hyper converged storage architectures can vary greatly.

Durability & Security

How confident are you that data, once written, is returned and, perhaps even more critical, that the data returned is the actual data that was written? Security is grouped with durability as it covers two related aspects: confidence that an unauthorized entity cannot modify your data, and that it cannot otherwise access it. Durability primarily relates to checksums on written data (to set a confident point of reference), scrubbing of stored data (to compare against that point of reference), and repair of data (creating a new copy from parity or mirroring). This is nearly impossible to actually test for; it is something you must query your vendor about. There are many ways faults can be introduced that cause loss of data integrity: bit rot, medium read errors, controller/cable failures, and more. Most vendors address this through checksums; in fact, it wasn’t long ago that many vendors used this as the primary differentiator between “enterprise” storage systems and the rest…and while it is often assumed to be present in modern enterprise storage systems, you may be surprised to find that it doesn’t exist in many popular solutions.

Security is often addressed through encryption. Some vendors may claim that their self-encrypting disks (SEDs) are all that is necessary; however, if the keys are stored on the disk, then the encryption does nothing more than provide rapid erase (if the disk is still operational), such as before you send it for replacement or otherwise decommission it.

In proprietary systems you really have to trust your vendor, and ideally the vendor has 3rd party validation of their offering, so that you aren’t just taking the word of an individual who may be more interested in closing the deal than in you keeping your job or your company surviving the “what if” when it does happen.

Sustainability

Another area that may be more subjective than not, as it is difficult to assess without a great amount of actual experience using the particular system…so if you are looking at a new product you have to be subjective, based on your exposure while evaluating, or try to find IDC (or other analyst) rankings comparing the offerings.

Ultimately, can it be operated within your environment’s boundaries, both physical and staffing (expertise)? Does it integrate seamlessly with any existing processes or tools that you depend on (e.g. monitoring and alerting systems)? Can you or your staff adequately manage the system during a time of crisis (you know, 6am on a holiday when something horrific happens)? Logical and intuitive interfaces are a big differentiator here: can you manage the system intuitively, or do you have to refer to the documentation any time you try to manage it?

Scalability

Scalability is absolutely critical for my use case, and it actually is a comprehensive topic that addresses all of the other abilities and performance.  Does reliability of the system decrease, maintain or increase with scale?  Does the risk for data loss increase or decrease with scale?  Can you sustainably operate 100s or 1000s of these systems?  Do you need 100 operators to manage 1000 systems?

Performance

This is the area that I will devote other posts to, as this is where things get more complex. Vendors typically have comparisons between their solution and others that cover the other topics; however, reference performance benchmarks are just that, a reference. Any benchmark is only valid as a comparison to another benchmark executed the same way using the same workload, whether or not it is a synthetic benchmark…and only if all instances being compared were set up consistently according to the respective vendors’ “best” practices.
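A simple guard against comparing apples to oranges is to record the workload parameters with every result and refuse to compare runs whose parameters differ. The Python sketch below is hypothetical: the fields shown (block size, read percentage, random percentage, queue depth, working set, duration) are common synthetic-workload knobs, and your methodology will likely need more of them.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description; extend with whatever you control."""
    block_size_kb: int
    read_pct: int
    random_pct: int
    queue_depth: int
    working_set_gb: int
    duration_s: int


@dataclass(frozen=True)
class Result:
    system: str
    workload: Workload
    iops: float
    p99_latency_ms: float


def compare(baseline: Result, candidate: Result) -> Dict[str, float]:
    """Ratios of candidate to baseline; refuses to compare different workloads."""
    if baseline.workload != candidate.workload:
        a, b = asdict(baseline.workload), asdict(candidate.workload)
        diffs = {k: (a[k], b[k]) for k in a if a[k] != b[k]}
        raise ValueError(f"workloads differ, results are not comparable: {diffs}")
    return {
        "iops_ratio": candidate.iops / baseline.iops,
        "p99_latency_ratio": candidate.p99_latency_ms / baseline.p99_latency_ms,
    }


if __name__ == "__main__":
    w = Workload(block_size_kb=8, read_pct=70, random_pct=100,
                 queue_depth=32, working_set_gb=100, duration_s=600)
    print(compare(Result("system_a", w, 1000.0, 2.0),
                  Result("system_b", w, 1500.0, 1.0)))
```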

To be continued…

Switching LSI SAS 2208 and similar chipsets to JBOD mode

I have been testing various hyper converged storage platforms that can coexist with ESX, along with some bare metal software storage platforms. In all cases I am using embedded RAID controllers in the servers, and in some cases I am using add-on cards. I have two cards in use currently: one is an Intel-flashed LSI card and the other is the SuperMicro LSI 2208 that is embedded in the FatTwin. While in all of these cases you can use single-disk RAID0 logical volumes, doing so adds a lot of extra steps, and in many of my systems it offers no gain.

WARNING:  Proceed at your own risk, I recommend verifying that no data will be impacted by this task.  I also encourage you to confirm that the JBOD (aka pass-through mode) configuration is supported with your hardware and your storage platform.

It is possible that you can do some of these steps by getting into the boot BIOS; however, in the case of the Intel-flashed LSI cards the boot BIOS is really horrible. I spent an hour trying to navigate the BIOS over remote console via the Intel Remote Management Module…but it was absolutely painful, and the only thing that worked was the wizard, which created undesirable configurations. I ended up working around this with the following steps:

  1. Download a live boot CD Linux image
  2. Connect ISO to server through virtual media insertion of remote console
  3. Boot Linux image
  4. Configure networking on Linux
  5. Download MegaCLI to local workstation, then SCP it to the Linux machine
  6. Install MegaCLI
  7. Run MegaCLI commands

In more detail:

I downloaded MegaCLI and placed it in my Dropbox folder, which made it easy to just use wget on the Linux server after it booted. Once Linux was booted I configured an IP address on the appropriate network interface using ifconfig, added DNS to resolv.conf, and added a default gateway. I could then SSH in, where I had copy and paste, to run the same commands quickly across my dozen hosts. In my case I selected the CentOS 6.5 LiveCD from a nearby mirror, but you should be able to use any bootable Linux CD of a reasonably recent build.

I will warn that doing these steps with any data in place will absolutely lead to data destruction.  I am not liable for how quickly the -CfgLdDel command obliterates any existing logical volume configuration, proceed at your own risk.
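Since the point of SSH was to run the same commands across a dozen hosts, here is a hedged Python sketch that builds the per-drive JBOD commands from MegaCLI's physical drive listing. It assumes the MegaCli64 binary name, that -PDList output contains "Enclosure Device ID:" and "Slot Number:" lines, and the -AdpSetProp -EnableJBOD and -PDMakeJBOD syntax; verify each of those against the MegaCLI documentation for your controller. It does not cover deleting existing logical drives (the -CfgLdDel step warned about above), and it only prints commands, it does not run them.

```python
import re
from typing import List, Optional, Tuple

MEGACLI = "MegaCli64"  # assumption: adjust to the binary name/path on your live CD


def parse_pdlist(output: str) -> List[Tuple[str, str]]:
    """Extract (enclosure, slot) pairs from 'MegaCli64 -PDList -aALL' output."""
    drives: List[Tuple[str, str]] = []
    enclosure: Optional[str] = None
    for line in output.splitlines():
        enc = re.match(r"\s*Enclosure Device ID:\s*(\S+)", line)
        if enc:
            enclosure = enc.group(1)
            continue
        slot = re.match(r"\s*Slot Number:\s*(\d+)", line)
        if slot and enclosure is not None:
            drives.append((enclosure, slot.group(1)))
            enclosure = None
    return drives


def jbod_commands(drives: List[Tuple[str, str]], adapter: int = 0) -> List[str]:
    """Build (but do not run) the commands to enable JBOD and expose each drive."""
    commands = [f"{MEGACLI} -AdpSetProp -EnableJBOD -1 -a{adapter}"]
    commands += [f"{MEGACLI} -PDMakeJBOD -PhysDrv[{enc}:{slot}] -a{adapter}"
                 for enc, slot in drives]
    return commands


if __name__ == "__main__":
    # Hypothetical excerpt of -PDList output for two drives.
    sample = """Enclosure Device ID: 252
Slot Number: 0
Enclosure Device ID: 252
Slot Number: 1
"""
    for cmd in jbod_commands(parse_pdlist(sample)):
        print(cmd)
```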


VMware vCloud Network & Security Edge – SSLVPN and Mountain Lion Troubles


October 12 2016 Update – Yosemite & El Capitan:

Wow, it’s been 3 years since posting this thing and it still gets quite a few hits. The problem did get worse with Yosemite due to required code signing; however, VMware corrected the problem with the naclient that was bundled in NSX 6.1.3. If you have the naclient installed before upgrading to El Capitan it also works, in my limited testing. I have heard that trying to install it on El Capitan may encounter issues due to a version table similar to the one noted below; I have not had a chance to test it on a clean install and have only tested Yosemite to El Capitan upgrades.
The addition of client-oriented VPN to the vCNS “Edge” (formerly vShield Edge) is a big win; however, anyone who attempts to use the product on the current shipping version of Mac OS X will find that it fails to install. We are using the SSLVPN heavily for a project and encountered this, so I decided to dig into the details.

Within the OSX system logs you will find lots of useless errors, ultimately you want to get to the installer errors themselves.  If you open Console.app and look at the /var/log/install.log (or do so from CLI) you will see this error:

installd[4110]: PackageKit: ----- Begin install -----
installd[4110]: PackageKit: request=PKInstallRequest <1 packages, destination=/>
installd[4110]: PackageKit: packages=(
"PKJaguarPackage <file://localhost/Volumes/BigFast/Downloads/naclient.pkg>"
)
installd[4110]: PackageKit: Extracting file://localhost/Volumes/BigFast/Downloads/naclient.pkg (destination=/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/Cleanup At Startup/PKInstallSandboxManager/1.sandbox/Root, uid=0)
installd[4110]: PackageKit: prevent user idle system sleep
installd[4110]: PackageKit: suspending backupd
installd[4110]: PackageKit: opt/sslvpn-plus/naclient/naclient.app relocated to Applications/naclient.app
installd[4110]: PackageKit: Executing script "./preinstall" in /Volumes/BigFast/Downloads/naclient.pkg/Contents/Resources
install_monitor[4115]: Temporarily excluding: /Applications, /Library, /System, /bin, /private, /sbin, /usr
install_monitor[4115]: Re-included: /Applications, /Library, /System, /bin, /private, /sbin, /usr
installd[4110]: PackageKit: releasing backupd
installd[4110]: PackageKit: allow user idle system sleep
installd[4110]: PackageKit: Install Failed: Error Domain=PKInstallErrorDomain Code=112 "An error occurred while running scripts from the package “naclient.pkg”." UserInfo=0x7fc30b425a80 {NSFilePath=./preinstall, NSURL=file://localhost/Volumes/BigFast/Downloads/naclient.pkg, PKInstallPackageIdentifier=com.vmware.sslvpn, NSLocalizedDescription=An error occurred while running scripts from the package “naclient.pkg”.} {
NSFilePath = "./preinstall";
NSLocalizedDescription = "An error occurred while running scripts from the package \U201cnaclient.pkg\U201d.";
NSURL = "file://localhost/Volumes/BigFast/Downloads/naclient.pkg";
PKInstallPackageIdentifier = "com.vmware.sslvpn";
}
Installer[4097]: install:didFailWithError:Error Domain=PKInstallErrorDomain Code=112 "An error occurred while running scripts from the package “naclient.pkg”." UserInfo=0x7f9c8536ce10 {NSFilePath=./preinstall, NSURL=file://localhost/Volumes/BigFast/Downloads/naclient.pkg, PKInstallPackageIdentifier=com.vmware.sslvpn, NSLocalizedDescription=An error occurred while running scripts from the package “naclient.pkg”.}
Installer[4097]: Install failed: The Installer encountered an error that caused the installation to fail. Contact the software manufacturer for assistance.
Installer[4097]: IFDInstallController 83028370 state = 7
Installer[4097]: Displaying ‘Install Failed’ UI.
Installer[4097]: ‘Install Failed’ UI displayed message:’The Installer encountered an error that caused the installation to fail. Contact the software manufacturer for assistance.’.
installd[4110]: installd: Exiting.

This error is not very useful. By looking inside the installer package, I could see that the install scripts write their own log to /tmp/naclient_install.log, and that log gives a better clue as to why the install failed:

/tmp/naclient.pkg/Contents/Resources/preinstall: kernel version mismatch
/Volumes/BigFast/Downloads/naclient.pkg/Contents/Resources/preinstall: kernel version mismatch
/Volumes/BigFast/Downloads/naclient.pkg/Contents/Resources/preinstall: kernel version mismatch

To fix this, you need to define the Mountain Lion kernel version as valid. Instead of installing the SSL VPN client from the web interface, choose to download the zipped file.

Extract the contents of the file and you will have a “naclient.pkg” file. Like many “files” on OS X, this is actually a special directory…you can access its contents via the CLI, or right-click (or Ctrl-click) it and select “Show Package Contents”.

If we look at the installation scripts themselves, we find that they run a uname command to determine the OS version:  uname -r | cut -d. -f1

We can also see that they were nice enough to support all the way back to Panther (released in 2003) but that there is no definition for Mountain Lion.

On Mountain Lion this command returns “12”, but “12” is not defined as a valid kernel version. In practice Mountain Lion is close enough to Lion for most apps, so we will add a definition for it identical to the one for Lion.
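The preinstall scripts are not reproduced here, so the following Python sketch only mirrors the logic described above. It assumes the scripts accept a fixed list of Darwin major versions from Panther (7) through Lion (11). `kernel_major` does what `uname -r | cut -d. -f1` does. `SUPPORTED_MAJORS` and `preinstall_check` are hypothetical names, not the script's own.

```python
# Sketch of the kernel check described above. The real naclient scripts are
# shell scripts; SUPPORTED_MAJORS and preinstall_check are hypothetical names.

# Darwin kernel major version -> OS X release name.
DARWIN_RELEASES = {
    7: "Panther",
    8: "Tiger",
    9: "Leopard",
    10: "Snow Leopard",
    11: "Lion",
    12: "Mountain Lion",
}

# Assumption: the unpatched scripts define Panther (7) through Lion (11) only.
SUPPORTED_MAJORS = {7, 8, 9, 10, 11}


def kernel_major(release: str) -> int:
    """Equivalent of `uname -r | cut -d. -f1`: keep the text before the first dot."""
    return int(release.split(".", 1)[0])


def preinstall_check(release: str, supported=SUPPORTED_MAJORS) -> str:
    """Return the script's verdict for a given `uname -r` string."""
    major = kernel_major(release)
    if major not in supported:
        return "kernel version mismatch"
    return f"ok ({DARWIN_RELEASES.get(major, 'unknown')})"


if __name__ == "__main__":
    print(preinstall_check("12.2.0"))  # kernel version mismatch
    # The fix: treat 12 (Mountain Lion) exactly like 11 (Lion).
    print(preinstall_check("12.2.0", SUPPORTED_MAJORS | {12}))  # ok (Mountain Lion)
```

The edit to the four scripts amounts to the `SUPPORTED_MAJORS | {12}` step: one more accepted value, handled the same way as Lion.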

We will edit four files. They are shell scripts, so you can edit them with your text editor of choice, and all four need exactly the same change: adding a definition for Mountain Lion.

Save your changes to all four files: “postinstall”, “postupgrade”, “preinstall”, and “preupgrade”.

Browse up the directory structure until you see the naclient.pkg and run the installer again.

***** Yosemite Update ******

Those of you who have upgraded to Yosemite may find that you cannot connect to the VPN afterward. It fails to establish a connection with an error somewhat like this:

SSL_VPN-Plus_Client_-_Login

In order to fix this here are the steps I took (PROCEED AT YOUR OWN RISK):

  1. Uninstalled naclient:  sudo /opt/sslvpn-plus/naclient/uninstall.sh
  2. Enabled developer mode for kext loading:  sudo nvram boot-args="kext-dev-mode=1"
  3. Rebooted
  4. Installed naclient again

I owe thanks to @jakerobinson for this as he actually found the solution.

***** Yosemite Update 01-07-2014 ******

Unfortunately it is not possible to get naclient to run reliably on Yosemite. I have spent a lot of time on this and ended up using a Mavericks VM in Fusion to get the client to work for the day job.

naclient depends on some kexts that load at system boot. However, the method it uses to start them had been deprecated for several major releases of OS X and was removed in Yosemite. The problem goes beyond the lack of kext signing: it is another example of VMware failing to support OS X, even though the company issues Apple systems to a large number of employees and all new Macs come with Yosemite pre-installed.

I will try to find time to write up my workaround. It uses a VM as a very heavy VPN client, while I continue to use my (limited) apps in Yosemite as I normally would.

***** El Capitan 10.11.1+ Update 11-19-2015 ******

Rather than keep adding content to this post, I created a new blog post with the workaround for OS X El Capitan; it can be found here.

vCloud Director – Using Guest Customization Scripts (Linux)

The intent of this article is to cover the steps for leveraging scripting within guest customization. A vCloud user may wish to pursue this as a way to automatically install additional software that is hostname-specific, e.g. security management software that integrates a Linux OS with Active Directory.

I am going to assume the reader knows how to log in to vCloud Director, either within an organization or within the system context. I also assume that a virtual machine already exists for us to work with. In my example I will use Linux (CentOS).

  1. Stop the vApp if it is currently running (we cannot edit the properties of a running VM)
  2. Open the vApp so that we can see the individual virtual machines

    wpid-voila_capture569-2012-03-15-18-24.png

  3. Right click the virtual machine (or use the action menu) to access the Properties
  4. Switch to the Guest OS Customization tab
  5. Select the option to “Enable guest customization”

    wpid-voila_capture582-2012-03-15-18-24.png

  6. This enables basic guest customization, such as configuring the guest OS hostname, setting the root password and network configuration.
  7. Scroll down within the guest customization tab
  8. You will see a text box where we can input script content. Alternatively, you can upload the script that will be injected into the guest OS during the customization process. I will start with a simple script that calls existing shell scripts within the guest OS. Notice also that there are specific sections for “precustomization” and “postcustomization”: pre-customization runs before the standard vCloud Director customization process, and post-customization runs after it. If the script you wish to use depends on the hostname or network connectivity, you would be best served by a post-customization script.

In my example I am calling two scripts, myscript-pre.sh and myscript-post.sh. These scripts must already be in place within the guest OS file system before they can be run.

    wpid-voila_capture581-2012-03-15-18-24.png

    NOTE: If you wish to upload a script using the Browse button it must be a text only script, it cannot be an executable binary.

  9. Click OK to save those changes
  10. Power on the virtual machine as usual
  11. Create your script within the guest OS in the path you specified
  12. My test script is quite lame, so don’t laugh. The goal is to answer questions I’ve seen asked, such as whether the network is available and which user context the script runs under.
    • Pre-customization:

      wpid-voila_capture587-2012-03-15-18-24.png

    • Post-customization:

      wpid-voila_capture588-2012-03-15-18-24.png

  13. Shut down your virtual machine

    wpid-voila_capture577-2012-03-15-18-24.png

  14. Right click and select to Power On and Force Recustomization
  15. After customization completes, login and verify that your script ran.
    • Pre-customization:
      wpid-voila_capture589-2012-03-15-18-24.png
    • Post-customization:

      wpid-voila_capture585-2012-03-15-18-24.png
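The test scripts themselves only appear in the screenshots. Here is a hypothetical Python equivalent that records the same facts: which phase ran, as which user, under which hostname, and whether the network was reachable. It rests on several assumptions. The guest must have a Python 3 interpreter, and older CentOS releases ship only Python 2. The phase name ("precustomization" or "postcustomization") is taken to arrive as the first argument, as in VMware's sample customization scripts. The log path and the probe address are placeholders, and every function name is illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical customization test script: log phase, user, hostname, network state."""
import os
import pwd
import socket
import sys
import time

LOG_PATH = "/tmp/customization-test.log"  # hypothetical location
PROBE = ("192.168.100.1", 53)  # hypothetical host/port used to test connectivity


def current_user() -> str:
    """Name of the effective user, falling back to the numeric uid."""
    try:
        return pwd.getpwuid(os.geteuid()).pw_name
    except KeyError:
        return str(os.geteuid())


def network_up(probe=PROBE, timeout=2.0) -> bool:
    """True if a TCP connection to the probe address succeeds."""
    try:
        with socket.create_connection(probe, timeout=timeout):
            return True
    except OSError:
        return False


def describe(phase: str, check_network=network_up) -> str:
    """One log line answering: which phase, which user, which hostname, network up?"""
    fields = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "phase": phase,
        "user": current_user(),
        "hostname": socket.gethostname(),
        "network": check_network(),
    }
    return " ".join(f"{key}={value}" for key, value in fields.items())


def main(argv, log_path=LOG_PATH, check_network=network_up) -> str:
    # Assumption: the customization phase name is passed as the first argument.
    phase = argv[1] if len(argv) > 1 else "unknown"
    line = describe(phase, check_network)
    with open(log_path, "a") as log:
        log.write(line + "\n")
    return line


if __name__ == "__main__":
    main(sys.argv)
```

Comparing the `hostname=` and `network=` values of the two phases shows directly whether the vCloud Director network and hostname settings had been applied when each phase ran.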

Observations:

There seems to be little documentation from VMware on exactly “when” a pre-customization script is run vs. a post-customization script. In my test the two ran only 23 seconds apart, so what exactly occurs during those 23 seconds? Logging services (syslogd) and most other system services do not start until after the pre-customization script has run, so little output exists for what occurs during that window (or before it). It appears that pre-customization occurs when vmware-tools starts. On my system that is S03…the 2nd service to start (after microcode_ctl). You can also compare your timestamps to /var/log/messages to see which events are occurring.

Looking at /var/log/vmware-imc/customization.log, we can see a bit more detail about the timing.

wpid-voila_capture586-2012-03-15-18-24.png

Pre-customization occurs before the default vCloud Director customization scripts execute. Those scripts set the hostname and network configuration (and, on Windows, generate a SID or join an AD domain).

Post-customization, which runs after the network configuration is set, is likely where most scripts will need to run. In testing, a script that depended on additional network services (e.g. to support NFS) failed when executed directly as a post-customization script. A workaround that resolved this was simply adding a “sleep 30” before the script execution.
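A fixed `sleep 30` works, but it either wastes time or may still be too short on a slow boot. One alternative is to poll until the service the script depends on accepts TCP connections, with an upper bound on the wait. The Python sketch below does that. `wait_for_port` is a hypothetical helper name, and the NFS server address in the usage note is a placeholder. Port 2049 is the standard NFS port.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until host:port accepts a TCP connection or `timeout` seconds pass.

    Returns True as soon as a connection succeeds, False if the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # not reachable yet: refused, unreachable, or timed out
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a post-customization script you would call something like `wait_for_port("192.168.100.30", 2049)` before mounting, and log an error instead of continuing if it returns False. Unlike a fixed sleep, the wait ends as soon as the server answers.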

One challenge is troubleshooting these scripts, as there is no way to run customization interactively. The easiest way to confirm things will work is to make sure the script runs correctly as root when you execute it directly from a login shell. Next you can insert it into the post-customization process and expect it to work. VMware has published a couple of KB articles that discuss which log files are relevant to the process, and you can review those logs for any errors. Ideally your script will also have its own error logging.
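One way to give a script its own error logging is to wrap the real work so that start, finish, and any exception with its traceback land in a file you can read after customization completes. The Python sketch below does this with the standard `logging` module. `run_logged` and the log path are hypothetical.

```python
import logging


def run_logged(task, log_path: str) -> int:
    """Run task(), recording start, finish and any traceback in log_path.

    Returns 0 on success and 1 on failure, so the result can be used as an exit status.
    """
    logger = logging.getLogger("customization")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info("starting %s", task.__name__)
        task()
        logger.info("finished %s", task.__name__)
        return 0
    except Exception:
        logger.exception("%s failed", task.__name__)  # also writes the traceback
        return 1
    finally:
        logger.removeHandler(handler)
        handler.close()
```

A post-customization script could end with something like `sys.exit(run_logged(mount_shares, "/var/log/my-postcustomization.log"))`, where `mount_shares` stands in for your own (hypothetical) function.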

If you need advanced customization capabilities, your best bet is probably not to use vCloud Director customization at all…or at least to use it only to configure networking. vCenter Orchestrator is far more feature-rich and extensible, and what you can do with it is mostly limited by the effort you put into developing your workflows. The customization process used within vCloud Director is more similar to that of Lab Manager than to that of vCenter, so if you run into trouble, try searching the Lab Manager discussion groups.


Simplified remote access to a home lab

One of the challenges of traveling regularly is that you are often not near your lab. A home lab investment really requires the ability to access it from anywhere if it is to have any hope of a (falsely perceived) ROI. I’ve had a Unix/Linux-based workstation for more of my working life than a Windows one. Sure, Windows was always involved, as a virtual machine on VMware Workstation (Linux) and now on VMware Fusion (Mac).

There are insecure, complex and/or expensive options. Buying a Cisco ASA or some other “firewall” that supports VPN is the expensive option, and it doesn’t fit the goals and requirements for my lab. The more complex option would be to build a firewall from a PC, but that is high maintenance, and I prefer my regular access to be simple and reliable (thus I have a Mac + AirPort household, other than the 3 lab servers). The insecure option would be to expose RDP on your Windows guest directly to the Internet, which is not an option for me. My service provider background makes me paranoid about Windows security, or the lack thereof.

I have chosen what is, in my mind, the cheapest and simplest option. Linux virtual machines are lightweight and use few resources. You could also use a non-persistent disk so that a simple reboot reverts the VM to a known config, or restore it from a snapshot. I use SSH tunneling, which is often overlooked while people pursue more complex L2TP or IPsec-based options…but SSH is simple, seldom blocked on networks, and does the job. I have not gone as far as using L3 tunneling, though that is an option with SSH.

Firewall Settings

In my network I have one open port on my “firewall” (Apple AirPort Extreme), which is forwarded to a minimal Linux virtual machine with a static (private) IP address.

  • Public Internet –> Port 8080 on firewall –> Port 22 on Linux

I would recommend creating multiple port forwards on your firewall, which gives you other options if the one you chose is blocked. I’ve had good luck with 8080 and 8022 so far, but some environments may block those. Nothing stops you from using port 80. However, any forced proxy server or protocol-inspecting firewall will break your SSH session, and some service providers block ports 25, 80, 443 and others.

The beauty is that very little needs to be done on the Linux side. I would recommend editing the SSH config on the Linux VM to prevent root logins. Keep in mind that you must create your non-root users before you do so. Otherwise you cannot log in via SSH and will have to add those accounts from the console.

Secure Linux SSH Settings

I would recommend keeping your Linux VM up to date, using the correct update process for whichever distribution you select. The SSH server is pretty secure these days, but when vulnerabilities are found you should update to apply the relevant patches.

I would recommend editing the sshd config file (/etc/ssh/sshd_config). Find the line containing PermitRootLogin and set it to “no”. If the line is commented out, remove the “#” first.

  • PermitRootLogin no
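If you would rather script this change (for example, when rebuilding the access VM), the Python sketch below sets a keyword in sshd_config text. It replaces the first existing occurrence of the keyword, whether commented out or not. `set_sshd_option` is a hypothetical helper. It does not understand `Match` blocks, so review the result before installing it.

```python
import re


def set_sshd_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key value` set.

    The first line that sets `key` (commented out or not) is replaced. sshd keywords
    are case-insensitive and, for most keywords, the first value read wins. If the
    key is absent, the setting is prepended so it cannot land inside a Match block.
    """
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}[\s=]", re.IGNORECASE)
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key} {value}"
            break
    else:
        lines.insert(0, f"{key} {value}")
    return "\n".join(lines) + "\n"
```

To apply it, read /etc/ssh/sshd_config (as root), pass the text through `set_sshd_option(text, "PermitRootLogin", "no")`, write the result back, and restart sshd.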

Now restart SSH: $: sudo /etc/init.d/sshd restart (the init script name varies by distribution; on Debian and Ubuntu it is ssh rather than sshd).

The reason to remove root access via SSH is that root is a “known” account and can easily be targeted. You should generally use hard-to-guess usernames and complex passwords for this “access server”, because it is going to be port scanned and people will attempt to compromise it. Ideally you would configure the authentication policies so that accounts are locked out after too many failed attempts. Personally I do not allow interactive password-based logins and use only key-based authentication (a 2048-bit RSA key is much more difficult to guess than an 8-character password). To learn more about that option, look at PubkeyAuthentication and PasswordAuthentication within the sshd_config file. RSAAuthentication applies only to the legacy SSH protocol 1.
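To put the key-versus-password comparison in rough numbers: an 8-character password drawn uniformly from the 95 printable ASCII characters has 95^8 possibilities, or about 52.6 bits. NIST SP 800-57 rates a 2048-bit RSA key at roughly 112 bits of security strength. The sketch below only does that arithmetic. Passwords chosen by people are usually far weaker than the uniform-random assumption, so treat the password figure as a best case.

```python
import math


def password_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy for a password chosen uniformly at random."""
    return length * math.log2(alphabet_size)


RSA_2048_BITS = 112  # NIST SP 800-57 security-strength estimate for RSA-2048

if __name__ == "__main__":
    pw = password_bits(95, 8)
    print(f"8-char password: {pw:.1f} bits")
    print(f"RSA-2048 is roughly 2**{RSA_2048_BITS - pw:.0f} times more work to brute force")
```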

Public Access

My cable modem provider issues me a DHCP address. It happens to have been the same address for many months, but there is always the chance it could change. I use Dyn (http://dyn.com) to provide dynamic DNS for my home lab. You can install one of their dynamic DNS clients (http://dyn.com/support/clients/) on any always-on OS within your home network (e.g. your Linux access server), and some “routers” (e.g. Cisco/Linksys) have one built in.

Client Connection

Setup SSH Saved Configs
At this point you just need to configure your client. I happen to use the default SSH client on Mac OS. If you are using Windows, you could use PuTTY or another client and achieve the same result. I don’t want to manually type out all of my config settings every time I connect; remember, this is for more than SSH CLI access…it is our simple “VPN”.

In my environment I either want SSH access or RDP (e.g. to Windows for vSphere Client) access. I do this through simple port forwarding rules.

In order to configure saved “session” settings for the shell SSH client on OS X you will need to do the following:

  1. Open a terminal window of your choice (Terminal.app or my preferred iTerm2)
  2. Navigate to your home directory: $: cd ~/
  3. Create a .ssh directory: $: mkdir .ssh
  4. Create a .ssh/config file: $: touch ~/.ssh/config
  5. Set permissions on the .ssh directory; otherwise SSH will refuse to use your keys if you add them in the future: $: chmod 700 ~/.ssh
  6. Set permissions on config (not strictly necessary, but everything in .ssh should be set this way): $: chmod 600 ~/.ssh/*
  7. Now we can move on to building our configuration
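Steps 3 through 6 can also be scripted. The Python sketch below creates `~/.ssh` with mode 700 and an empty `config` file with mode 600. Unlike step 6, it only changes the permissions of `config`. `setup_ssh_dir` is a hypothetical helper name. The explicit `chmod` calls are needed because the `mode` argument to `mkdir` is filtered by the umask.

```python
import os
from pathlib import Path


def setup_ssh_dir(home: Path) -> Path:
    """Create home/.ssh (mode 700) and home/.ssh/config (mode 600); return the config path."""
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is masked by the umask, so set it explicitly
    config = ssh_dir / "config"
    config.touch(exist_ok=True)  # like `touch`: creates the file, keeps any existing content
    os.chmod(config, 0o600)
    return config
```

Calling `setup_ssh_dir(Path.home())` does the equivalent of the commands above for your own account. It is safe to run again, since existing content is left alone.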

You can use the editor of your choice to open the config file. If you wish to use an app, go to Finder and press Cmd-Shift-G to get a box where you can type your target folder (e.g. ~/.ssh/), then edit the file with whichever editor you prefer (e.g. TextMate). The format of the file is:

Host <name used as ssh target>
        HostName <target hostname>
        User <username>
        Port <TCP port on firewall>
        Compression yes
        AddressFamily inet
        CompressionLevel 9
        KeepAlive yes
        # RDP to Server1
        LocalForward localhost:3389 <private IP>:3389
        # RDP to Server2
        LocalForward localhost:3399 <private IP>:3389
        # RDP to Server3
        LocalForward localhost:3390 <private IP>:3389

Working example:
Host remotelab
        HostName my-dns.dnsalias.net
        User user0315
        Port 8080
        Compression yes
        AddressFamily inet
        CompressionLevel 9
        # Privoxy
        LocalForward localhost:8118 localhost:8118
        # RDP to Control Center Server
        LocalForward localhost:3389 192.168.100.15:3389
        # RDP to vCenter
        LocalForward localhost:3399 192.168.100.20:3389
        # RDP to AD Server
        LocalForward localhost:3390 192.168.100.60:3389
        # HTTPS to vCloud Director cell
        LocalForward localhost:443 192.168.100.25:443

In my case I also installed and configured Privoxy (http://www.privoxy.org/ ) to give me the ability to tunnel other protocols via proxy settings on my laptop (e.g. web browser, instant messengers, etc).

Connect To Your Lab

What was the point of all of this if I don’t show you how to connect? Open your terminal again and type “ssh” followed by your saved config name (e.g. $: ssh remotelab). Authenticate as needed, you should then be connected to the shell of your Linux VM.

Now open your RDP client of choice (I suggest CoRD: http://cord.sourceforge.net/ ) and connect to one of your tunnels by specifying localhost:<local port for the desired server>.

wpid-voila_capture503-2011-09-2-18-182.png

Now anyone lazy, errr…striving for efficiency, will save a config for their servers within CoRD for connecting directly when on your network or via the tunnel. You can then just select the saved session within CoRD.app without having to remember which TCP port is for each server.
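Since the port-to-server mapping lives in ~/.ssh/config, a small script can print it for you. The Python sketch below assumes the simple layout used in the working example above: an exact `Host` name, with each `LocalForward` preceded by a `#` comment naming the server. `list_forwards` is hypothetical and does not implement ssh_config wildcard patterns or `Match` blocks.

```python
import re


def list_forwards(config_text: str, host: str):
    """Return (local_port, target, label) for each LocalForward under `Host host`."""
    forwards = []
    in_host = False
    label = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            label = line.lstrip("#").strip()  # remember the comment for the next line
            continue
        # ssh_config allows "Keyword value" or "Keyword=value"; keywords are case-insensitive.
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        keyword = parts[0].lower()
        args = parts[1] if len(parts) > 1 else ""
        if keyword == "host":
            in_host = host in args.split()
        elif in_host and keyword == "localforward":
            listen, target = args.split()[:2]
            local_port = int(listen.rsplit(":", 1)[-1])
            forwards.append((local_port, target, label))
        label = None  # a comment only labels the line directly after it
    return forwards
```

Feeding it the contents of ~/.ssh/config with the host name `remotelab` would list entries such as `(3399, "192.168.100.20:3389", "RDP to vCenter")`. That list is handy when you set up the matching saved sessions in CoRD.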

Of course, this doesn’t help Windows users. On Windows there is a really neat client you can use to simplify this, and I would recommend Tunnelier from Bitvise (http://www.bitvise.com/tunnelier). There may be simpler GUI-driven SSH clients for configuring this on Mac OS. However, I just use what is included, as it’s always there and doesn’t break when you upgrade to the next version.

Have a better way that is easy? Let me know; I’m always open to new ideas for getting access to the lab. I’ve always intended to set up View with a secure server, but that is also on the complex path, and I want something that just works. Once this configuration is set up you can duplicate it easily, as the complexity is in the saved .ssh/config file and not the “server”.

Managing Windows 2008 Server Core

In the interest of reducing overhead within my lab environment I decided to try and use Windows 2008 Server Core (R2/x64). If you’ve ever installed Server Core, the first thing you notice is you are only presented with a CMD shell at login, there is no full GUI. You can launch applications, installers, etc. however there is no Start Menu to assist you on your way.

I decided I’d chase this rabbit for a bit because I am researching the use of Server Core with vApp services deployed by vCloud Director. A bloated, UI-heavy OS isn’t the most practical when it comes to “scale of cloud”. Sure, we have magical Transparent Page Sharing (for more info on TPS see the PDF written by Carl Waldspurger)…but just like energy efficiency, the easiest watt to save is the one you never used. I know many of my fellow vGeeks have home labs, and very few of us have the host resources we’d like, so we are all chasing efficiency, especially in storage and host memory. To adapt to using Server Core you need to figure out how to manage it, so I thought I would write an article about what I have learned.

Sconfig.cmd

The first option Microsoft offers is the shell tool Sconfig (“Server Configuration”), which you can access by running Sconfig.cmd from the CMD prompt.

wpid-voila_capture448-2011-06-3-12-55.png

We can navigate this tool easily by entering the number of the section we wish to view or modify. If we select #4 (Configure Remote Management), we are given the following subset of options:

wpid-voila_capture449-2011-06-3-12-55.png

If you recall, by default almost all of the remote management tools are disabled on Windows. Likewise, the firewall is enabled and fairly restrictive. Being able to turn on these remote management options quickly so that we can move away from the console is always a benefit.

I went ahead and selected #1, Allow MMC Remote Management. Remote MMC is pretty useful, as it lets me consolidate management tasks for multiple target servers in one place. The window immediately indicated that it was configuring the firewall and enabling the required services, then showed a popup with the final status.

wpid-voila_capture450-2011-06-3-12-55.png

I would personally prefer not to get a popup window that I then have to navigate to with a mouse to click OK. I would rather have the status update and acknowledgement within the text interface; I suppose I shouldn’t be surprised that a company that is bad at UI is even worse at shell. I will give them some credit, as Sconfig seems to have more intelligence than many of their GUI-based wizards, which often leave tasks incomplete that must then be finished manually in other wizards. Since this is my template, I went ahead and enabled all of the remote management options to avoid having to do so later.

Core Configurator

The next tool I found is Core Configurator, which can be downloaded from CodePlex. It is delivered as an ISO. One option within vCloud Director would be to upload this ISO and attach it to your Windows 2008 Server Core virtual machine as needed; this may provide some additional security, but it isn’t necessarily convenient. Personally, I opted to copy the contents of the ISO to a directory within my Windows 2008 Server Core template so that it is readily available.

Microsoft states that Core Configurator supports the following tasks:

  • Product Licensing
  • Networking Features
  • DCPromo Tool
  • ISCSI Settings
  • Server Roles and Features
  • User and Group Permissions
  • Share Creation and Deletion
  • Dynamic Firewall settings
  • Display | Screensaver Settings
  • Add & Remove Drivers
  • Proxy settings
  • Windows Updates (Including WSUS)
  • Multipath I/O
  • Hyper-V including virtual machine thumbnails
  • JoinDomain and Computer rename
  • Add/remove programs
  • Services
  • WinRM
  • Complete logging of all commands executed

To launch Core Configurator, navigate to the directory that contains it and run Start_Coreconfig.wsf (for an attached ISO this would likely be D:\Start_Coreconfig.wsf), which presents this interface:

wpid-voila_capture452-2011-06-3-12-55.png

Selecting the small expansion arrow at the bottom reveals a few more convenient options:

wpid-voila_capture453-2011-06-3-12-55.png

Control Panel view:

wpid-voila_capture454-2011-06-3-12-55.png

I was able to use this tool to install all of the latest Microsoft hotfix packages. The interface could use a “select all” option…but then again, you generally want to review what you are installing, and this encourages you to do so: you must select each hotfix to be installed one at a time.

wpid-voila_capture455-2011-06-3-12-55.png

Firewall Settings

You can easily view and modify the Firewall Settings:

wpid-voila_capture456-2011-06-3-12-55.png

wpid-voila_capture457-2011-06-3-12-55.png

wpid-voila_capture458-2011-06-3-12-55.png

Conclusion

Without the GUI we are accustomed to, even the most basic tasks become challenging. Perhaps you know how to configure interfaces, join an Active Directory domain, or even change the computer name from the command line on Windows, but these tasks were previously foreign to me. Just how much do we save by using Core instead of a full version of Windows 2008 Server? Here is a screenshot from vCenter showing resource utilization for this particular Windows 2008 Server Core (R2 x64) VM:

wpid-voila_capture460-2011-06-3-12-55.png

Here is the same resource accounting for a similar (base) config for Windows 2008 R2 Standard:

wpid-voila_capture461-2011-06-3-12-55.png

Notice the Active Guest Memory of each of the above? With only the default installation plus VMware Tools, that’s a 47% decrease in active memory, and roughly a 20% decrease in the storage capacity needed for the base OS. This isn’t much in a large production environment, but I don’t have the luxury of a Cisco UCS B230 with 32 DIMM slots for my lab. When my host only has 16GB of RAM, this increases the number of base OS instances I can support…again, the easiest unit of X to conserve is the one you don’t use.
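As a rough worked example of what a 47% reduction could mean for consolidation: if active memory were the only constraint, the number of guests that fit in a fixed amount of memory would scale by 1/(1 - 0.47), about 1.89x. Treat that as an upper bound under that assumption, since consumed memory, VM overhead and TPS all change the real figure. `relative_capacity` is a hypothetical helper.

```python
def relative_capacity(fractional_decrease: float) -> float:
    """Factor by which guest count grows when each guest needs
    (1 - fractional_decrease) of its former memory footprint."""
    if not 0 <= fractional_decrease < 1:
        raise ValueError("fractional_decrease must be in [0, 1)")
    return 1 / (1 - fractional_decrease)


if __name__ == "__main__":
    print(f"{relative_capacity(0.47):.2f}x")  # 1.89x
```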

Home Lab – Storage Performance Test (Part 1)

This is a continuation of my Home Lab Build – Overview

To help out my fellow vGeeks, I thought I should continue my “comparison” to the hardware storage appliances. While I personally won’t be running my system as a 4-disk configuration, I realize that some readers may. I ran some tests using Bonnie++ and dd from /dev/zero to provide some benchmarks. These will not be 100% representative of the performance you would see with protocols in place, but they should provide a relative comparison between the disk configuration options.

I have chosen to use Bonnie++ for my benchmarks because it is far faster to set up: it runs from within the Nexenta management console. If you are not familiar with Bonnie++ and how it performs testing, you can find more info here: http://www.coker.com.au/bonnie++/readme.html

I will run three tests using Bonnie++, only varying the block size between 4KB, 8KB, and 32KB.

  • run benchmark bonnie-benchmark -p2 -b 4k
  • run benchmark bonnie-benchmark -p2 -b 8k
  • run benchmark bonnie-benchmark -p2 -b 32k

Each test will be performed against each of the following storage pool configurations:

  • RAID0 (no disk failure protection)
  • RAID1+0 (2 x 1+1 mirrors)
  • RAIDZ1 (single disk failure protection, similar to RAID5)
  • RAIDZ2 (double disk failure protection, similar to RAID6) *included to display the parity tax*

I will run a few “hardware” variations: my target configuration with 2 vCPU and 4-6GB of RAM, and a reduced configuration with 2 vCPU and 2GB of RAM. I expect the decrease in RAM to mostly reduce read performance, since it shrinks the working cache.

I intended to find the time to create some lovely graphs to make the results of each test easier to compare. My choice was either to wait another week or two for that time or to share the results in Bonnie++’s own output format. To get this info to my fellow vGeeks sooner, I have decided to publish the less-than-pretty format. After all, any real geek prefers unformatted text to PowerPoint and glossy sales docs.
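Each result block below has two result rows and a final aggregate row. Checking the numbers shows that the throughput columns are summed while the CPU and random-seek columns are averaged; for example, 1417/sec and 1579/sec average to 1498/sec. The Python sketch below reproduces that aggregation and adds a rough per-disk figure for a striped pool. All names are hypothetical. Summing the printed rows can come out 1 MB/s below the printed total (313 vs 314), presumably because Bonnie++ rounds the per-row values.

```python
from dataclasses import dataclass


@dataclass
class BonnieRow:
    write_mbs: float
    write_cpu: float
    rewrite_mbs: float
    rewrite_cpu: float
    read_mbs: float
    read_cpu: float
    seeks_per_s: float


def aggregate(rows):
    """Sum the throughput columns and average the CPU and random-seek columns."""
    n = len(rows)
    return BonnieRow(
        write_mbs=sum(r.write_mbs for r in rows),
        write_cpu=sum(r.write_cpu for r in rows) / n,
        rewrite_mbs=sum(r.rewrite_mbs for r in rows),
        rewrite_cpu=sum(r.rewrite_cpu for r in rows) / n,
        read_mbs=sum(r.read_mbs for r in rows),
        read_cpu=sum(r.read_cpu for r in rows) / n,
        seeks_per_s=sum(r.seeks_per_s for r in rows) / n,
    )


def per_disk(total_mbs: float, disks: int) -> float:
    """Rough throughput per spindle for a striped (RAID0) pool."""
    return total_mbs / disks


if __name__ == "__main__":
    # Variation 1 / 4-disk RAID0 / 4k blocks, copied from the first table below.
    rows = [
        BonnieRow(157, 42, 84, 32, 211, 28, 1417),
        BonnieRow(156, 41, 83, 32, 208, 28, 1579),
    ]
    total = aggregate(rows)
    print(total.write_mbs, total.read_mbs, total.seeks_per_s)  # 313 419 1498.0
```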

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAID0

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
157MB/s 42% 84MB/s 32% 211MB/s 28% 1417/sec
156MB/s 41% 83MB/s 32% 208MB/s 28% 1579/sec
-------- --- -------- --- -------- --- --------
314MB/s 41% 168MB/s 32% 420MB/s 28% 1498/sec

================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
148MB/s 22% 92MB/s 20% 212MB/s 20% 685/sec
147MB/s 21% 90MB/s 20% 212MB/s 21% 690/sec
-------- --- -------- --- -------- --- --------
295MB/s 21% 182MB/s 20% 424MB/s 20% 688/sec

================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
144MB/s 12% 90MB/s 11% 210MB/s 14% 297/sec
153MB/s 12% 92MB/s 12% 210MB/s 15% 295/sec
-------- --- -------- --- -------- --- --------
298MB/s 12% 183MB/s 11% 420MB/s 14% 296/sec

Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAID0

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
113MB/s 21% 75MB/s 22% 216MB/s 31% 980/sec
113MB/s 21% 74MB/s 22% 217MB/s 31% 936/sec
-------- --- -------- --- -------- --- --------
227MB/s 21% 150MB/s 22% 434MB/s 31% 958/sec

================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
110MB/s 13% 80MB/s 15% 209MB/s 22% 521/sec
110MB/s 13% 80MB/s 15% 210MB/s 23% 524/sec
-------- --- -------- --- -------- --- --------
220MB/s 13% 161MB/s 15% 420MB/s 22% 523/sec

================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
114MB/s 8% 81MB/s 9% 218MB/s 13% 297/sec
113MB/s 8% 79MB/s 9% 218MB/s 12% 294/sec
-------- --- -------- --- -------- --- --------
228MB/s 8% 161MB/s 9% 436MB/s 12% 296/sec

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAID1+0 (2 x 1+1 mirrors)

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
89MB/s 27% 53MB/s 19% 143MB/s 22% 1657/sec
89MB/s 27% 53MB/s 19% 144MB/s 22% 1423/sec
-------- --- -------- --- -------- --- --------
178MB/s 27% 106MB/s 19% 288MB/s 22% 1540/sec

================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
83MB/s 13% 53MB/s 12% 147MB/s 16% 800/sec
83MB/s 12% 54MB/s 12% 147MB/s 16% 752/sec
-------- --- -------- --- -------- --- --------
167MB/s 12% 107MB/s 12% 294MB/s 16% 776/sec

================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
85MB/s 7% 55MB/s 7% 141MB/s 9% 277/sec
82MB/s 7% 53MB/s 7% 135MB/s 9% 266/sec
-------- --- -------- --- -------- --- --------
167MB/s 7% 109MB/s 7% 276MB/s 9% 271/sec

Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAID1+0 (2 x 1+1 mirrors)

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
65MB/s 11% 48MB/s 14% 154MB/s 22% 892/sec
65MB/s 11% 48MB/s 13% 152MB/s 22% 786/sec
-------- --- -------- --- -------- --- --------
130MB/s 11% 97MB/s 13% 306MB/s 22% 839/sec

================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
67MB/s 7% 47MB/s 9% 157MB/s 14% 669/sec
67MB/s 7% 47MB/s 9% 155MB/s 14% 637/sec
-------- --- -------- --- -------- --- --------
135MB/s 7% 94MB/s 9% 313MB/s 14% 653/sec

================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
68MB/s 5% 31MB/s 3% 153MB/s 8% 338/sec
68MB/s 5% 31MB/s 3% 151MB/s 8% 342/sec
-------- --- -------- --- -------- --- --------
136MB/s 5% 62MB/s 3% 304MB/s 8% 340/sec

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAIDZ1 (RAID5)

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
109MB/s 30% 54MB/s 22% 133MB/s 21% 813/sec
108MB/s 32% 54MB/s 22% 131MB/s 20% 708/sec
-------- --- -------- --- -------- --- --------
218MB/s 31% 108MB/s 22% 265MB/s 20% 761/sec

================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
114MB/s 25% 60MB/s 17% 131MB/s 14% 525/sec
118MB/s 24% 60MB/s 18% 133MB/s 14% 517/sec
-------- --- -------- --- -------- --- --------
232MB/s 24% 121MB/s 17% 265MB/s 14% 521/sec

================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
107MB/s 12% 60MB/s 8% 138MB/s 9% 163/sec
111MB/s 11% 60MB/s 8% 138MB/s 9% 172/sec
-------- --- -------- --- -------- --- --------
218MB/s 11% 121MB/s 8% 276MB/s 9% 167/sec

Hardware Variation 2 (2 vCPU/2GB RAM) / 4-disk RAIDZ1 (RAID5)

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
74MB/s 15% 40MB/s 12% 160MB/s 18% 715/sec
76MB/s 15% 41MB/s 13% 165MB/s 19% 651/sec
-------- --- -------- --- -------- --- --------
151MB/s 15% 82MB/s 12% 325MB/s 18% 683/sec

================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
75MB/s 9% 42MB/s 8% 167MB/s 21% 384/sec
73MB/s 8% 42MB/s 8% 166MB/s 20% 387/sec
-------- --- -------- --- -------- --- --------
149MB/s 8% 85MB/s 8% 333MB/s 20% 386/sec

================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
73MB/s 5% 41MB/s 4% 168MB/s 11% 182/sec
71MB/s 5% 40MB/s 4% 168MB/s 11% 183/sec
-------- --- -------- --- -------- --- --------
144MB/s 5% 82MB/s 4% 337MB/s 11% 182/sec

Hardware Variation 3 (2 vCPU/8GB RAM) / 4-disk RAIDZ1 (RAID5)

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
114MB/s 34% 58MB/s 22% 146MB/s 22% 872/sec
114MB/s 34% 59MB/s 23% 147MB/s 21% 693/sec
-------- --- -------- --- -------- --- --------
228MB/s 34% 118MB/s 22% 293MB/s 21% 783/sec

Hardware Variation 1 (2 vCPU/6GB RAM) / 4-disk RAIDZ2 (RAID6)

================== 4k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
71MB/s 20% 43MB/s 16% 111MB/s 20% 716/sec
71MB/s 20% 43MB/s 16% 110MB/s 20% 677/sec
------- --- -------- --- ------- --- ---------
143MB/s 20% 86MB/s 16% 221MB/s 20% 696/sec

================== 8k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
75MB/s 13% 42MB/s 10% 110MB/s 12% 540/sec
74MB/s 16% 42MB/s 11% 104MB/s 11% 491/sec
------- --- -------- --- ------- --- ---------
149MB/s 14% 84MB/s 10% 215MB/s 11% 515/sec

================== 32k Blocks ==================
WRITE CPU RE-WRITE CPU READ CPU RND-SEEKS
70MB/s 7% 43MB/s 6% 109MB/s 8% 202/sec
70MB/s 7% 42MB/s 6% 109MB/s 8% 203/sec
------- --- -------- --- ------- --- ---------
140MB/s 7% 85MB/s 6% 218MB/s 8% 203/sec

Summary
I have to admit I was wrong in predicting that RAM size would correlate directly with read performance. Increasing the RAM actually seems to decrease read performance while improving write performance: at 4k blocks, reads fell from 325MB/s with 2GB to 265MB/s with 6GB and 293MB/s with 8GB, while writes rose from 151MB/s to 218MB/s and 228MB/s. I am going to speculate that this comes down to caching algorithms that are poor, at least for this workload, as well as writes being staged in RAM before they are flushed to disk. More RAM means a larger ARC (ZFS's in-memory cache); this noticeably improves random seeks but reduces maximum read throughput (large blocks), perhaps because the cache leads to inaccurate predictive reads (again, speculation).
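To put rough numbers on the RAM effect, the Python sketch below computes the percentage change of each 4k-block aggregate relative to the 2GB configuration. The figures are copied from the RAIDZ1 tables above; the dictionary layout and function names are only illustrative, and because Variation 3 was only run with 4k blocks, it compares 4k results only.

```python
# 4k-block aggregate rows from the 4-disk RAIDZ1 tables above, keyed by RAM in GB
# (Variation 2 = 2GB, Variation 1 = 6GB, Variation 3 = 8GB).
RESULTS_4K = {
    2: {"write": 151, "read": 325, "seeks": 683},
    6: {"write": 218, "read": 265, "seeks": 761},
    8: {"write": 228, "read": 293, "seeks": 783},
}


def pct_change(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


def change_vs_baseline(results: dict, baseline_gb: int) -> dict:
    """Map each non-baseline RAM size to the % change of every metric."""
    base = results[baseline_gb]
    return {
        gb: {metric: pct_change(value, base[metric]) for metric, value in metrics.items()}
        for gb, metrics in results.items()
        if gb != baseline_gb
    }


if __name__ == "__main__":
    for gb, deltas in change_vs_baseline(RESULTS_4K, 2).items():
        summary = ", ".join(f"{m} {d:+.1f}%" for m, d in deltas.items())
        print(f"{gb}GB vs 2GB: {summary}")
```

Run as a script, it prints one line per RAM size; the 6GB read drop, for example, works out to roughly 18% below the 2GB result.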

Much like NetApp storage systems, ZFS always tries to perform writes in large chunks. If you were to watch the iostat output for the physical devices, you would see peaks and valleys in writes to the physical media even though the incoming workload is steady-state. NetApp and ZFS both play a form of Tetris to make each write as efficient as possible, and the more RAM available, the better they can stage these writes to complete efficiently.
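The peaks-and-valleys pattern can be illustrated with a toy model, loosely in the spirit of ZFS batching writes into transaction groups: data arrives at a steady rate, is buffered in RAM, and is flushed to disk in one burst every few ticks. This is not ZFS's actual algorithm; the tick length, flush interval and rates are hypothetical parameters chosen for illustration.

```python
def simulate_batched_writes(incoming_mb_per_tick: float, flush_every: int, ticks: int) -> list[float]:
    """Return the MB written to the physical disks at each tick.

    Toy model (hypothetical parameters): writes arrive at a steady rate and are
    buffered in RAM; every `flush_every` ticks the whole buffer is written at once.
    """
    buffered = 0.0
    physical = []
    for tick in range(1, ticks + 1):
        buffered += incoming_mb_per_tick
        if tick % flush_every == 0:
            physical.append(buffered)  # peak: everything staged so far hits the media
            buffered = 0.0
        else:
            physical.append(0.0)       # valley: writes are only accumulating in RAM
    return physical


if __name__ == "__main__":
    # Hypothetical workload: a steady 50MB per tick, flushed every 5 ticks.
    for t, mb in enumerate(simulate_batched_writes(50, 5, 15), start=1):
        print(f"t={t:2d} {'#' * int(mb // 25)} {mb:.0f}MB")
```

Printed as a crude bar chart, the physical writes sit at zero for four ticks and then spike, even though the same 50MB arrives every tick. In this model a larger buffer simply allows larger, less frequent bursts.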

One key measure is the actual throughput per disk, and RAID0 is a good way to determine it. As expected, re-writes suffer, as they do in any file system, so I will focus on writes, reads and random seeks, using the numbers from the lowest-memory RAID0 configuration:

  • Writes: 227MB/s
  • Reads: 434MB/s
  • Random Seeks: 958/sec

Now we need to break this down into per-disk statistics, which simply means dividing each value above by the number of physical disks (four):

  • Writes: 56.75MB/sec/disk
  • Reads: 108.5MB/sec/disk
  • Random Seeks: 239.5 IOPS/disk
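The division is trivial, but capturing it as a small sketch makes the baseline reusable for the comparisons that follow. The disk count of four is inferred from the 4-disk pools benchmarked here (227MB/s divided by 4 gives the 56.75MB/s above); the names are illustrative.

```python
# Aggregate results from the lowest-memory RAID0 run quoted above.
RAID0_TOTALS = {"writes_mb_s": 227, "reads_mb_s": 434, "random_seeks_s": 958}
PHYSICAL_DISKS = 4  # inferred: the benchmarked pools are 4-disk configurations


def per_disk(totals: dict, disks: int) -> dict:
    """Divide every aggregate metric by the number of physical disks."""
    return {metric: value / disks for metric, value in totals.items()}


if __name__ == "__main__":
    for metric, value in per_disk(RAID0_TOTALS, PHYSICAL_DISKS).items():
        print(f"{metric}: {value}")
```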

Of course, one flaw in these Bonnie++ results is that they include no latency statistics. We normally expect a 7200 RPM SATA disk to offer 40-60 IOPS with sub-20 millisecond response times, but I have no measure of the response time experienced during this test, or of how many of the random seeks were served from cache. Since 239.5 seeks/sec per disk is roughly four to six times that expected figure, a good share of them likely were. I selected the lowest RAM (cache) configuration to try to minimize that factor in our equation.
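A back-of-envelope way to bound the cache's contribution, under the loose assumption that each random seek costs one physical I/O and a disk cannot exceed the 40-60 IOPS quoted above: anything measured beyond that ceiling must have been served from cache. The function name is hypothetical, and the result is a lower bound, not a measurement.

```python
def min_cache_fraction(measured_iops: float, disk_iops: float) -> float:
    """Lower bound on the share of seeks that could not have come from disk.

    Assumes (simplification) one physical I/O per seek and a hard ceiling of
    `disk_iops` per disk; anything above that ceiling must have hit cache.
    """
    if measured_iops <= disk_iops:
        return 0.0
    return (measured_iops - disk_iops) / measured_iops


if __name__ == "__main__":
    measured = 239.5  # random seeks/sec per disk from the RAID0 baseline
    for disk_iops in (40, 60):  # rule-of-thumb range for a 7200 RPM SATA disk
        share = min_cache_fraction(measured, disk_iops)
        print(f"{disk_iops} IOPS/disk -> at least {share:.0%} served from cache")
```

With 239.5 seeks/sec per disk, this suggests that roughly three quarters or more of the seeks never reached the platters.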

We can then use this as a baseline to measure the degradation of each protection scheme on a per-data-disk basis. In a 4-disk RAID1+0 configuration, only two disks' worth of bandwidth supports writes (every write goes to both sides of a mirror), while all four disks can serve reads, which accounts for the reasonable read performance. My lab runs RAID1+0 because my environment is heavily read-oriented, and with so few physical disks I did not want to pay the parity write tax; in addition, with six 1TB SATA drives I am not capacity-constrained.
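The per-data-disk comparison can be sketched as: divide a measured result by the number of disks contributing to that operation, then by the RAID0 per-disk baseline. For 4-disk RAID1+0 you would pass 2 data disks for writes and 4 for reads, as described above. The example plugs in the 4-disk RAIDZ1, 2GB, 4k-block totals, assuming (this is not confirmed by the results) that they are comparable with the RAID0 baseline run; the helper name is hypothetical.

```python
# RAID0 per-disk baseline (MB/s) from the lowest-memory run above.
BASELINE_PER_DISK = {"write": 56.75, "read": 108.5}


def per_data_disk_efficiency(measured_mb_s: float, data_disks: int, baseline_mb_s: float) -> float:
    """Share of the RAID0 per-disk baseline that each data disk actually delivers."""
    return measured_mb_s / data_disks / baseline_mb_s


if __name__ == "__main__":
    # Illustration: 4-disk RAIDZ1 with 2GB RAM at 4k blocks (Variation 2 totals).
    # One disk's worth of capacity goes to parity, leaving 3 data disks.
    raidz1_2gb_4k = {"write": 151, "read": 325}
    for op, measured in raidz1_2gb_4k.items():
        eff = per_data_disk_efficiency(measured, 3, BASELINE_PER_DISK[op])
        print(f"RAIDZ1 {op}: {eff:.0%} of the RAID0 per-disk baseline")
```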

I almost went into a full interpretation of my results, but in my research I stumbled upon a post that describes in detail the performance expectations of each RAID configuration: http://blogs.sun.com/roch/entry/when_to_and_not_to

The telling portion is this:

Configuration   Blocks Available   Random FS Blocks/sec
RAID-Z          (N - 1) * dev      1 * dev
Mirror          (N / 2) * dev      N * dev
Stripe          N * dev            N * dev

(N = number of devices in the group; dev = one device's capacity or random throughput)
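The table's rules can be turned into a small calculator. This is a sketch of the simplified model from the referenced post, not a performance predictor; the Mirror figure describes random reads, and the inputs (4 x 1TB disks at 60 random IOPS, the upper end of the 40-60 estimate above) are assumptions. The function name is illustrative.

```python
def raid_model(scheme: str, n: int, dev_blocks: float, dev_iops: float) -> tuple[float, float]:
    """Return (blocks available, random FS blocks/sec) following the table above.

    n          -- number of devices in the group
    dev_blocks -- capacity of one device (any unit, e.g. TB)
    dev_iops   -- random blocks/sec one device can serve
    """
    if scheme == "raidz":
        return (n - 1) * dev_blocks, 1 * dev_iops
    if scheme == "mirror":
        return (n / 2) * dev_blocks, n * dev_iops
    if scheme == "stripe":
        return n * dev_blocks, n * dev_iops
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    # Assumed inputs: 4 x 1TB disks at 60 random IOPS each.
    for scheme in ("raidz", "mirror", "stripe"):
        capacity, iops = raid_model(scheme, 4, 1.0, 60)
        print(f"{scheme:>6}: {capacity:.0f}TB usable, ~{iops:.0f} random IOPS")
```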

The key point to take away is that with RAID-Z, random IOPS are limited to roughly those of a single device per group. As the referenced post shows, a pool made of multiple small RAID-Z groups performs better than one large RAID-Z group, since each group contributes one device's worth of random IOPS. This may not map exactly onto RAID5, or whatever RAID scheme your storage platform uses, as they are not all created equal.
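Applying the same rule to the six 1TB drives mentioned earlier shows the trade-off between one wide RAID-Z group and several narrow ones. The 60 IOPS per disk figure and the helper name are assumptions for illustration, and the model ignores everything except the one-device-per-group rule.

```python
def raidz_pool(groups: int, disks_per_group: int, dev_tb: float, dev_iops: float) -> tuple[float, float]:
    """(usable TB, random IOPS) for a pool striped across identical RAID-Z groups.

    Per the table, each RAID-Z group contributes (disks - 1) devices of capacity
    but only about one device's worth of random IOPS; the pool stripes across groups.
    """
    return groups * (disks_per_group - 1) * dev_tb, groups * dev_iops


if __name__ == "__main__":
    # Assumed inputs: six 1TB disks at 60 random IOPS each.
    for groups, width in [(1, 6), (2, 3)]:
        tb, iops = raidz_pool(groups, width, 1.0, 60)
        print(f"{groups} x {width}-disk RAID-Z: {tb:.0f}TB usable, ~{iops:.0f} random IOPS")
```

Under this model, splitting six disks into two 3-disk groups gives up one disk of usable capacity (4TB versus 5TB) in exchange for roughly twice the random IOPS.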