Shifting Server Selection Criteria

Note:  I wrote this post two years ago but somehow never noticed it sitting in my drafts folder…

What was old is new.  Long ago we used internal RAID on servers for most applications; in some cases we went as far as using internal HBAs with external JBODs to allow two physical servers to share logical volumes, or to get the most out of a “high capacity” RAID enclosure (at the time they seemed high, but by today’s standards many phones offer more addressable capacity).  Over time we moved all of this critical data to a shared storage system, typically a SAN (storage area network).  The SAN vendors have continued to charge high prices for decreasing value, leaving the storage market ripe for disruption by distributed storage that leverages commodity hardware and is delivered as software.  We will no longer find it acceptable to pay $2,500 for a disk drive in a SAN that we can buy on the street for $250.

This leads me to repeating the past: I find myself in desperate need of brushing up on managing the RAID controllers in my hosts.  Perhaps this is for VSAN, or ScaleIO, or some other converged storage offering that can leverage my existing compute nodes and all of that formerly idle storage potential.  As we make this transition, we find that the selection criteria we had for our compute hosts are no longer valid, or at least not ideal, for a converged deployment.  Up until now the focus has been on compute density, either CPU cores per rack unit or physical RAM per rack unit…in fact, many blade vendors found a nice market by focusing on exactly that.

What these compute-only servers all had in common was minimal internal storage; we didn’t need it.  We needed massive compute density to make room for our really expensive SAN with all of its pretty lights.  As we move down this path of converged compute and storage, we need to dig out some of our selection criteria from a decade ago.  We now need to weigh disk slots per rack unit into our figures.  It turns out we can give up a fair amount of CPU+RAM density and, by implementing converged storage, still drastically reduce the cost of providing the entire package of compute and storage.  We must look at the balance of compute to storage more closely as these resources become tightly coupled; there are new considerations we are not accustomed to that, if not accounted for, can lead to project failure.

When the hypervisor first started gaining ground, there was a lot of debate over the consolidation ratio that made sense.  Some vendors and integrators argued that Big Iron made the most sense: a server with massive CPU and RAM density that allowed for ridiculous VM:host ratios.  What we found is that this creates a pretty massive failure domain, and the larger the failure domain, the more capacity we have to reserve.  The cost of our HA (high availability) insurance scales directly with our host density.  Likewise, the time to enter maintenance mode for each host correlates directly with the RAM in use on that host: the more RAM in use, the longer every maintenance cycle takes.

This is relevant because when we look at converged storage (or hyperconverged, as some refer to it), we have to consider the exact same thing.  We still have the traditional compute items to account for, but now we also need to factor in storage.  Each host is now a failure domain for storage, so we must reserve one host (or more) of capacity…it also means that when a host goes into maintenance mode, in the worst case we have to move an entire host’s worth of stored data to ensure accessibility.
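To put rough, purely illustrative numbers on it: reserving one host for failures in a 4-host cluster sets aside 25% of the cluster’s compute and storage capacity, while the same one-host reserve spread across 16 smaller hosts is closer to 6%.  And if each host carries, say, 24 x 4 TB of local disk, a single maintenance window can mean re-protecting or evacuating up to roughly 96 TB of data.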

Switching LSI SAS 2208 and similar chipsets to JBOD mode

I have been testing various hyperconverged storage platforms that can coexist with ESX, along with some bare-metal software storage platforms.  In all cases I am using RAID controllers in the servers: some embedded, and in some cases add-on cards.  I have two controllers in use currently; one is an Intel-flashed LSI card and the other is the LSI 2208 embedded in the SuperMicro FAT Twin.  While in all of these cases you can use single-disk RAID0 logical volumes, doing so adds a lot of extra steps and in many of my systems offers no gain.
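For reference, if you stay on the RAID path instead (not the route I took here), the per-disk RAID0 workaround looks roughly like this with MegaCLI.  This is only a sketch: the enclosure:slot value and cache flags are examples, and the command has to be repeated for every drive.

  # create one RAID0 logical drive containing only the disk at enclosure 252, slot 0
  /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r0[252:0] WB RA Direct -a0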

WARNING:  Proceed at your own risk; I recommend verifying that no data will be impacted by this task.  I also encourage you to confirm that JBOD (aka pass-through) mode is supported by your hardware and your storage platform.

It may be possible to do some of these steps by getting into the boot BIOS; however, in the case of the Intel-flashed LSI cards the boot BIOS is really horrible.  I spent an hour trying to navigate it over remote console via the Intel Remote Management Module…it was absolutely painful, and the only thing that worked was the wizard, which created undesirable configurations.  I ended up working around this with the following steps:

  1. Download a live-boot Linux CD image
  2. Connect the ISO to the server through the remote console’s virtual media
  3. Boot the Linux image
  4. Configure networking in Linux
  5. Download MegaCLI to a local workstation, then SCP it to the Linux machine
  6. Install MegaCLI
  7. Run the MegaCLI commands

In more detail:

I downloaded MegaCLI and placed it in my Dropbox folder, which made it easy to just wget it on the Linux server after it booted.  Once Linux was up, I configured an IP address on the appropriate network interface with ifconfig, added DNS to resolv.conf, and set a default gateway.  I could then SSH in, which gave me copy and paste to run the same commands quickly across my dozen hosts.  In my case I used the CentOS 6.5 LiveCD from a nearby mirror, but any reasonably recent bootable Linux CD should work.
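A rough sketch of that prep work on the live CD, run as root; the interface name, addresses, download URL, and RPM filename are placeholders for whatever your environment and MegaCLI release actually use:

  # bring up networking on the live CD (interface and addresses are examples)
  ifconfig eth0 192.168.10.50 netmask 255.255.255.0 up
  route add default gw 192.168.10.1
  echo "nameserver 192.168.10.10" >> /etc/resolv.conf

  # pull down MegaCLI (URL and filenames are placeholders) and install it
  wget http://example.com/MegaCli_Linux.zip
  unzip MegaCli_Linux.zip              # install unzip first if the live CD lacks it
  rpm -ivh MegaCli-*.noarch.rpm        # typically lands in /opt/MegaRAID/MegaCli/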

I will warn that doing these steps with any data in place will absolutely lead to data destruction.  I am not liable for how quickly the -CfgLdDel command obliterates any existing logical volume configuration; proceed at your own risk.
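Once MegaCLI is installed, the conversion itself looked roughly like the following.  Treat it as a sketch rather than a verified recipe: JBOD support depends on the controller firmware, the enclosure:slot IDs are examples you would pull from -PDList, and the exact flag syntax differs between MegaCLI releases.

  # confirm the controller is visible and note its adapter number
  /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll

  # DESTRUCTIVE: delete all existing logical drives on adapter 0
  /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdDel -LAll -a0

  # enable JBOD support on the adapter (flag syntax varies by MegaCLI build)
  /opt/MegaRAID/MegaCli/MegaCli64 -AdpSetProp -EnableJBOD -1 -a0

  # list physical drives to find their enclosure:slot IDs...
  /opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | grep -E "Enclosure Device|Slot Number"

  # ...then flag each one as JBOD (252:0 and 252:1 are example IDs)
  /opt/MegaRAID/MegaCli/MegaCli64 -PDMakeJBOD -PhysDrv[252:0,252:1] -a0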
