ANNOUNCE: Unplanned Server Outage last night due to power outage
Good morning everyone,
Atlanta got hit by some major storms last night, which caused a
significant power loss starting around 11pm US/EDT. Power stayed down
long enough that my UPSes ran out of battery. Power came back and
dropped again a few times over the course of the early morning, and came
back for good around 1:30am. Unfortunately the VM server hosting code
did not.
At around 4:30am I woke up, came down to the server, determined the
issue, and was able to recover the system relatively quickly. I took the
opportunity to update some software and reboot the system cleanly again
to test it all. The final reboot happened around 5:45am and the VMs
came back up shortly thereafter.
All services should now be running normally.
I'm sorry for any inconvenience this outage may have caused.
For those who care about the gritty details, the failure was due to the
way I added additional disk space to the VM server. When I added the
disks I used Linux software raid, but when I created the array I
used md-raid metadata version 1.2. Unfortunately the running kernel
was unable to build the full logical volume due to this "mistake".
I was able to correct the immediate issue by upgrading the kernel. This
allows the new raid to load later. It now loads as md127 instead of
md3, but that doesn't significantly affect the operation of the server.
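For anyone who wants to check which metadata version their own arrays
use, here is a rough sketch that reads the text of /proc/mdstat. The
function name and the sample text are made up for illustration (the
sample is not output from my server), and I am assuming that arrays
without a "super X.Y" token on their status line use the old 0.90
format.

```python
import re


def md_metadata_versions(mdstat_text):
    """Map each md device in /proc/mdstat text to its metadata version.

    Assumption: an array whose status lines carry no "super X.Y" token
    is using the old 0.90 superblock format.
    """
    versions = {}
    current = None
    for line in mdstat_text.splitlines():
        # A new array starts with a line like "md127 : active raid1 ..."
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            versions[current] = "0.90"
            continue
        if current is not None:
            # The "blocks" line carries e.g. "super 1.2" for 1.x metadata
            found = re.search(r"\bsuper (\d+\.\d+)\b", line)
            if found:
                versions[current] = found.group(1)
    return versions


# Illustrative sample only, shaped like /proc/mdstat output.
SAMPLE_MDSTAT = """Personalities : [raid1]
md127 : active raid1 sdd1[1] sdc1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

for name, version in sorted(md_metadata_versions(SAMPLE_MDSTAT).items()):
    print(name, version)
```

On a real system you would pass in the contents of /proc/mdstat instead
of the sample text.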
Upgrading the kernel included an upgraded vmware-server patch that will
hopefully work better than the previous vmware-server patch I had tried.
If the system remains stable with the new kernel then all will be good.
However, if the server reverts to the disk IO issues that I observed
before, then I might have to find new ways to work around the raid
metadata issue, which may involve upgrading to grub2. Hopefully it won't
come to that. Please keep your fingers crossed.