ANNOUNCE: Unplanned network maintenance/outage

ANNOUNCE: Unplanned network maintenance/outage

Derek Atkins
Good morning, GnuCashers,

Some (many?) of you may have noticed the outage of 'code.gnucash.org'
starting with a lot of packet loss on Thursday and escalating into a
complete outage by Friday.  This took out the server that hosts our
Subversion repository, wiki, mailing lists, and everything else.  As of
2:15pm US/EDT on Saturday (yesterday) everything should be back to
normal and operational.  If you don't want to hear the gory details of
what happened, feel free to stop reading now.

The issue was simultaneous failures in multiple pieces of
equipment.  What I thought was a power outage turned out to be caused by
a failure in my main network switch.  It started dropping ports, or
causing ports to fail partially (dropping packets).  This was also the
main cause of the packet loss.  However, I didn't discover this until
later.

My main DHCP server was off the net; I swapped ethernet cables and it
appeared to fix the problem.

My main database server, however, lost its main network controller so I
had to install a new one (I have a few on hand, so it was a relatively
painless operation -- I just had to remember the magic voodoo to get the
system to call the new card 'eth0', but that was also only a few
minutes).
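
The post does not say how the new card was renamed.  One common approach
on Linux systems of that era, offered here purely as an assumption, was
a udev rule in /etc/udev/rules.d/70-persistent-net.rules that pins an
interface name to a MAC address.  The helper below (a hypothetical name,
with a placeholder MAC address) only formats such a rule line; it is a
sketch, not a record of what was actually done on this server.

```python
import re

# Lowercase, colon-separated MAC address, e.g. 00:11:22:aa:bb:cc
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def persistent_net_rule(mac: str, name: str = "eth0") -> str:
    """Return a udev rule line that names the NIC with this MAC address."""
    mac = mac.strip().lower()  # sysfs reports MAC addresses in lowercase
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return (
        'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
        f'ATTR{{address}}=="{mac}", ATTR{{type}}=="1", '
        f'KERNEL=="eth*", NAME="{name}"'
    )


if __name__ == "__main__":
    # Placeholder MAC address, not taken from the post.
    print(persistent_net_rule("00:11:22:AA:BB:CC"))
```

On such a system the stale entry for the dead card would also need to be
removed from the rules file, otherwise the old card keeps its claim on
'eth0' and the new one is given the next free name.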

It was only after I got this working that I realized that it was the
switch that had failed -- many of the ports connected to actual hosts
had a 'dead link'.  I also noticed that my main DHCP server was
bouncing.  It would come on the net, stay for a bit, and then go dark.
Luckily I also had a few extra (smaller) switches lying around so I
linked a few of them together and moved all the non-working ports over.
This also fixed the bouncing DHCP server.
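
The "bouncing" behaviour described above (on the net, then dark, then
back) can be told apart from a plain dead link by counting state
changes in a series of reachability checks.  The sketch below assumes
the samples come from some periodic probe such as a ping; the function
names and the threshold of two changes are illustrative assumptions,
not anything used in the incident.

```python
from typing import Sequence


def count_flaps(samples: Sequence[bool]) -> int:
    """Count up/down state changes between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def classify(samples: Sequence[bool], flap_threshold: int = 2) -> str:
    """Label a host 'up', 'down', 'bouncing', or 'unknown' (no samples)."""
    if not samples:
        return "unknown"
    if count_flaps(samples) >= flap_threshold:
        return "bouncing"
    # Fewer changes than the threshold: report the most recent state.
    return "up" if samples[-1] else "down"


if __name__ == "__main__":
    # True = host answered the probe, False = it did not.
    history = [True, True, False, False, True, True]
    print(classify(history))
```

A host that reads "bouncing" while its neighbours on other switch ports
stay "up" points at the port or cable rather than at the host itself.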

Last, but not least, the VM Server Host's network was wedged, requiring
a complete reboot to reset.  This also required resetting all the VMs,
some of which required a bit of hand-holding to come back (and many of
which required a virtual disk fsck as well, taking even more time).  The
last of the systems returned to service shortly after 2pm.

I do plan to acquire a new switch to replace the failing one, but what I
have now is working, so I'll watch it closely in the meantime.

Thanks,

-derek

--
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       [hidden email]                        PGP key available
_______________________________________________
gnucash-announce mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-announce

Re: ANNOUNCE: Unplanned network maintenance/outage

Derek Atkins-3
I have an AC-DC-AC UPS. I am fairly sure it is not a power-related problem.  The switch is old and has already burned through one power supply. I think it just got too old and tired.  I think it burned out the network card, too, possibly in its flailing.  I think it all relates to the switch.

-derek


----- Reply message -----
From: "Ted Creedon" <[hidden email]>
To: "Derek Atkins" <[hidden email]>
Cc: <[hidden email]>, <[hidden email]>, <[hidden email]>
Subject: Unplanning network maintenance/outage
Date: Sun, Mar 17, 2013 11:56 AM


Do you need a UPS?

Sounds like a power-related problem.

tedc

On Sun, Mar 17, 2013 at 5:18 AM, Derek Atkins <[hidden email]> wrote:
_______________________________________________
gnucash-devel mailing list
[hidden email]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel

