
[802SEC] Fwd: grouper.ieee.org outage

FYI
More details on today's Grouper outage.
Jon
-------------------------------------------------------------------------------
Jon Rosdahl                         Standards Architect
hm:801-756-1496                  CSR Technologies Inc.
cell:801-376-6435                 10871 North 5750 West 
office: 801-492-4023             Highland, UT 84003

A Job is only necessary to eat!
A Family is necessary to be happy!!


---------- Forwarded message ----------
From: Luigi Napoli <l.napoli@ieee.org>
Date: Thu, Apr 3, 2014 at 4:00 PM
Subject: grouper.ieee.org outage
To: Paul Nikolich <paul.nikolich@att.net>, Jon Rosdahl <jrosdahl@ieee.org>, Pat Thaler <pthaler@broadcom.com>, David Law <David_Law@ieee.org>, bobgrow@cox.net, bheile@ieee.org
Cc: Robert Labelle <r.labelle@ieee.org>, Karen McCabe <k.mccabe@ieee.org>, Christina Boyce <c.boyce@ieee.org>, Soo Kim <s.h.kim@ieee.org>, Noelle Humenick <n.humenick@ieee.org>, Walter Pienciak <w.pienciak@ieee.org>, Yvette Ho Sang <y.hosang@ieee.org>

grouper.ieee.org (ieee802.org) experienced an outage today, 3 April 2014, that extended well beyond the scheduled downtime. The scheduled work involved replicating the virtual machine to the offsite datacenter and then bringing it back. IEEE-SA is implementing a disaster recovery plan that uses an offsite datacenter (in another region of the country) in case the IEEE datacenter is forced to shut down, as happened during Hurricane Sandy.

This exact process had been tested by IT on a non-critical SA machine with flawless results. That VM was replicated offsite, brought live at the remote datacenter, tested to ensure all applications worked while it remained fully available to users, and then migrated back to NJ. Total downtime for that exercise was about 10 to 15 minutes over a one-hour period. Confidence was high that the migration process could be reproduced with all other SA VMs.

Unfortunately, today, after the machine was replicated offsite, problems arose with VMware and IT had to contact the vendor for support. Having to involve external resources is what extended the outage. A plan will be established to mitigate this risk.


_____________________________________
Luigi Napoli
Sr. Technology Community Specialist
IEEE Standards Association
IEEE. Advancing Technology for Humanity.
_____________________________________

---------- This email is sent from the 802 Executive Committee email reflector. This list is maintained by Listserv.