Shostack + Friends Blog Archive

 

Emergent Downtime

We had some downtime after a failure at our hosting facility.

We would like to address the power loss which occurred in our Virginia
Datacenter on Wednesday, June 13th. We are still investigating the
root cause, but in the interest of full disclosure, here are the facts
as we know them today. A more complete post-mortem will be sent to you
as soon as possible.

Mmm, full disclosure and analysis. What a neat idea.

4 comments on "Emergent Downtime"

  • Dave says:

    That sounds like server beach…

  • Adam says:

    Yup!
    – A massive wind and hail storm struck the mid-Atlantic region of the
    United States including Northern Virginia yesterday afternoon.
    – Our internal monitoring system alerted us that the local power grid
    dropped at approximately 4:01 pm EDT.
    – We have three generators on site:
    – – The first generator failed to start up correctly. We are not sure
    why and our engineers are investigating further.
    – – The second generator started as expected.
    – – The third generator started initially but soon failed for an
    unknown reason. We are investigating root cause here as well.
    – Because we had only partial generator power, not all the Power
    Distribution Units (PDUs) were receiving power.
    – Power was restored to approximately 50% of our servers in a matter
    of minutes. The remaining ~50% required further attention from our
    engineers.
    – All power was restored at approximately 6:15 pm EDT, although there
    were isolated power fluctuations over the next two hours.
    All available technicians were on-site throughout this incident and
    have remained in the datacenter over the last 24 hours to ensure all
    servers came back online.

  • Chris says:

    This business about generators not starting seems surprisingly common. I wonder why? It’s not like firing up an internal combustion engine is new tech. There must be something more to it that I just don’t get.

  • yoshi says:

    Lack of testing. When was the last time they tested all three generators in a simulated power-outage situation? It’s all too rare. At my last position, we insisted on quarterly tests.
    And honestly, getting a generator to run is not the problem area. Getting it to run and provide power to where it’s needed is the problem area. If you don’t do proper testing during installation and follow it up with drills, these things never run the way you expect them to in a real emergency (also, make sure the fuel pump is not run by electrical power off the grid 🙂 )
