Resolved
SEA-SKVM-3 Failure
Node sea-skvm-3 suffered a lengthy outage on 1/6, starting around 12:40PM EST. The root cause was a RAID card failure that caused several drives to detach from the array at the same time. After some initial troubleshooting, datacenter staff replaced the faulty card; however, the sudden loss of the drives left the system in an unbootable state. After many hours of trying to repair the system, we shifted our efforts to data recovery. We were able to boot the system into a rescue environment and manually copy off most of the instance data, though it was evident that some data loss had occurred.
While we're glad that we were able to recover most instances, several instance disk images were unfortunately too corrupted to move or could not be repaired. We are still doing what we can to recover data, but at this point, if your instance is not booting, your disk image may be one of those we could not recover, and it is unlikely that we will be able to repair it. In that case, your only option may be to reinstall your OS and restore your instance from your most recent backup.
We are very sorry that this happened. We did our best to minimize the impact. Unfortunately, hardware failures do happen, and the most we can do at this point is help all affected customers get back up and running as soon as possible.
Resolved · 7 Jan at 01:15pm EST