Outage of The LexBlog Platform and Premier Managed Platform
Incident Report for LexBlog
Postmortem
Members,

Yesterday (Friday, 2/26/16) we experienced a Cloud Block Storage volume failure impacting some sites on The LexBlog Platform and all sites on Premier Managed Platform. Cloud Block Storage provides additional storage for the environment as publisher and visitor demands increase.

This initial failure set off an unusually long incident, which we want to explain fully.

What Happened

At approximately 3:10 p.m. Pacific time Friday, we received our first alerts from our system monitoring that some blogs were offline. This triggered our incident response team to immediately investigate the cause of the issue and work toward resolution.

Within 25 minutes, our team was able to restore service.

What we did not immediately realize, however, was that the initial failure of the Cloud Block Storage volume had caused a second issue: it took down our database master server.

As many members know, we have multiple server redundancies in place for issues like this, which is one reason we are able to restore service quickly. In this instance, the redundant servers fell out of sync because the database master server was down.

With the redundant application servers out of sync, publishers and readers encountered the issues we reported in our updates, including publishing errors and redirect loops.

Our team then worked to re-sync all application servers. All but one re-synced quickly, and all were completely re-synced by 10:50 p.m. Pacific.

What We’re Doing Going Forward

As a result of this issue, we recognize the need to fully test publishing and run automated checks after an outage of any kind. This is now a standard part of our incident response process.
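
As an illustration only, one kind of automated check that fits the symptoms in this incident is a redirect-loop detector. The sketch below is not our actual tooling: the function name, the `max_hops` limit, and the example URLs are all hypothetical, and the redirect map stands in for data that a real check would gather by requesting each page.

```python
from typing import Dict, List, Optional


def find_redirect_loop(
    start: str, redirects: Dict[str, str], max_hops: int = 10
) -> Optional[List[str]]:
    """Follow redirects from `start` through a URL -> target map.

    Returns the chain of URLs if a URL is revisited (a loop) or if the
    chain is still redirecting after `max_hops` hops; returns None if the
    chain ends at a page that does not redirect.
    """
    chain = [start]
    current = start
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return None  # reached a page that does not redirect
        if target in chain:
            return chain + [target]  # loop: this URL was already visited
        chain.append(target)
        current = target
    return chain  # still redirecting after max_hops; treat as suspicious


# Hypothetical data: one healthy redirect and one loop.
redirects = {
    "http://blog.example.com/": "https://blog.example.com/",
    "https://loop.example.com/a": "https://loop.example.com/b",
    "https://loop.example.com/b": "https://loop.example.com/a",
}

healthy = find_redirect_loop("http://blog.example.com/", redirects)
looping = find_redirect_loop("https://loop.example.com/a", redirects)
```

Here `healthy` is `None` because the chain ends at a page with no further redirect, while `looping` holds the chain of URLs that returns to its starting point. A check like this would only cover visitor-facing redirects; testing publishing itself requires a separate step.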

Additionally, we are working with our cloud server provider to better identify warning signs so that, in the future, we can fully restore our publishing software for members as quickly as possible.

Our Continued Commitment to You

Without the redundancy of our servers, strategic technology partnerships, and refined incident response process, this issue would have likely resulted in several hours of downtime instead of the initial 25 minutes.

That said, we know this outage and the subsequent intermittent issues impacted many members' ability to publish timely content. We are very sorry for this impact.

Thank you for your continued membership and understanding as our team worked to resolve this issue.

Sincerely,

The LexBlog Team
Posted Feb 27, 2016 - 11:13 PST

Resolved
The LexBlog Platform and Premier Managed Platform have been performing normally since our fix at 1:50 a.m. EST, and this incident is resolved.
Posted Feb 27, 2016 - 09:55 PST
Monitoring
We have implemented a fix for the intermittent errors on The LexBlog Platform and Premier Managed Platform and are monitoring to ensure performance is fully restored. Drafting and publishing posts is working; however, we recommend saving a copy offline until our monitoring is complete.
Posted Feb 26, 2016 - 23:24 PST
Update
The LexBlog Platform and Premier Managed Platform continue to experience issues. Blog users may experience missing posts or editing issues, and blog visitors and users may experience redirect loop errors or see a domain routing page instead of blog pages.

We are working to resolve this and will continue to post updates here as available.
Posted Feb 26, 2016 - 18:05 PST
Identified
Though sites still appear online, we have identified that the admin area of some sites on The LexBlog Platform and Premier Managed Platform is not yet fully operational. Post edits and publishing may result in errors.

While we work to resolve this issue for site editors, please save posts and edits outside of the platform to ensure no work is lost.
Posted Feb 26, 2016 - 17:04 PST
Monitoring
We have restored The LexBlog Platform and Premier Managed Platform. Sites are back online now.

We will continue to monitor the performance of sites to ensure stable performance into the weekend.
Posted Feb 26, 2016 - 15:42 PST
Identified
We've identified the cause of this outage and are restarting the servers powering The LexBlog Platform and Premier Managed Platform in order to restore service.
Posted Feb 26, 2016 - 15:31 PST
Investigating
We are investigating an issue on The LexBlog Platform and Premier Managed Platform powered by WordPress currently preventing the admin area of sites and visitor-facing pages from loading.

As we learn more and work to restore normal service, we will post updates here.
Posted Feb 26, 2016 - 15:18 PST
This incident affected: The LexBlog Platform (LexBlog Platform 1) and Premier Managed Platform.