Root Cause Analysis: 01/08/2019
Customer Business Impact:
Some MindTouch sites experienced intermittent slowness for nearly 100 minutes.
Problem Summary:
MindTouch sites loaded slowly and intermittently for all users, including Anonymous access, because waves of automated traffic placed stress on MindTouch servers.
Recovery:
After the initial concerns were brought to the Engineering team, MindTouch servers and services were scaled to match the incoming traffic, which prevented further site degradation.
Root Cause Summary:
A MindTouch site began receiving a large number of requests from multiple IP addresses, which put unexpected stress on MindTouch servers and resulted in slow page loads. This wave subsided, and a similar wave arrived nearly three hours after the first.
Corrective Actions:
MindTouch Engineering made the targeted site inaccessible for the duration of the attack requests, then restored access after the incident and its symptoms subsided.
Chronology of Events (all times PDT):
01/08/2019 [6:35 AM]
MindTouch Engineering was notified of potential server issues through internal reporting mechanisms and began investigating.
01/08/2019 [7:45 AM]
The MindTouch Server Status page was updated to inform customers of the degraded server performance and that MindTouch Engineering was already working to identify the issue.
01/08/2019 [8:15 AM]
MindTouch Engineering confirmed that the high traffic had subsided and that sites were working as expected.
01/08/2019 [9:22 AM]
MindTouch Engineering noticed another wave of high traffic and began launching new servers to keep pace with the traffic bump.
01/08/2019 [12:41 PM]
MindTouch Engineering confirmed that the high traffic had ended and that the server count had scaled back down to nominal levels. The MindTouch Server Status page was updated to reflect this.