MindTouch Service Degradation: Sites unavailable
Incident Report for MindTouch
Postmortem

Customer Business Impact: On Jan 05, 2023, from approximately 17:40 to 17:50 UTC, all sites experienced increased latency and 50x errors. At 17:50 UTC, full functionality was restored to all sites.

Details: At 17:42 UTC, DevOps received multiple PagerDuty notifications indicating high latency and error rates for all sites.

Recovery: At 17:48 UTC, DevOps identified the root cause: a container image was missing from our container registry. A workaround was deployed to use the previous version of the image. At 17:49 UTC, sites recovered, and by 17:50 UTC error rates and latency had returned to nominal levels.
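
For illustration only, the workaround of falling back to the previous image version can be sketched as below. This is a minimal sketch and not the tooling used during the incident: the function name `pick_available_image`, the tag names, and the idea of passing in a plain list of registry tags are all hypothetical.

```python
from typing import Iterable, Optional, Sequence


def pick_available_image(
    preferred_tags: Sequence[str], registry_tags: Iterable[str]
) -> Optional[str]:
    """Return the first tag in preferred_tags that is present in registry_tags.

    preferred_tags is ordered newest first, so the result is the newest
    image that still exists. Returns None if none of them exist.
    """
    available = set(registry_tags)
    for tag in preferred_tags:
        if tag in available:
            return tag
    return None


# Hypothetical data: the newest image (v42) is no longer in the registry.
deploy_order = ["app:v42", "app:v41", "app:v40"]
in_registry = ["app:v41", "app:v40"]

chosen = pick_available_image(deploy_order, in_registry)
print(chosen)
```

A check of this kind, run before a rollout, would surface a missing image as a deploy-time failure rather than as errors on live sites.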

Corrective Actions

  1. CXone Expert changed the container image retention to 90 days to prevent early deletion.
Posted Mar 09, 2023 - 00:46 UTC

Resolved
This incident has been resolved.
Posted Jan 05, 2023 - 20:24 UTC
Update
We are continuing to monitor for any further issues.
Posted Jan 05, 2023 - 20:24 UTC
Monitoring
We had a brief site outage. As of right now sites are back up but some may see degraded performance. We will post further updates as the situation progresses.
Posted Jan 05, 2023 - 17:51 UTC
This incident affected: Application (General Service), Search, and In-Product Contextual Help.