Timeline
- 08:01 – Operations lead notified (via monitoring alert) of degraded performance on a subset of instances. Investigation started immediately.
- 08:05 – First user reports received describing login issues or degraded performance.
- 08:06 – Alert cleared as performance metrics recovered above the alerting thresholds. Performance continued to improve steadily.
- 08:15 – Affected users informed that the issue was resolved.
Root Cause
The issue was caused by the load balancer failing to scale up an additional instance as expected. The remaining capacity was insufficient for the load, resulting in temporary performance degradation for a subset of customers.
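As a hedged illustration only: the report does not name the platform, so the sketch below assumes an AWS Auto Scaling group behind the load balancer, and the group name "web-asg" is hypothetical. It shows how a failed scale-up like this one could be confirmed from the group's recent scaling activity.

```python
# Hypothetical diagnostic sketch: list recent Auto Scaling activity to see
# whether a scale-up was attempted and, if so, why it did not complete.
# Assumes AWS Auto Scaling; "web-asg" is an illustrative group name.
import boto3

autoscaling = boto3.client("autoscaling")

activities = autoscaling.describe_scaling_activities(
    AutoScalingGroupName="web-asg",  # hypothetical group name
    MaxRecords=20,
)["Activities"]

for activity in activities:
    # StatusCode values such as "Failed" or "Cancelled" point at a scale-up
    # that never launched the additional instance.
    print(activity["StartTime"], activity["StatusCode"], activity["Cause"])
```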
Resolution
Traffic stabilized as the existing instances recovered, and performance returned to normal without manual intervention.
Preventive Actions
- Review and adjust the load balancer's scaling rules so that additional instances are launched when required (see the sketch after this list).
- Enhance monitoring so that scaling anomalies are detected and addressed proactively (also covered in the sketch below).
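A minimal sketch of both actions, under the same assumption of an AWS Auto Scaling group behind the load balancer; the group, policy, and alarm names and the numeric thresholds are illustrative and would need to be matched to the real environment.

```python
# Hypothetical sketch of the two preventive actions; all names and numeric
# thresholds are illustrative, not taken from the incident.
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# 1. Scaling rule: track average CPU so an additional instance is launched
#    automatically when load rises, rather than waiting on manual action.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",          # hypothetical group name
    PolicyName="target-cpu-60",              # hypothetical policy name
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 60.0,                 # illustrative target
    },
)

# 2. Monitoring: alarm when CPU stays high for several periods, a signal that
#    the expected scale-up did not happen and needs attention.
cloudwatch.put_metric_alarm(
    AlarmName="web-asg-scaling-stalled",     # hypothetical alarm name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,                          # illustrative threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[],                         # e.g. an SNS topic for the on-call rotation
)
```

Target tracking is used in the sketch because it scales on observed load rather than a fixed schedule, which is the failure mode described in the root cause.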