Build SRE skills: SLOs, observability, incident management, and chaos engineering for reliable systems.
This intensive three-day workshop covers the practical application of SRE principles to the scalability and reliability of large-scale services, with a focus on modern IT leadership and organizational change approaches. It is aimed at practitioners and managers seeking to improve the reliability and resilience of platforms and services.
Is it for you?
Personnel involved in improving the capacity, reliability and resilience of platforms and services.
Prerequisites
• SRE Foundation certification
• Three to six months' prior IT experience
What You'll Walk Away With
- ✓ Define and leverage SLOs to drive reliability and user satisfaction
- ✓ Design secure, reliable, and scalable systems
- ✓ Implement full-stack observability to monitor services effectively
- ✓ Manage incidents and structure efficient operational response
- ✓ Apply chaos engineering to test and strengthen system resilience
Training content
Day 1:
- SRE anti-patterns
- Service level objectives (SLOs), the proxy for customer happiness
- Building secure, reliable systems
- Non-abstract capacity planning for large-scale design
Day 2:
- Full-stack observability
- Using platform engineering and AIOps
- SRE management and incident response
- Gremlin instrumentation
Day 3:
- Chaos engineering
- SRE is a form of DevOps
- Review and exam preparation
📌 Practical information
Our training sessions are offered in Montreal or Quebec City, in person or in a virtual classroom. Dates and locations are specified when you select your session below. If you have any questions, check out our FAQ.