Master essential strategies to implement automation, monitor reliability, and enhance resilience in modern systems
This course is delivered as an intensive two-day workshop covering service reliability and system resilience in support of service assurance. It is intended for practitioners and managers who work on optimizing platform and service reliability and resilience through a clear understanding of business demand and automation.
Is it for you?
Personnel involved in contributing to and optimizing the reliability and resilience of platforms and services.
Prerequisites
Participants should have three to six months of prior IT experience.
What You'll Walk Away With
- ✓ Define and manage SLOs to align technical performance with user satisfaction
- ✓ Design reliable and scalable systems using SRE principles
- ✓ Implement full-stack observability to detect and anticipate incidents
- ✓ Manage incidents effectively and improve operational response
- ✓ Apply chaos engineering practices to strengthen service resilience
Training content
Day 1:
- SRE Principles and Practices
- Service Level Objectives and Error Budgets
- Reducing Toil
- Monitoring and Service Level Indicators
Day 2:
- SRE Tools and Automation
- Anti-Fragility and Learning from Failure
- Organisational Impact of SRE
- SRE, Other Frameworks, The Future
- Review and Exam Preparation
📌 Practical information
Our training sessions are offered in Montreal or Quebec City, in person or in a virtual classroom. Dates and locations are specified when you select your session below. If you have any questions, check out our FAQ.