Site Reliability Engineering
Site Reliability Engineering (SRE) is a discipline that applies software engineering practices to infrastructure and operations problems. Its main goal is to build scalable, highly reliable software systems.
This section covers SRE topics and practices, along with personal experience implementing SRE principles.
Topics
SRE Fundamentals
SRE 101: Complete Guide to Site Reliability Engineering - Comprehensive series covering SRE fundamentals with Go examples
Core SRE Concepts
Monitoring and Observability
Prometheus 101: Complete Guide - Comprehensive guide to monitoring TypeScript applications with Prometheus
OpenTelemetry 101: Complete Guide - Comprehensive guide to unified observability with OpenTelemetry
Reliability Metrics