https://sre.google/

[[TableOfContents]]

= Participants =
 * [김경민]
 * [김동우]
 * [김제신]
 * [박민서]
 * [음호준]
 * [전영은]
 * [조영호]

= Format =
Participants read the portion of the book assigned for each topic in advance, then meet every Friday to share the parts they found most memorable.

= Topics =
 * Chapter 1 - Introduction
 * Chapter 2 - The Production Environment at Google, from the Viewpoint of an SRE
 * Chapter 3 - Embracing Risk
 * Chapter 4 - Service Level Objectives
 * Chapter 5 - Eliminating Toil
 * Chapter 6 - Monitoring Distributed Systems
 * Chapter 7 - The Evolution of Automation at Google
 * Chapter 8 - Release Engineering
 * Chapter 9 - Simplicity
 * Chapter 10 - Practical Alerting
 * Chapter 11 - Being On-Call
 * Chapter 12 - Effective Troubleshooting
 * Chapter 13 - Emergency Response
 * Chapter 14 - Managing Incidents
 * Chapter 15 - Postmortem Culture: Learning from Failure
 * Chapter 16 - Tracking Outages
 * Chapter 17 - Testing for Reliability
 * Chapter 18 - Software Engineering in SRE
 * Chapter 19 - Load Balancing at the Frontend
 * Chapter 20 - Load Balancing in the Datacenter
 * Chapter 21 - Handling Overload
 * Chapter 22 - Addressing Cascading Failures
 * Chapter 23 - Managing Critical State: Distributed Consensus for Reliability
 * Chapter 24 - Distributed Periodic Scheduling with Cron
 * Chapter 25 - Data Processing Pipelines
 * Chapter 26 - Data Integrity: What You Read Is What You Wrote
 * Chapter 27 - Reliable Product Launches at Scale
 * Chapter 28 - Accelerating SREs to On-Call and Beyond
 * Chapter 29 - Dealing with Interrupts
 * Chapter 30 - Embedding an SRE to Recover from Operational Overload
 * Chapter 31 - Communication and Collaboration in SRE
 * Chapter 32 - The Evolving SRE Engagement Model
 * Chapter 33 - Lessons Learned from Other Industries
 * Chapter 34 - Conclusion

= References =
 * [https://sre.google/sre-book/table-of-contents/|SRE Book]