
Site Reliability Engineering

https://sre.google/
 
[[TableOfContents]]
 
= Participants =
* [김경민]
* [김동우]
* [김제신]
* [박민서]
* [음호준]
* [전영은]
* [조영호]
 
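= Format =
Before each session, everyone reads the portion of the book assigned for that week's topic. Every Friday, we share the parts that made the strongest impression on us.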
 
= Topics =
* Chapter 1 - Introduction
* Chapter 2 - The Production Environment at Google, from the Viewpoint of an SRE
* Chapter 3 - Embracing Risk
* Chapter 4 - Service Level Objectives
* Chapter 5 - Eliminating Toil
* Chapter 6 - Monitoring Distributed Systems
* Chapter 7 - The Evolution of Automation at Google
* Chapter 8 - Release Engineering
* Chapter 9 - Simplicity
* Chapter 10 - Practical Alerting
* Chapter 11 - Being On-Call
* Chapter 12 - Effective Troubleshooting
* Chapter 13 - Emergency Response
* Chapter 14 - Managing Incidents
* Chapter 15 - Postmortem Culture: Learning from Failure
* Chapter 16 - Tracking Outages
* Chapter 17 - Testing for Reliability
* Chapter 18 - Software Engineering in SRE
* Chapter 19 - Load Balancing at the Frontend
* Chapter 20 - Load Balancing in the Datacenter
* Chapter 21 - Handling Overload
* Chapter 22 - Addressing Cascading Failures
* Chapter 23 - Managing Critical State: Distributed Consensus for Reliability
* Chapter 24 - Distributed Periodic Scheduling with Cron
* Chapter 25 - Data Processing Pipelines
* Chapter 26 - Data Integrity: What You Read Is What You Wrote
* Chapter 27 - Reliable Product Launches at Scale
* Chapter 28 - Accelerating SREs to On-Call and Beyond
* Chapter 29 - Dealing with Interrupts
* Chapter 30 - Embedding an SRE to Recover from Operational Overload
* Chapter 31 - Communication and Collaboration in SRE
* Chapter 32 - The Evolving SRE Engagement Model
* Chapter 33 - Lessons Learned from Other Industries
* Chapter 34 - Conclusion
 
= References =
* [https://sre.google/sre-book/table-of-contents/|SRE Book]







last modified 2023-09-20 22:54:00