Concept

Site reliability engineering

Site reliability engineering (SRE) is a set of principles and practices that applies aspects of software engineering to IT infrastructure and operations. SRE claims to create highly reliable and scalable software systems. Although they are closely related, SRE is slightly different from DevOps. The field of site reliability engineering originated at Google with Ben Treynor Sloss, who founded a site reliability team after joining the company in 2003. In 2016, Google employed more than 1,000 site reliability engineers. After originating at Google in 2003, the concept spread into the broader software development industry, and other companies subsequently began to employ site reliability engineers. The position is more common at larger web companies, as small companies often do not operate at a scale that would require dedicated SREs. Organizations that have adopted the concept include Airbnb, Dropbox, IBM, LinkedIn, Netflix, and Wikimedia. According to a 2021 report by the DevOps Institute, 22% of organizations in a survey of 2,000 respondents had adopted the SRE model. Site reliability engineering, as a job role, may be performed by solo practitioners or organized in teams usually being responsible for a combination of the following within a broader engineering organization: System availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Site reliability engineers often have backgrounds in software engineering, system engineering, or system administration. Focuses of SRE include automation, system design, and improvements to system resilience. Site reliability engineering, as a set of principles and practices, can be performed by anyone. SRE is similar to security engineering in that everyone is expected to contribute to good security practices, but a company may decide to eventually hire staff specialists for the job. Conversely, for securing internet systems, companies may hire security engineers. To define and ensure their reliability goals, companies may hire SREs as well.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.