Site Reliability Engineer
Application Developer/Site Reliability Engineer with a Bachelor’s degree in Computer Science, Computer Information Systems, or Information Technology, or a combination of education and experience equivalent to a U.S. Bachelor’s degree in one of these subjects.
Job Duties and Responsibilities:
- Responsible for the reliability, availability, and health of all Production environments, including ongoing monitoring and proactive, preventive health assessments.
- Transform Operations & influence Engineering practices to achieve the strategic goal of frequently deploying new code to Production via Continuous Delivery (CD) pipelines.
- The role encompasses handling complex and varied product platforms and multiple Cloud deployment platforms.
Key Responsibilities include, but are not limited to:
- Design the SRE function with the goal of providing 24x7x365 coverage.
- Build and evolve an Operations Model that can handle complexities spanning various cloud-based deployment models and technology partner integrations.
- Create & support a delivery ecosystem that thrives on demonstrating value to stakeholders by adopting highly iterative & Continuous Delivery models.
- Work with the product management team to define Service Level Agreements (SLAs) and Service Level Objectives (SLOs), and implement Service Level Indicators (SLIs) for core capabilities.
- Collaborate with product and engineering to drive and improve the whole lifecycle of operational readiness: from inception and design through deployment, operations, and proactive refinement.
- Influence Architectural and Product decisions with a bias towards Scale, Observability, Monitoring, Stability, and Security.
- Drive the incident management process and support a blameless post-mortem culture.
- Own and drive high-profile customer escalations.
- Drive and implement a lean-ops culture by applying self-service, self-healing, and automation.
- Advocate for SRE principles and collaborate with all Engineering teams to create a DevOps mindset.
- Responsible for capacity forecasting, budgeting, and cost optimization.
- Define and deliver KPIs and metrics for Operations & Quality to stakeholders, such as Deployment Frequency, MTTR, and Lead Time.
- Adopt and evolve internal processes based on industry best practices in SRE.
- Grow team members' careers through coaching and mentoring of junior engineers, and foster leadership principles and behaviors to groom the next generation of leaders.
Qualifications:
- Minimum of 3 years of experience in Software Engineering and/or Infrastructure Operations, with 2+ years in an SRE role.
- Ability to work with distributed, multicultural, and diverse teams.
- Experience with customer escalations and/or operations war rooms.
- Strong understanding of modern monitoring and logging technologies.
- Strong analytical skills with a data-driven approach to solving problems.
- The ability to partner and influence product, engineering, and operations teams is a must.
- Strong organizational planning and development skills, business judgment, influencing skills, and technical leadership.
- Experience with Agile methodologies such as Scrum and Kanban.
Work location is Portland, ME, with required travel to client locations throughout the USA.
Rite Pros is an equal opportunity employer (EOE).
Please mail resumes to:
Rite Pros, Inc.
565 Congress St, Suite # 305
Portland, ME 04101.