Site Reliability Engineer
Optomi, in partnership with a leading financial services client, is seeking a Site Reliability Engineer II (SRE II) to join a highly technical, cloud-focused engineering team. This role is designed for a hands-on engineer with a strong background in Azure cloud infrastructure, automation, and reliability engineering; it is not a production support or operations-only position.
What the right candidate will enjoy:
- Work with a top-tier financial services organization
- High-impact, engineering-first SRE role
- Exposure to modern Azure cloud and container platforms
Requirements and Skills of the right candidate:
- 3–5+ years of direct Site Reliability Engineering experience (hands-on engineering required; production support-only backgrounds will not be considered)
- Strong experience with Azure cloud services and cloud-native architectures
- Proven experience with:
  - Terraform
  - Docker (2–3+ years)
  - Azure DevOps
- Development experience supporting or building .NET applications
- Solid understanding of CI/CD, cloud networking, and distributed systems
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent professional experience
Key Responsibilities:
- Hands-on Engineering & Reliability
  - Design, build, and maintain reliable, scalable cloud infrastructure in Microsoft Azure
  - Apply SRE principles to improve system availability, performance, and fault tolerance
  - Participate in architecture discussions and contribute to reliability-focused design decisions
  - Perform root cause analysis and implement long-term fixes for system issues
- Cloud & Infrastructure Engineering
  - Engineer and support Azure services, including:
    - AKS (Azure Kubernetes Service)
    - Azure Service Bus & Event Hubs
    - Azure SQL Server
    - Azure Function Apps & App Services
  - Manage infrastructure using Terraform, following Infrastructure as Code (IaC) best practices
  - Containerize applications using Docker and support containerized workloads in AKS
- DevOps & Automation
  - Build and maintain CI/CD pipelines using Azure DevOps (ADO)
  - Automate deployment, scaling, and operational workflows
  - Collaborate with development teams supporting .NET-based applications
- Monitoring & Observability
  - Implement and enhance monitoring, logging, and alerting strategies
  - Leverage observability tools; Splunk Observability Cloud experience is preferred
  - Define and track SLOs, SLIs, and error budgets