We are looking for a highly skilled Azure Site Reliability Engineer (SRE). In this role, you will be instrumental in ensuring the reliability, scalability, and performance of our cloud-based services. You will work closely with cross-functional teams to design and implement best practices, optimize infrastructure, and drive operational excellence across our Azure environments.
Key responsibilities
- Automation & CI/CD: Design and maintain automation frameworks for deployment, scaling, and environment management.
- Monitoring & Maintenance: Implement and manage monitoring tools to ensure system health and proactively resolve issues.
- Incident Management: Respond to incidents, perform root cause analysis, and implement preventive measures.
- Performance Optimization: Analyze and enhance system performance for scalability and efficiency.
- Capacity Planning: Forecast system needs and ensure infrastructure readiness for growth.
- Collaboration: Partner with development teams to embed reliability into the CI/CD lifecycle.
- Documentation: Maintain detailed documentation of system architecture, configurations, and procedures.
- Tool Development: Build internal tools to streamline operations and improve reliability.
- Security: Enforce and monitor security controls across all systems.
- SLO Management: Define and track Service Level Objectives (SLOs) to meet business reliability targets.
- On-call Support: Participate in a rotating on-call schedule for 24/7 support of critical systems.
Required qualifications
- Experience: Minimum 5 years in a Site Reliability Engineer or DevOps role, with a strong focus on Microsoft Azure.
- Languages:
  - English: C1 level
  - Bonus: French or Dutch (B1 level)
Technical skills
Proven experience in:
- Azure Cloud Services (VMs, Storage, Networking, etc.)
- Infrastructure as Code (IaC): Terraform, ARM templates, or Bicep
- Monitoring & Logging: Azure Monitor, Application Insights, Log Analytics, Grafana, Splunk, Elastic Stack
- Scripting: Python, Azure CLI, PowerShell
- Containerization: Docker, Kubernetes
- Cloud Networking: VPNs, NSGs, load balancing, hub-and-spoke models
- Azure Governance & Cost Management