Site Reliability Engineer for Service Insights
About Swiss Re
Swiss Re is one of the world’s leading providers of reinsurance, insurance and other forms of insurance-based risk transfer, working to make the world more resilient. We anticipate and manage a wide variety of risks, from natural catastrophes and climate change to cybercrime.
At Swiss Re we combine experience with creative thinking and cutting-edge expertise to create new opportunities and solutions for our clients. This is possible thanks to the collaboration of more than 13,000 employees across the world.
We offer a flexible working environment where curious and adaptable people thrive. Are you interested in joining us?
About the Role
In the Logging & Monitoring product of the Service Insights Centre, we develop and operate state-of-the-art logging and monitoring platforms that collect information about application behavior, detect and limit service disruptions, and provide the associated reporting. This helps application and platform owners identify growing risks, clearly understand their SLAs, reduce the mean time to resolution and stay ahead of long-term trends.
As a Site Reliability Engineer you will be:
- developing and operating state-of-the-art logging, monitoring & event management platforms that help application and platform owners better understand their workloads running on multi-cloud platforms.
- providing logging & monitoring consultancy to application, product and service owners as well as developers.
- part of a dynamic, highly motivated and diverse squad of engineers.
- working closely with developers and application owners to improve the stability and resilience of our cloud infrastructure and applications.
About the Team
The mission of the new Service Insights Centre is hugely ambitious: we want to provide autonomous IT Operations capabilities (e.g. with AI) that recognize and resolve serious issues faster and more accurately than humans can today. We will provide monitoring and predictive insights into how well all IT services meet their current service level objectives (stability, availability, security, etc.) and how well they will meet future ones. We will support and enable business and technical domains to deliver the data our vision requires. More precisely, we:
- Develop, maintain and provide a continuous (24/7) overview of IT health at Swiss Re (current issues, incidents, deployments, possible future issues, cyber threats, etc.).
- Provide insights that help quickly determine the cause of incidents and possible paths to resolving them.
- Predict incidents before they happen, trigger automation to resolve them and strive for autonomous operation of systems.
- Provide insights and transparency to application owners and other stakeholders on stability, impacts, pain points, areas for improvement, etc.
- Share insights with all stakeholders to help keep our IT services stable and reliable and to increase transparency about outages and incidents.
- Provide critical metrics and insights to the CTO and senior management to help steer technology strategy and investments.
We would be happy to meet you if you have:
- Software development, continuous integration/deployment and system engineering experience in cloud-native ecosystems.
- Experience with a container orchestration system (e.g. Kubernetes), along with solid security and networking skills.
- Experience with a modern programming language (e.g. Golang, Java) and with scripting languages (PowerShell, Python and advanced Bash scripting).
- A strong software engineering background, a focus on problem resolution and a willingness to learn quickly.
- A passion for learning and staying up to date with the latest trends, and the ability to evaluate the adoption of new technologies pragmatically and with a long-term vision.
- A willingness to work with open-source application and infrastructure monitoring tools, e.g. Elastic stack (ELK), Influx stack (TICK), Prometheus and Grafana.
- Excellent oral and written English skills; additional language skills are a plus.
Interested in this new challenge?