We help families that work, work. We are the family benefits platform that picks up where the healthcare system leaves off.
Our next-generation platform uses an exciting mixture of technologies. Our large-scale, highly available, public-facing applications are built in React Native. Our backend is composed of many microservices built with NestJS and Python/Django, backed by a combination of technologies chosen to fit our scaling needs, such as PostgreSQL, Redis, and more, which lets us process a high volume of events through an event-based architecture. All of this is hosted on AWS, using tools like Aptible to manage our infrastructure.
At Cleo, we are big believers in teams owning their features, and we have instilled a motto of “build, deploy, run”. This means teams don’t just develop features; they are also responsible for deploying and supporting those features in Production. Our teams maintain a continuous feedback loop by using tools like New Relic, SumoLogic, and Sentry to monitor application health and performance.
We are focused on continuing the growth of our next-generation platform and are looking for a new Senior Site Reliability Engineer to join our team. You will use your strong understanding of modern open-source technologies and have the opportunity to learn more. Site Reliability Engineers are a hybrid of systems and software engineers who are responsible for maintaining the health of our applications and infrastructure through scaling, automation, and feedback loops. They have an intense passion for finding and improving efficiencies in infrastructure, development, and deployment automation. As a Senior Site Reliability Engineer at Cleo, you will have the opportunity to enable our engineering teams to build highly available, scalable, and secure applications in the cloud to solve real-world challenges.
This role reports to the Senior Engineering Manager. You must be authorized to work in the U.S. without sponsorship.
- Champion DevOps best practices and standards across all of R&D and Engineering
- Maintain and evolve our application monitoring practices
- Support and evolve Cleo’s CI/CD release pipelines
- Maintain and evolve Cleo’s cloud computing infrastructure, focusing on cloud security, scaling, cost optimization, and automation
- Automate and support evolving DevOps responsibilities and tasks to reduce the toil of supporting Cleo’s platform
- Work with Cleo’s Security team to ensure applications and infrastructure adhere to our policies and controls
- Evolve our technology roadmap, recommending and adapting to our evolving architecture
- Tune the performance of applications, databases, and configuration settings
- Control our application logging infrastructure
- Work to ensure system and data security are maintained at a high standard, so that the confidentiality, integrity, and availability of Cleo’s applications are not compromised
- Ensure industry best-practice coding standards are adhered to; in particular, ensure all code developed at Cleo is free from known security vulnerabilities, such as those defined and published by OWASP
To be successful in this role you may have:
- 4+ years in DevOps, System, or Software Engineering
- Strong experience running production application workloads in AWS Cloud
- Understanding of public Cloud networks, VPC peering, etc.
- 3+ years utilizing Cloud computing (EC2, SNS/SQS, RDS)
- Containers and orchestration (Docker, Kubernetes, EKS)
- Experience administering technologies at scale such as Elasticsearch, Postgres, and Redis
- Familiarity with monitoring tools like New Relic, SumoLogic, etc.
- Experience managing and automating Continuous Integration and Continuous Delivery (CI/CD) using GitLab, CircleCI, TeamCity, Jenkins, or similar
- Provisioning and configuration management (Terraform, Ansible)
- Linux or Windows server administration
- Strong desire to influence the direction of Cleo’s DevOps practices
- Ability to mentor junior Engineers
- Familiarity with a scripting language such as Python, Groovy, PowerShell, or Ruby
- Experience integrating security tooling (container scanning, static and dynamic analysis) into the pipeline
- Ability to weave monitoring, logging, and alerting into everything you build
- A quality- and security-conscious approach when implementing solutions
- Ability to debug complicated issues in collaboration with peers
- Experience working within and upholding compliance with HIPAA and other standards
To apply for this job please visit jobs.lever.co.