Site Reliability Engineer (SRE) / DevOps
The Site Reliability Engineer provides infrastructure, tooling, and support for the other technical teams so they can develop, deploy, monitor, and fix Deepomatic's platform in production. You will also temporarily take on an ITOps role, helping us improve our processes for providing the proper software and hardware environment, so that our team can be as productive as possible while keeping the development process secure. This ITOps role should last at most two years, until we open a dedicated full-time position, and should take at most 15% of your time.
- Providing and supporting infrastructure: cloud resources, Kubernetes clusters, etc., with proper security and isolation.
- Improving monitoring and observability, notably with Datadog and Sentry, but also by helping developers surface useful data in their application logs, so that they can better understand and improve how their code behaves in production.
- Allowing developers to maintain high quality by providing CI/CD and testing tools, notably with our internal tool https://github.com/deepomatic/dmake.
- Your IT role will consist of maintaining network, workstation and server configurations to provide a remote-friendly but secure development environment.
- You will engage with a wide range of Deepomatic teams, from the development and product teams to the "Solution Architects" who are responsible for configuring Deepomatic's platform for the specific needs of our clients.
Within 1 month, you will:
- Discover what's behind the scenes of our AI technology and its business applications
- Complete the Deepomatic Academy, an onboarding project which all new employees go through.
- Work on improving the Docker base image of one of our repositories
- Get familiar with Kubernetes
Within 3 months, you will:
- Be ramped up on Kubernetes
- Master the product and its various components in terms of infrastructure
- Create an environment to stress test our product for QA
Within 6 months, you will:
- Deploy our product on a multi-cluster/cloud-provider environment using Terraform
- Work on security, with a particular focus on Keycloak and Vault
- Investigate the possibility of moving from our internal DMake tool to Garden.io.
- All levels of experience are welcome: you could, for example, be an experienced ITOps engineer who would like to ramp up on DevOps technologies.
- A taste for operational technical challenges and diversity of technologies
- Experience with Linux systems (package management, networking, security...)
- Experience with networking
- Experience with Bash and Python
- Great human qualities and a love for teamwork
- Great oral and written communication in English.
Junior engineers are welcome but the following additional skills are a plus:
- Experience with Docker and Kubernetes
- Experience with Terraform and Helm
- Experience with Keycloak and Vault
- Experience with Jenkins
- Experience with at least one of the main cloud providers: GCP, AWS, or Azure (we mainly use GCP but intend to be cloud-agnostic)
- Experience as a software developer
- Experience with databases (PostgreSQL)
- Knowledge of security