We are looking for a Site Reliability Engineer, currently based in Europe, to join our team.
Application Requirement:
As part of your application, we would like you to respond to a scenario that simulates a real-world situation you might encounter in this role. Please provide your approach to the following scenario: "Imagine you're responsible for a critical application that experiences intermittent performance degradation. Users are reporting occasional slowdowns, but the system appears fine when you check it. How do you approach troubleshooting this issue, considering that it might not be immediately obvious where the problem lies?"
Requirements:
- 3+ years of commercial experience as a Software Engineer, DevOps Engineer, or Site Reliability Engineer
- Hands-on experience with at least one of the leading cloud platforms (AWS, Azure, Google Cloud)
- Skills across a broad range of technologies such as Terraform, ELK, Grafana, Kubernetes, Docker, Istio, Helm, Git, Bash, and CI/CD
- Experience in cloud-native development and microservice architectures
- Strong problem-solving skills and the ability to work well under pressure
- Fluent English
- Strong knowledge of PostgreSQL for querying and investigating data stores
- Automation-first mindset with a focus on process and system efficiency
- Previous exposure to technologies like Bitbucket, Grafana, Jira, Octopus, TeamCity, or similar
- Commitment to meeting timelines and resolving issues promptly
- Strong team ethic and a blameless approach that avoids the "blame game"
Responsibilities:
- Take ownership of system stability and issue resolution during core business hours.
- Perform routine maintenance on file systems, data stores, and job control systems.
- Participate in on-call rotations to address any emerging issues promptly.
- Identify and manage risks, promptly flag major issues, and escalate incidents when required.
- Collaborate with software development teams to design and build reliable, scalable, and efficient systems.
- Implement and maintain monitoring and alerting systems to proactively identify and address issues.
- Troubleshoot and resolve complex technical issues related to infrastructure and applications.
- Design, implement, and manage automated deployment and configuration management processes.
- Perform capacity planning and ensure the scalability of our systems.
- Lead incident response and post-incident analysis to prevent future occurrences.
- Propose innovative technical solutions that improve efficiency, differentiate our product, and enhance the user experience.
We offer:
- Full-time, remote job;
- 4-day working week (Monday to Thursday, with Fridays off);
- Paid vacation (20 days) and paid sick leave;
- Flexible working schedule;
- Friendly, professional colleagues and a warm atmosphere.
About Hellotickets
Hellotickets is the largest marketplace for tours and activities in Spanish, French and Italian. Hellotickets is designed for international travelers who struggle to find a way to buy tickets to events. The platform already operates in 15 countries, each with its own local currency and payment methods.
Company website: https://www.hellotickets.com/