Distributed Systems Engineer

August 13, 2019

Intro

We are looking for talented engineers to help build and maintain the distributed data collection system at the heart of our business.

We are a data-driven company that collects and processes more than 500 GB of raw data daily. We leverage big data technologies such as serverless computing and Spark on AWS EMR to crunch this volume of data and make it queryable.

In this role, you will ensure that our data collection engine, which consists of distributed web crawlers, is state of the art and ahead of our competition. You will make sure that we can scrape any webshop, regardless of the ban-detection measures it uses, and that proper monitoring tools are in place. We are currently scraping 60 sites, and your goal is to at least triple that number without losing completeness or quality.

Your responsibilities will include:

  • Creating and implementing distributed web crawling architectures
  • Implementing cost-effective data processing architectures
  • Creating advanced system monitoring solutions & dashboards
  • Designing advanced ways of interpreting scraped HTML
  • Managing advanced proxies

Main requirements

  • At least 5 years of experience in object-oriented software engineering & design, in any object-oriented language
  • Experience with and understanding of large-scale web crawling
  • Experience with databases, SQL
  • Experience with infrastructure such as load-balancers, caches
  • Highly proficient in spoken and written English
  • You never stop learning

Nice to have

  • Have experience building on top of Amazon Web Services
  • Have programming experience with Python
  • Expert knowledge of web-scraping & web-scraping architectures
  • Experience with Go & JavaScript (Node.js) is a plus
  • Experience with big data technologies (such as Hadoop, Spark, Airflow, Cassandra, Elasticsearch) is a plus
  • Have a deep understanding of cloud possibilities and limitations in the areas of distributed systems, load balancing and networking, massive data storage, and security
  • Get energy from working in a highly complex and challenging startup environment with a high tech product
  • Knowledge of DevOps & automation (Terraform, Ansible)
  • Data analysis using Pandas (Python)


Perks

  • Work with the latest tech stack
  • We're a quickly growing company, so your personal growth can be huge too
  • Health Benefits – Comprehensive coverage for medical needs
  • Meal allowance – Monthly meal card, a fully stocked kitchen with plenty of coffee and fruit, and monthly team drinks and dinners
  • Work-Life Balance – We trust you to know your schedule and work when you feel most productive
  • Learning and Development – Attend meetups, conferences, and events that interest you and benefit your personal and career growth

