About the Company and Team
We are looking for a Python Developer with expertise in handling large datasets and working with modern data platforms. The role involves data scraping, API integration, and the use of third-party scraping tools to manage datasets expected to scale to tens of millions of records. Efficient data processing, categorization, and optimization are key components of this project.
Key Responsibilities:
- Data Management & Scraping: Develop efficient data pipelines for large datasets, including scraping from multiple sources (APIs, third-party scraping tools), with a focus on extracting reused items from various marketplaces and consolidating the data for further analysis.
- Airflow Automation: Use Apache Airflow to schedule Python jobs for regular data cleaning, optimization, and scraping tasks (an illustrative sketch follows this list). While prior Airflow experience is preferred, strong Python skills are sufficient for quick upskilling.
- Supabase & Data Integration: Integrate data into Supabase for the US platform, mirroring the existing UK platform setup. Experience with PostgreSQL and Supabase will be crucial for this task.
- GCP Experience: Work within a Google Cloud Platform (GCP) environment, managing cloud storage, computing resources, and database solutions.
- Frontend Collaboration (Next.js): Work with the Next.js frontend team to ensure backend integration aligns with frontend requirements, particularly when mirroring the UK platform’s structure.
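
To give a concrete sense of the Airflow work described above, here is a minimal sketch of a DAG that schedules a daily scrape-and-clean job. It assumes Airflow 2.4+ and the requests library; the marketplace URL, DAG name, schedule, and task names are hypothetical placeholders rather than details of the actual platform.

```python
# Illustrative only: a minimal Airflow DAG of the kind this role involves.
# Names, URLs, and the schedule are hypothetical placeholders.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def scrape_marketplace(**context):
    """Pull raw listings from a hypothetical marketplace API."""
    response = requests.get("https://api.example-marketplace.com/listings", timeout=30)
    response.raise_for_status()
    # Assumes the API returns a JSON list of listing objects; in a real
    # pipeline this would land in GCS or a staging table rather than XCom.
    return response.json()


def clean_listings(**context):
    """Deduplicate listings before loading them into PostgreSQL/Supabase."""
    listings = context["ti"].xcom_pull(task_ids="scrape_marketplace")
    seen, cleaned = set(), []
    for item in listings or []:
        key = item.get("id")
        if key and key not in seen:
            seen.add(key)
            cleaned.append(item)
    # Loading into a Supabase-hosted PostgreSQL table would happen here.
    print(f"{len(cleaned)} unique listings ready to load")


with DAG(
    dag_id="marketplace_scrape_daily",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # regular scraping/cleaning cadence
    catchup=False,
) as dag:
    scrape = PythonOperator(task_id="scrape_marketplace", python_callable=scrape_marketplace)
    clean = PythonOperator(task_id="clean_listings", python_callable=clean_listings)
    scrape >> clean
```

In production, raw results would typically be staged in Cloud Storage or a staging table before being cleaned and loaded into Supabase/PostgreSQL.
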
Candidate Expectations
Required Skills:
- Python Development: Strong experience in Python, particularly in data processing, API integration, and managing large datasets.
- SQL & PostgreSQL: Advanced proficiency in SQL, including writing complex queries and managing databases optimized for large-scale data.
- Apache Airflow (Preferred): Experience using Airflow for scheduling, automation, and managing Python-based workflows.
- Supabase & PostgreSQL: Familiarity with Supabase as a backend service and working knowledge of PostgreSQL databases (an illustrative upsert example follows this list).
- Google Cloud Platform (GCP): Experience using GCP services like Cloud Storage, Cloud Functions, or Cloud SQL in data-heavy environments.
- Git & Vercel: Ability to work with version control (Git) and deploy on platforms like Vercel as part of a modern DevOps pipeline.
- Next.js (Bonus): Although not the primary focus, knowledge of Next.js would be an advantage for better backend/frontend integration.
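
As a rough illustration of the Supabase/PostgreSQL side of the role, the sketch below upserts consolidated marketplace listings into a PostgreSQL table using psycopg2, which works equally well against a Supabase-hosted database. The table name, columns, and connection details are hypothetical.

```python
# Illustrative only: consolidating scraped listings into a PostgreSQL table.
# Table name, columns, and connection details are hypothetical placeholders.
import psycopg2
from psycopg2.extras import execute_values

rows = [
    ("marketplace-a:123", "Vintage desk lamp", "home", 18.50),
    ("marketplace-b:987", "Refurbished laptop", "electronics", 240.00),
]

conn = psycopg2.connect(
    host="db.example.supabase.co",  # placeholder connection details
    dbname="postgres",
    user="postgres",
    password="...",
)
with conn, conn.cursor() as cur:
    # Upsert so re-running the job does not duplicate listings.
    execute_values(
        cur,
        """
        INSERT INTO reused_items (source_id, title, category, price)
        VALUES %s
        ON CONFLICT (source_id) DO UPDATE
        SET title = EXCLUDED.title,
            category = EXCLUDED.category,
            price = EXCLUDED.price
        """,
        rows,
    )
conn.close()
```
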
Desirable Traits:
- Experience working with product data and affiliate product data, including geolocation data and data categorization techniques.
- Experience in extracting reused items from various marketplaces and consolidating the data.
- Strong ability to communicate clearly and effectively, particularly when collaborating with frontend teams.
- Familiarity with third-party scraping tools for large-scale data extraction.
- Ability to work independently, while actively collaborating with other teams.
- Expertise in data cleaning and optimization techniques to ensure a high-performance data environment.
Working Conditions
This role is with an international company, working on an English-speaking team; English at Upper-Intermediate level or higher is a must. The position is full-time, paid hourly under a freelance contract, with a 3-month trial period.