We have started streaming event data and building data pipelines in Apache Airflow that transform this data with Apache Spark. Now Catawiki needs more Data Engineers to help us tackle the challenge of bringing even more data sources into our platform and transforming and organizing this data for analysis and consumption by other applications. It’s a big challenge, but one that we’re really excited about solving.
What you will do
Evolve our current infrastructure into a distributed system and help build scalable data pipelines using the Hadoop ecosystem. You will collaborate with our Software Engineers, DevOps Engineers, Data Scientists and Product Managers to:
- Explore new ways of transforming and analyzing data and continuously expand and improve the performance of our data pipelines.
- Build prototypes fast, and determine their worth to the business and within the infrastructure before iterating on and improving them.
- Work closely with Data Scientists and Product Managers to decide how best to structure and store data in order to make it easily accessible to business users.
- Evaluate and develop highly distributed Big Data solutions, advancing our software architecture and tool set to meet growing business requirements for performance and data quality.
Who you are
- You are a Data Engineer who likes to experiment with and explore new tools and technologies. You are familiar with tools in the Hadoop ecosystem, such as Spark, Kafka and Hive.
- You are a Software Engineer with experience in modern backend web technologies.
- You know how to design and build low-maintenance, high-performing ETL processes and data pipelines.
- You can communicate an idea clearly on various levels of abstraction, depending on the audience.
- You have professional experience with relational databases: reading, writing and optimizing complex statements.
- We have a strong preference for candidates experienced with Python rather than Java.