Junior Data Engineer
Are you a Data Engineer looking for an international, creative and innovative environment? Would you like to work on a self-service data platform, making sure our data makes its way from a vast array of sources to the right place? At the IT Department of Randstad Groep Nederland (HQ) we are looking for you! We're looking for a Junior Data Engineer available to join our internal team immediately.

Data Engineering at Randstad Groep Nederland (HQ)

As a member of the DataHub Team you are responsible for the development and maintenance of the Randstad data lake and the services offered to data scientists and data analysts. The DataHub Team makes use of a variety of technologies, and we are responsible for our own infrastructure. We provide a platform that distributes data to data scientists and analysts across the organization, so they can make use of all the data generated in the Randstad Group Netherlands. You will be part of an agile team and play a vital role in the design and development of a cloud-based data platform.
- Build and manage the DataHub, which includes:
- A front-end data catalog
- Management of DataHub users and data science projects
- Data subscriptions
- An AWS S3-based data lake
- Develop ways to improve self-service data consumption and data publishing:
- Build and manage ETL pipelines in Airflow, which are responsible for ingesting the data and making it available to users
- Develop standard ways to deliver data in the DataHub
- Develop CI/CD pipelines for data-consuming teams to let them develop their products
You will be responsible for producing quality code and reusable components. Using containerization, CI/CD and other automation technologies, you will build a backend that is highly available and scalable, while at the same time being easily deployable, manageable and secure. Together with the rest of the team you will be involved in the full product development process, from design and implementation through testing, documentation and automated deployment. You will respond to and resolve operational incidents, performing root cause analysis and managing the changes required to prevent future occurrences.

In this team you will have a wide range of responsibilities and should be willing to adapt to many different challenges. You will discuss requirements and future improvements with the users of the platform, but also come up with proposals for our users on how to use the platform. You will manage and develop our data persistence environments (data lake, storage, etc.) to ensure that data is properly available to users and secured, and monitor systems for uptime.

The data lake we maintain is partly in Redshift and is moving fully towards S3. The new S3 data lake is accessed through the Trino query engine, which lives on an auto-scaling EKS cluster and ingests raw data via Spark on an EMR cluster backed by a fleet of Spot instances. We have created a Django-based metadata catalog that functions both as a portal to monitor the data and as a way to provide services for our users. For general usage we offer data subscriptions through scheduled unloads to a project space on S3. Furthermore, we offer tools to work with machine learning models using SageMaker notebooks. You will design and set up infrastructure on AWS for the expanding services of our platform and develop Airflow DAGs that represent our data pipelines. Most of our coding is done in Python.
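To give a flavor of the day-to-day work, here is a minimal, hypothetical sketch of the kind of pipeline step an Airflow DAG might orchestrate: raw records are extracted, normalized, and loaded into a date-partitioned target that a query engine like Trino could read. All names and data are illustrative, and plain Python lists stand in for S3 objects so the example is self-contained.

```python
import json
from datetime import date

# Illustrative ETL step; in production each function would typically be
# wired up as a task in an Airflow DAG, reading from and writing to S3.

def extract(raw_lines):
    """Parse raw JSON lines as they might arrive from a source system."""
    return [json.loads(line) for line in raw_lines]

def transform(records, run_date):
    """Normalize field names and stamp each record with the partition date."""
    return [
        {"user_id": r["userId"], "amount": float(r["amount"]), "dt": run_date.isoformat()}
        for r in records
    ]

def load(records, target):
    """Append records to the target 'partition' (a list standing in for S3)."""
    target.extend(records)
    return len(records)

raw = ['{"userId": 1, "amount": "9.50"}', '{"userId": 2, "amount": "3.25"}']
partition = []
loaded = load(transform(extract(raw), date(2024, 1, 1)), partition)
print(loaded)              # 2
print(partition[0]["dt"])  # 2024-01-01
```

Splitting extract, transform and load into separate functions mirrors how Airflow tasks are composed, and keeps each step independently testable.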
So what would we like you to know and bring to the table?
- Experience with our tech stack, including Python, the AWS cloud, Airflow, SQL and Docker;
- Experience with scripting/automation of tasks;
- Experience with common devops and CI/CD practices to make your own work as easy as possible and guarantee the quality of our products;
- Comfort with test-driven development and a clear understanding of what it entails;
- Good communication skills;
- A proactive and energetic mindset;
- A natural inclination to take the lead;
- A self-starting & curious personality.
If you really want to impress us, you can do so by having experience with Spark, Jenkins, Django, EMR, containerization platforms like Kubernetes/EKS, and Presto/Trino.
What do we offer?
- Plenty of training and development opportunities within the group. A significant share of our employees have held several roles in their years in the business, with RGN giving you the tools you need to challenge and develop yourself;
- A very competitive package depending on your experience;
- 8.5% holiday allowance;
- A generous monthly benefit budget on top of your salary and holiday pay that you can choose to spend on extra time off, perks such as a bike, tablet, gym subscription or simply get paid out;
- 25 days of holiday, with the option to buy up to an additional 25 days off via the above budget;
- A generous sabbatical program;
- A good mobility scheme, laptop and everything you need to perform your job well;
- An attractive bonus scheme and the option to earn an outperformance bonus twice per year.
Does this sound like the right next step for you? Fantastic! Apply directly by clicking apply, or contact our Senior Staff Specialist for more information. This is a full-time (40h/week) role. Read more about working at RGN IT.