Information Technology

Internship: Data engineering: schema evolution handling on incoming data flow - Ieper, Belgium

Your future job

The volume and ingestion rate of the data flows coming from various sources into our data analytics and reporting platforms keep increasing. Over time, the fields within those flows, or their definitions, can change.
Besides developing scripts that detect and handle schema evolution, you will embed these mechanisms in the data lake environment so that they can be used operationally.
The goal of the project is to identify schema evolution on incoming data flows, to 'productise' this mechanism and to integrate it on the intake side of the data lake.
The project includes the following steps:
- Schema evolution detection and handling on different types of data flows
- Productisation of the solution (setting up data pipelines, versioning, pipelines for deploying different versions, ...)
- Implementation of the mechanism in the data lake setup
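
To give an idea of what the detection step could look like, here is a minimal sketch in plain Python. It assumes a schema can be represented as a mapping from field name to a type name; the names `SchemaDiff` and `diff_schemas` and the sample fields are hypothetical and only serve as illustration, not as a description of our actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Differences between a known schema and an incoming one (hypothetical helper)."""
    added: dict = field(default_factory=dict)    # field name -> new type
    removed: dict = field(default_factory=dict)  # field name -> old type
    changed: dict = field(default_factory=dict)  # field name -> (old type, new type)

    @property
    def has_changes(self):
        return bool(self.added or self.removed or self.changed)


def diff_schemas(known, incoming):
    """Compare two schemas given as {field name: type name} dictionaries."""
    diff = SchemaDiff()
    for name, new_type in incoming.items():
        if name not in known:
            diff.added[name] = new_type
        elif known[name] != new_type:
            diff.changed[name] = (known[name], new_type)
    for name, old_type in known.items():
        if name not in incoming:
            diff.removed[name] = old_type
    return diff


# Illustrative schemas (assumed field names and types)
known = {"id": "long", "amount": "double", "country": "string"}
incoming = {"id": "long", "amount": "string", "currency": "string"}

diff = diff_schemas(known, incoming)
print(diff.added)    # fields that appeared in the incoming flow
print(diff.removed)  # fields that disappeared
print(diff.changed)  # fields whose type changed
```

In an operational setting, a comparison like this would probably run at intake time against a schema stored in a catalog, with the handling step deciding per change type whether to accept, convert or reject the incoming data.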

Your profile

  • Bachelor's or Master's student in IT
  • Knowledge of IT software analysis, design and development practices
  • Minimal: Java development (preferably with the libraries and platforms listed below)
  • Preferably: Git/GitLab, Docker, Kubernetes
  • Data engineering technologies (for the related projects)

Main technologies used:

  • Metadata solutions (catalogs, ...)
  • Databricks / Spark
  • Python, SQL
  • Continuous delivery (Git/GitLab CI, Docker, Kubernetes, ...)
  • Knowledge of data lake concepts

We offer

  • a challenging job in a dynamic high-tech international environment
  • the opportunity to take ownership of your professional passion in order to contribute to the success of the company
  • an enjoyable, team-oriented and professional atmosphere in a flat-structured organization
  • versatile development opportunities