Data Engineer

Arrixa AB
Location: Stockholm, Sweden
Department: Consulting - ARR
Employment Type: Full-time

We are seeking an experienced Data Engineer with a deep understanding of data engineering, cloud data platforms, and the Databricks ecosystem. The ideal candidate has 6+ years of hands-on experience in data engineering, including several years designing and implementing data solutions on Databricks, along with strong expertise in data modeling, ETL processes, and cloud data architectures. You will build, optimize, and maintain robust data pipelines while ensuring data quality and scalability.

Key Responsibilities:

  1. Design and implement scalable data pipelines on Databricks and Apache Spark.
  2. Develop, test, and maintain ETL solutions using PySpark and SQL (a minimal sketch of such a pipeline follows this list).
  3. Build data models and optimize performance for efficient data processing and analytics.
  4. Collaborate with data scientists, analysts, and other stakeholders to understand requirements and deliver data solutions.
  5. Implement best practices for data governance, data security, and data quality management.
  6. Monitor and troubleshoot performance bottlenecks in data pipelines and optimize them for efficiency.
  7. Lead the migration of data solutions from legacy systems to Databricks.
  8. Work closely with cloud architects to ensure seamless integration with cloud data platforms (e.g., Azure, AWS, GCP).
  9. Participate in code reviews and provide mentorship to junior engineers.
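
For illustration, here is a minimal sketch of the kind of pipeline work this role involves: reading raw files with PySpark, cleansing them, and writing the result as a Delta table. The paths, column names, and "orders" dataset are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical paths and column names -- for illustration only.
    RAW_PATH = "/mnt/raw/orders"          # assumed landing zone (ADLS/S3 mount)
    CURATED_PATH = "/mnt/curated/orders"  # assumed Delta output location

    spark = SparkSession.builder.appName("orders-etl").getOrCreate()

    # Extract: read raw JSON files from the landing zone.
    raw = spark.read.json(RAW_PATH)

    # Transform: deduplicate, filter bad records, derive a partition column.
    curated = (
        raw.dropDuplicates(["order_id"])            # assumed natural key
           .filter(F.col("amount") > 0)
           .withColumn("order_date", F.to_date("created_at"))
    )

    # Load: write as a partitioned Delta table.
    (curated.write
            .format("delta")
            .mode("overwrite")
            .partitionBy("order_date")
            .save(CURATED_PATH))

Partitioning on a date column, as sketched here, is a common choice for incremental loads and time-bounded queries.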

Required Skills & Qualifications:

  1. 6+ years of experience in data engineering, including 3+ years of hands-on experience with Databricks.
  2. Expertise in Apache Spark and PySpark.
  3. Strong knowledge of SQL and relational databases.
  4. Experience with cloud data platforms such as Azure Data Lake Storage, Amazon S3, or Google BigQuery.
  5. Proficiency with ETL tools and processes.
  6. Understanding of data modeling, schema design, and optimization techniques.
  7. Familiarity with CI/CD pipelines and version control (Git).
  8. Knowledge of Delta Lake, Parquet, and other big data file formats.
  9. Experience with Airflow, Databricks Workflows, or other orchestration tools (see the orchestration sketch after this list).
  10. Solid understanding of data security and governance in cloud environments.
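
To illustrate item 9, here is a minimal orchestration sketch that uses Airflow's Databricks provider to trigger an existing Databricks job on a daily schedule. It assumes the apache-airflow-providers-databricks package is installed; the connection ID and job ID below are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

    # Placeholder identifiers -- replace with values from your workspace.
    DATABRICKS_CONN_ID = "databricks_default"   # assumed Airflow connection
    ORDERS_JOB_ID = 123                         # hypothetical Databricks job ID

    with DAG(
        dag_id="orders_etl_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        # Trigger the pre-defined Databricks job that runs the pipeline.
        run_orders_etl = DatabricksRunNowOperator(
            task_id="run_orders_etl",
            databricks_conn_id=DATABRICKS_CONN_ID,
            job_id=ORDERS_JOB_ID,
        )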

Nice-to-Have Skills:

  1. Experience with machine learning frameworks and pipelines on Databricks.
  2. Familiarity with Power BI, Tableau, or other data visualization tools.
  3. Knowledge of Kafka, Event Hubs, or other real-time data streaming technologies (a streaming sketch follows this list).
  4. Experience with Scala in a Spark environment.
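
For the streaming item above, here is a minimal PySpark Structured Streaming sketch that reads from Kafka and appends to a Delta table. The broker address, topic, and paths are placeholders, and it assumes the Kafka connector is available on the cluster (as it is on Databricks).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("events-stream").getOrCreate()

    # Hypothetical Kafka source -- broker address and topic are placeholders.
    events = (
        spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "events")
             .load()
             .selectExpr("CAST(key AS STRING) AS key",
                         "CAST(value AS STRING) AS value")
    )

    # Continuously append the raw events to a Delta table.
    query = (
        events.writeStream
              .format("delta")
              .option("checkpointLocation", "/mnt/checkpoints/events")  # assumed path
              .outputMode("append")
              .start("/mnt/curated/events")
    )
    query.awaitTermination()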

Education:

  1. Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field.