Data Engineer
Key Responsibilities
Design, build, and maintain scalable data pipelines for batch and real-time processing using tools such as Apache Airflow, dbt, or Apache Spark.
Develop robust ETL/ELT workflows to ingest, clean, transform, and load data from diverse sources (APIs, databases, files, streams).
Work with stakeholders to understand data needs and translate business requirements into technical solutions.
Ensure data is accurate, timely, and accessible to downstream users (BI tools, ML models, applications).
Collaborate with data architects and engineers to build a modern data stack leveraging cloud-native platforms (e.g., AWS, GCP, Azure).
Monitor and optimize data pipeline performance, scalability, and cost-efficiency.
Implement data quality, validation, and observability frameworks to proactively detect and resolve issues.
Maintain clear documentation of pipelines, data flows, and architecture.
Support data compliance, governance, and security policies.
Required Qualifications
Technical Skills
Strong programming skills in Python or Scala for data engineering tasks.
Experience with ETL/ELT tools and orchestration frameworks (e.g., Airflow, dbt, Luigi, Kedro).
Proficiency in SQL for data manipulation and modeling.
Experience with big data and distributed processing technologies (e.g., Spark, Kafka, Flink).
Familiarity with data warehousing solutions (e.g., Snowflake, BigQuery, Redshift).
Hands-on experience with cloud platforms and data services (e.g., AWS Glue, GCP Dataflow, Azure Data Factory).
Experience working with version control (Git), CI/CD pipelines, and containerized environments (Docker, Kubernetes).
Soft Skills
Strong problem-solving and debugging skills.
Excellent communication and documentation abilities.
Ability to work independently and collaborate across cross-functional teams.
Strong attention to detail and data quality.
Preferred Qualifications
Experience with real-time data streaming and event-driven architectures.
Familiarity with data cataloging and lineage tools (e.g., Amundsen, DataHub).
Knowledge of MLOps or integration with ML pipelines is a plus.
Experience in industries such as e-commerce, finance, or healthcare is a bonus.
Education & Experience
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
2 to 5 years of experience in data engineering or pipeline development roles.