Senior Data Engineer
Develop and Maintain ETL Pipelines: Design, develop, and implement scalable ETL workflows using Python, PySpark, AWS Glue, and Databricks.
Data Transformation and Integration: Extract, transform, and load data from various sources into Amazon S3 and Amazon Redshift.
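To illustrate the extract-transform-load staging the two responsibilities above describe, here is a minimal, engine-agnostic sketch in plain Python. The function names and sample data are hypothetical; a production pipeline for this role would use PySpark DataFrames and write to S3/Redshift rather than an in-memory list:

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Extract stage: parse raw source data into records."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(records: list[dict]) -> list[dict]:
    """Transform stage: normalize fields and drop incomplete rows."""
    return [
        {"id": int(r["id"]), "name": r["name"].strip().title()}
        for r in records
        if r.get("id") and r.get("name")
    ]

def load(records: list[dict], sink: list) -> int:
    """Load stage: append to the target (stand-in for S3/Redshift)."""
    sink.extend(records)
    return len(records)

raw = "id,name\n1, alice \n2,bob\n,missing\n"
warehouse: list[dict] = []
loaded = load(transform(extract(raw)), warehouse)
```

The same three-stage separation carries over directly to a Glue or Databricks job, where each stage becomes a DataFrame operation.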
Performance Optimization: Identify and resolve performance bottlenecks in ETL processes, ensuring optimal performance across large datasets.
Debugging & Reverse Engineering: Debug PySpark programs and jobs, and reverse-engineer existing code.
Automation and Monitoring: Implement automation scripts using AWS Lambda and Step Functions to schedule and monitor data pipelines.
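The Lambda-based orchestration mentioned above typically follows a validate-input, start-job, report-status pattern that Step Functions can branch on. A hedged sketch, with the job launch stubbed out (`start_pipeline` is a hypothetical stand-in for a boto3 `start_job_run` call, so the control flow is visible without AWS access):

```python
import json
from datetime import datetime, timezone

def start_pipeline(job_name: str) -> str:
    """Hypothetical stub for launching a Glue/Databricks job run."""
    return f"run-{job_name}"

def lambda_handler(event: dict, context=None) -> dict:
    # Validate the incoming event before doing any work.
    job_name = event.get("job_name")
    if not job_name:
        return {"statusCode": 400,
                "body": json.dumps({"error": "job_name required"})}
    # Start the job and return a payload Step Functions can inspect.
    run_id = start_pipeline(job_name)
    return {
        "statusCode": 200,
        "body": json.dumps({
            "run_id": run_id,
            "started_at": datetime.now(timezone.utc).isoformat(),
        }),
    }

result = lambda_handler({"job_name": "nightly_etl"})
```

A Step Functions state machine would invoke this handler on a schedule and route on `statusCode` for retry or alerting.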
Data Quality: Ensure data integrity and quality across all stages of the ETL pipeline.
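Data quality checks of the kind this bullet describes are usually expressed as explicit rules run at each pipeline stage. A minimal stdlib sketch (the rule set and sample rows are hypothetical; in practice a framework such as Great Expectations or Deequ would run comparable checks):

```python
def check_quality(rows: list[dict], required: set[str]) -> dict:
    """Flag rows with missing required fields or duplicate ids."""
    issues: list[str] = []
    seen_ids: set = set()
    for i, row in enumerate(rows):
        present = {k for k, v in row.items() if v not in (None, "")}
        missing = required - present
        if missing:
            issues.append(f"row {i}: missing {sorted(missing)}")
        if row.get("id") in seen_ids:
            issues.append(f"row {i}: duplicate id {row['id']}")
        seen_ids.add(row.get("id"))
    return {"passed": not issues, "issues": issues}

report = check_quality(
    [{"id": 1, "email": "a@x.com"}, {"id": 1, "email": ""}],
    required={"id", "email"},
)
```

Running the same checks after extract, transform, and load makes it clear at which stage integrity was lost.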
Collaboration: Work closely with data architects, analysts, and stakeholders to understand requirements and provide clear communication throughout the project lifecycle.
Documentation: Create and maintain technical documentation, including data mapping, workflow designs, and ETL processes.
CI/CD: Apply knowledge of CI/CD pipelines and best practices in deployment automation.