Data Engineer
Data Pipeline Engineer
Location: PCS CHE, Chennai
Years of Experience: 5-7 Years
Job Summary: We are seeking a skilled Data Pipeline Engineer to design, develop, and maintain scalable data pipelines using AWS technologies. The ideal candidate will have hands-on experience with AWS Glue, Glue Catalog, Amazon S3, Athena, and AWS Lambda, and will be proficient in building and optimizing ETL/ELT workflows using PySpark and Python. This role requires an individual contributor who can work independently under tight timelines and deliver robust solutions.
Responsibilities:
- Design, develop, and maintain scalable data pipelines using AWS Glue, Glue Catalog, Amazon S3, Athena, and AWS Lambda.
- Build and optimize ETL/ELT workflows using PySpark and Python for large-scale data processing.
- Develop and manage real-time data streaming pipelines using Apache Kafka, ensuring low latency and high reliability.
- Create, maintain, and optimize SQL queries for data extraction, transformation, and analysis.
- Implement data ingestion frameworks to handle both batch and streaming data from multiple sources.
- Troubleshoot, debug, and optimize data workflows to meet performance and scalability requirements.
- Ensure data quality, consistency, and integrity across different data platforms and pipelines.
- Collaborate with cross-functional teams (data engineers, analysts, and stakeholders) to understand data requirements and deliver robust solutions.
- Monitor and maintain production pipelines, ensuring high availability and quick issue resolution.
Mandatory Skills:
- Hands-on experience with AWS Glue, Glue Catalog tables, Athena, S3, and Lambda.
- Proficiency in PySpark and Python.
- Experience with Apache Kafka for building real-time data pipelines.
- Strong SQL skills for data extraction and transformation.
- Ability to work independently and deliver solutions under tight timelines.
Preferred Skills:
- Experience with Glue Streaming.
- Familiarity with AWS CDK.
Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field.