Databricks + PySpark
Detailed Job Description for Databricks + PySpark Developer:
· Data Pipeline Development: Design, implement, and maintain scalable and efficient data pipelines using PySpark and Databricks for ETL processing of large volumes of data (a minimal sketch follows this list).
· Cloud Integration: Develop solutions leveraging Databricks on cloud platforms (AWS/Azure/GCP) to process and analyze data in a distributed computing environment.
· Data Modeling: Build robust data models, ensuring high-quality data integration and consistency across multiple data sources.
· Optimization: Optimize PySpark jobs for performance, ensuring the efficient use of resources and cost-effective execution.
· Collaborative Development: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver actionable insights.
· Automation & Monitoring: Automate routine pipeline operations and implement monitoring solutions for data pipeline health, performance, and failure detection.
· Documentation & Best Practices: Maintain comprehensive documentation of architecture, design, and code. Ensure adherence to best practices for data engineering, version control, and CI/CD processes.
· Mentorship: Provide guidance to junior data engineers and help with the design and implementation of new features and components.
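To make the pipeline-development responsibility above concrete, here is a minimal sketch of the kind of PySpark ETL job this role involves. It is an illustration under assumed inputs: the storage paths, the event_id/event_ts columns, and the partitioning scheme are all hypothetical, not part of any actual system.

```python
# Minimal ETL sketch; all paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-etl-sketch").getOrCreate()

# Extract: raw JSON landed in cloud object storage (placeholder path)
raw = spark.read.json("s3://example-bucket/raw/events/")

# Transform: deduplicate, enforce types, and drop unusable rows
cleaned = (
    raw.dropDuplicates(["event_id"])                      # hypothetical key column
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))   # derived partition column
)

# Load: write partitioned Delta output (Databricks' default table format)
(cleaned.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save("s3://example-bucket/curated/events/"))
```

Writing Delta output partitioned by a date column is a common Databricks pattern that keeps downstream reads selective; an equivalent job could register the result as a catalog table via saveAsTable instead of a path-based save.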
Required Skills & Qualifications:
· Experience: 6+ years of experience in data engineering or software engineering roles, with a strong focus on PySpark and Databricks.
· Technical Skills:
· Proficient in PySpark for distributed data processing and ETL pipelines.
· Experience working with Databricks for running Apache Spark workloads in a cloud environment.
· Solid knowledge of SQL and hands-on data wrangling and manipulation skills (see the sketch after this list).
· Experience with cloud platforms (AWS, Azure, or GCP) and their respective data storage services (S3, ADLS, Google Cloud Storage, etc.).
· Familiarity with data lakes, data warehouses, and NoSQL databases (e.g., MongoDB, Cassandra, HBase).
· Experience with orchestration and transformation tools such as Apache Airflow, Azure Data Factory, or dbt.
· Familiarity with containerization (Docker, Kubernetes) and DevOps practices.
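As an illustration of the SQL and data-wrangling skills listed above, the sketch below joins a hypothetical fact table to a small dimension table with a broadcast join (a standard Spark technique for avoiding shuffle-heavy sort-merge joins), then runs the same aggregation through both the DataFrame API and plain SQL. All paths, table names, and columns (dim_id, amount, dim_name, event_date) are assumptions made for the example.

```python
# Wrangling/optimization sketch; every path and column below is hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wrangling-sketch").getOrCreate()

facts = spark.read.format("delta").load("/mnt/example/facts")
dims = spark.read.format("delta").load("/mnt/example/dims")

# Broadcasting the small dimension table avoids shuffling the large fact table.
joined = facts.join(F.broadcast(dims), on="dim_id", how="left")

# Cache only because the joined result is reused by two actions below.
joined.cache()

# Aggregation with the DataFrame API
daily = (joined.groupBy("event_date", "dim_name")
               .agg(F.sum("amount").alias("total_amount")))

# The same aggregation expressed in SQL, for teams that prefer it
joined.createOrReplaceTempView("joined_events")
daily_sql = spark.sql("""
    SELECT event_date, dim_name, SUM(amount) AS total_amount
    FROM joined_events
    GROUP BY event_date, dim_name
""")

# Repartition before writing to control the number and size of output files.
daily.repartition(8).write.format("delta").mode("overwrite").save("/mnt/example/daily_summary")
```

Broadcasting the small side of a join and controlling output file counts with repartition are typical first steps when tuning the kinds of performance bottlenecks described under the Optimization and Problem Solving items.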
· Problem Solving: Strong ability to troubleshoot and debug issues related to distributed computing, performance bottlenecks, and data quality.
· Version Control: Proficient in Git-based workflows and version control.
· Communication Skills: Excellent written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders.
· Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).