Senior Data Engineer (PySpark, Python)
We are seeking a highly experienced and motivated Senior Data Engineer to join our team. The ideal candidate will have extensive experience in designing, building, and optimizing highly scalable and robust ETL/ELT pipelines. This role will be critical in shaping our data architecture, implementing Lakehouse solutions, and working with cutting-edge technologies like LLMs and Vector Search.
Key Responsibilities:
Design, develop, and maintain robust and scalable data pipelines using PySpark and Python.
Implement and manage data solutions within the Databricks platform.
Define and enforce data modeling standards, specifically utilizing the Medallion architecture (Bronze, Silver, Gold layers).
Architect and implement Lakehouse capabilities, including AI/Machine Learning features and Vector Search for advanced data retrieval.
Evaluate, integrate, and work with Large Language Model (LLM) frameworks.
Collaborate with data scientists and business stakeholders to understand data requirements and translate them into technical solutions.
Ensure data quality, reliability, and security throughout the data lifecycle.
Mentor junior engineers and contribute to best practices in data engineering.
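To illustrate the Medallion architecture mentioned in the responsibilities above, here is a minimal pure-Python sketch of the Bronze → Silver → Gold flow. All table and field names are illustrative, and in practice each stage would be a PySpark DataFrame written to a Delta table rather than a list of dicts:

```python
# Conceptual Medallion sketch: Bronze (raw) -> Silver (cleaned) -> Gold (aggregated).
# Field names (order_id, amount, region) are hypothetical examples.

def to_silver(bronze_rows):
    """Clean and validate raw Bronze records into the Silver layer."""
    silver = []
    for row in bronze_rows:
        # Reject records missing the primary key; normalise field types.
        if row.get("order_id") is None:
            continue
        silver.append({
            "order_id": int(row["order_id"]),
            "amount": float(row.get("amount", 0.0)),
            "region": (row.get("region") or "unknown").strip().lower(),
        })
    return silver

def to_gold(silver_rows):
    """Aggregate Silver records into a business-level Gold summary."""
    totals = {}
    for row in silver_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

bronze = [
    {"order_id": "1", "amount": "10.5", "region": " EU "},
    {"order_id": None, "amount": "3.0", "region": "US"},  # dropped in Silver
    {"order_id": "2", "amount": "4.5", "region": "eu"},
]
silver = to_silver(bronze)
gold = to_gold(silver)  # one aggregated row per region
```

The same shape carries over to PySpark: Bronze ingests data as-is, Silver applies filtering and type casts, and Gold produces the aggregates consumed by analysts.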
Required Skills and Qualifications:
Experience: 10+ years of professional experience in data engineering or a related field.
Programming: Expert proficiency in PySpark and Python for data manipulation and pipeline development.
Platform: Deep expertise with the Databricks platform.
Architecture: Proven experience designing and implementing data models, particularly the Medallion architecture.
Modern Data Stack: Experience with Lakehouse concepts, architecture, and implementation.
AI/ML Integration: Hands-on experience with AI/Vector Search technologies and working with LLM frameworks.
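The Vector Search requirement above refers to similarity retrieval over embedding vectors. The core idea can be sketched in plain Python (a conceptual illustration only, using a hypothetical in-memory index, not the Databricks Vector Search client):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, index, top_k=2):
    """Return the top_k (doc_id, score) pairs ranked by cosine similarity."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy 3-dimensional "embeddings"; real systems use model-generated vectors.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
results = vector_search([1.0, 0.05, 0.0], index)
```

Production systems replace the brute-force scan with an approximate nearest-neighbour index, but the retrieval contract (query vector in, ranked document IDs out) is the same one used when grounding LLM applications.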
Preferred Skills (Nice to Have):
Experience with AI agent frameworks (e.g., LangChain) or agentic coding tools (e.g., Devin).
Solid understanding and experience with Prompt Engineering.
Knowledge of and experience implementing LLMOps best practices.
Familiarity with legacy ETL tools such as Ab Initio.