Responsibilities:
- Design and develop robust ETL pipelines using Python, PySpark, and GCP services.
- Build and optimize data models and queries in BigQuery for analytics and reporting.
- Ingest, transform, and load structured and semi-structured data from various sources.
- Collaborate with data analysts, scientists, and business teams to understand data requirements.
- Ensure data quality, integrity, and security across cloud-based data platforms.
- Monitor and troubleshoot data workflows and performance issues.
- Automate data validation and transformation processes using scripting and orchestration tools.

Required Skills & Qualifications:
- Hands-on experience with Google Cloud Platform (GCP), especially BigQuery.
- Strong programming skills in Python and/or PySpark.
- Experience in designing and implementing ETL workflows and data pipelines.
- Proficiency in SQL and data modeling for analytics.
- Familiarity with GCP services such as Cloud Storage, Dataflow, Pub/Sub, and Composer.
- Understanding of data governance, security, and compliance in cloud environments.
- Experience with version control (Git) and agile development practices.
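To give a sense of the day-to-day work, a minimal Cloud Storage-to-BigQuery load in PySpark might look like the sketch below. It assumes a cluster (e.g. Dataproc) with the spark-bigquery connector available; the project, dataset, table, and bucket names are placeholders, not part of this role's actual environment.

# Minimal sketch: land JSON from Cloud Storage, clean it, load to BigQuery.
# Assumes the spark-bigquery connector is on the cluster; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest semi-structured JSON landed in Cloud Storage.
raw = spark.read.json("gs://example-landing-bucket/orders/*.json")

# Basic cleanup and a derived column for reporting.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Load into BigQuery via the connector, staging through a temporary GCS bucket.
(orders.write.format("bigquery")
       .option("table", "example-project.analytics.orders")
       .option("temporaryGcsBucket", "example-temp-bucket")
       .mode("append")
       .save())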
You should bring 5 years of experience in Python, PySpark, and SQL, along with hands-on experience with AWS services including Glue, EMR, Lambda, S3, EC2, and Redshift. The role is based out of the Virtusa office, where you will collaborate with a team of experts. Scala, Kafka, PySpark, and AWS Native Data Services are mandatory skills for this role; broader Big Data knowledge is a nice-to-have that will set you apart from other candidates.
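For illustration only, a bare-bones AWS Glue PySpark job of the kind this role involves might look like the sketch below; the S3 paths and field names are placeholders, and the transform step is deliberately minimal.

# Minimal sketch of an AWS Glue job: read raw JSON from S3, clean it,
# write curated Parquet back to S3 for downstream loads (e.g. into Redshift).
import sys
from pyspark.context import SparkContext
from pyspark.sql import functions as F
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsglue.dynamicframe import DynamicFrame

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON events landed in S3 (path is a placeholder).
events = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-raw-bucket/events/"]},
    format="json",
)

# Light transformation with plain PySpark before writing back out.
df = (events.toDF()
            .dropDuplicates(["event_id"])
            .withColumn("event_date", F.to_date("event_ts")))

# Write curated data to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glue_context, "curated_events"),
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/events/"},
    format="parquet",
)

job.commit()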