PySpark Data Engineer
· Minimum 5 years of development and design experience as a Data Engineer
· Experience with Big Data platforms and distributed computing (e.g. Hadoop, MapReduce, Spark, HBase, Hive)
· Experience in data pipeline software engineering and best practices in Python (linting, unit tests, integration tests, Git flow/pull-request process, object-oriented development, data validation, algorithms and data structures, technical troubleshooting and debugging, bash scripting)
· Experience in data quality assessment (profiling, anomaly detection) and data documentation (schemas, data dictionaries)
· Experience in data architecture, data warehousing and data modelling techniques (relational, ETL, OLTP), with the ability to evaluate performance trade-offs
· Experience using SQL, PL/SQL or T-SQL with RDBMSs in production environments; NoSQL databases are nice to have
· Linux OS configuration and use, including shell scripting.
· Well versed in Agile, DevOps and CI/CD principles (GitHub, Jenkins, etc.), and actively involved in troubleshooting and resolving issues in a distributed services ecosystem
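The data-quality bullet above (profiling and anomaly detection) can be sketched with a small example. This is only an illustration of the skill, not part of the role description: the column values, threshold, and function names are hypothetical, and stdlib-only Python stands in for what a candidate would typically do with PySpark at scale.

```python
import statistics

def profile_column(values):
    """Minimal profile of one numeric column: row count, null count, mean, min/max."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "mean": statistics.mean(non_null),
        "min": min(non_null),
        "max": max(non_null),
    }

def flag_anomalies(values, threshold=3.5):
    """Flag outliers using the modified z-score (median/MAD), which is
    robust to the outliers themselves, unlike a mean/stdev z-score."""
    non_null = [v for v in values if v is not None]
    med = statistics.median(non_null)
    mad = statistics.median(abs(v - med) for v in non_null)
    if mad == 0:  # constant column: nothing to flag
        return []
    return [v for v in non_null if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical column with one null and one obvious outlier.
vals = [10, 11, 9, 10, None, 1000]
print(profile_column(vals))   # counts, nulls, mean, min, max
print(flag_anomalies(vals))   # → [1000]
```

The same profile (counts, nulls, min/max) maps directly onto a PySpark aggregation over a DataFrame column; the stdlib version just keeps the sketch self-contained.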