PySpark Data Engineer
·        Minimum 5 years of development and design experience as a Data Engineer
·        Experience with Big Data platforms and distributed computing (e.g. Hadoop, MapReduce, Spark, HBase, Hive)
·        Experience in data pipeline software engineering and best practices in Python (linting, unit tests, integration tests, git flow/pull request process, object-oriented development, data validation, algorithms and data structures, technical troubleshooting and debugging, bash scripting)
·        Experience in Data Quality Assessment (profiling, anomaly detection) and data documentation (schema, dictionaries)
·        Experience in data architecture, data warehousing and modelling techniques (Relational, ETL, OLTP), with the ability to weigh performance alternatives
·        Experience using SQL, PL/SQL or T-SQL with RDBMSs in production environments; NoSQL databases are nice to have
·        Experience with Linux OS configuration and use, including shell scripting
·        Well versed in Agile, DevOps and CI/CD principles (GitHub, Jenkins, etc.), and actively involved in troubleshooting and resolving issues in a distributed services ecosystem
