Cloud Data Architect
Define and own enterprise data architecture patterns: lakehouse, data warehouse, data lake, data vault, and dimensional models, aligned to business needs and regulatory requirements.
Create reference architectures for batch, streaming, and transactional pipelines on Databricks (Delta Live Tables, Auto Loader, Unity Catalog, SQL Warehouse).
Lead modernization and migration programs, moving legacy Exadata/Oracle/Informatica/Hadoop/SAS workloads to Spark/Databricks, Redshift, Snowflake, or BigQuery.
Platform & Engineering
Architect multi‑cloud data platforms leveraging:
AWS: S3, Glue, Lambda, MSK/Kinesis, EMR, Redshift, Step Functions, MWAA.
Azure: ADF, ADLS, Functions, Event Hubs, Azure SQL, Logic Apps, Stream Analytics.
GCP: Dataflow, Dataproc, BigQuery, Composer, GCS, Cloud Functions, Vertex AI.
Establish performance optimization guidelines for PySpark/Spark: memory tuning, shuffle/partition strategies, UDF optimization, RAPIDS/GPU acceleration.
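One common shuffle/partition guideline is to size shuffle partitions at roughly 128 MB each. The helper below is a minimal, framework-free sketch of that rule of thumb; the function name and the 128 MB default are illustrative assumptions, not a Databricks or Spark API.

```python
def target_shuffle_partitions(shuffle_bytes: int,
                              target_partition_bytes: int = 128 * 1024 * 1024,
                              min_partitions: int = 1) -> int:
    """Estimate a shuffle partition count from expected shuffle volume.

    Rule of thumb: size each shuffle partition at ~128 MB so tasks are
    large enough to amortize scheduling overhead but small enough to
    avoid executor memory pressure and disk spill.
    """
    if shuffle_bytes < 0:
        raise ValueError("shuffle_bytes must be non-negative")
    # Ceiling division so the last partition is never oversized.
    partitions = -(-shuffle_bytes // target_partition_bytes)
    return max(min_partitions, partitions)

# Example: a 10 GB shuffle stage at 128 MB per partition -> 80 partitions.
print(target_shuffle_partitions(10 * 1024**3))  # 80
```

In a Spark job the result would typically feed `spark.conf.set("spark.sql.shuffle.partitions", n)` or a `repartition(n)` call; adaptive query execution can also resize partitions at runtime, so a static estimate like this is a starting point rather than a fixed answer.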
Design event‑driven ingestion and change data capture (CDC) architectures (Oracle GoldenGate, Kafka/MSK, Kinesis, Glue/Airflow operators).
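At the core of any CDC pipeline is applying an ordered stream of insert/update/delete events to a keyed target. The sketch below shows that merge logic in plain Python under assumed event shapes (an `op` code plus key and row image, roughly what GoldenGate or Debezium emits after flattening); in production this would be a Delta `MERGE` or equivalent, and the function and field names here are illustrative.

```python
from typing import Any, Dict, Iterable, Optional

def apply_cdc(events: Iterable[dict],
              snapshot: Optional[Dict[Any, dict]] = None) -> Dict[Any, dict]:
    """Apply an ordered CDC event stream to a keyed table snapshot.

    Assumed event shape: {"op": "I"|"U"|"D", "key": <pk>, "row": <full row>}
    where "row" is present for inserts and updates.
    """
    table = dict(snapshot or {})
    for ev in events:
        op, key = ev["op"], ev["key"]
        if op in ("I", "U"):      # insert or update: upsert the row image
            table[key] = ev["row"]
        elif op == "D":           # delete: drop the key if present
            table.pop(key, None)
        else:
            raise ValueError(f"unknown op {op!r}")
    return table

events = [
    {"op": "I", "key": 1, "row": {"id": 1, "amt": 10}},
    {"op": "U", "key": 1, "row": {"id": 1, "amt": 25}},
    {"op": "I", "key": 2, "row": {"id": 2, "amt": 5}},
    {"op": "D", "key": 2},
]
print(apply_cdc(events))  # {1: {'id': 1, 'amt': 25}}
```

Ordering is the hard part this sketch glosses over: real pipelines must sequence events per key (e.g. by log sequence number) before applying them, since out-of-order updates silently corrupt the target.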
Governance, Security & Reliability
Implement data governance: Unity Catalog, access controls, lineage, PII handling, and encryption in transit and at rest.
Define observability & reliability standards: data quality (DQ), schema evolution, incident management, SLAs/SLOs, cost guardrails.
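A data quality standard usually boils down to named rules plus a tolerated failure rate (the DQ facet of an SLO). The following is a minimal sketch of that pattern under assumed names (`DQRule`, `evaluate`, `max_failure_rate` are illustrative, not any specific framework's API); tools like Delta Live Tables expectations or Great Expectations implement the same idea with richer reporting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class DQRule:
    name: str
    check: Callable[[dict], bool]  # returns True when the row passes

def evaluate(rows: Sequence[dict], rules: List[DQRule],
             max_failure_rate: float = 0.01) -> Dict[str, dict]:
    """Evaluate row-level DQ rules and compare failure rates to a threshold."""
    report = {}
    for rule in rules:
        failures = sum(1 for r in rows if not rule.check(r))
        rate = failures / len(rows) if rows else 0.0
        report[rule.name] = {
            "failures": failures,
            "rate": rate,
            "passed": rate <= max_failure_rate,
        }
    return report

rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
rules = [DQRule("email_not_null", lambda r: r["email"] is not None)]
print(evaluate(rows, rules, max_failure_rate=0.0))
```

A failed rule would then feed the incident-management path: quarantine the offending rows, page on sustained SLO breach, or fail the pipeline run outright, depending on the rule's severity.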
DevOps & Orchestration
Drive CI/CD for data pipelines using Terraform, Jenkins, Databricks Asset Bundles, and CDK (TypeScript); standardize environments, promotion flows, and secrets management.
Standardize Airflow/MWAA/Composer orchestration with reusable operators and DAG patterns.
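A reusable DAG pattern typically means a factory that stamps out the same task shape per source instead of hand-writing each DAG. The sketch below is framework-free so it stays self-contained: it returns the task graph as an adjacency mapping, where an Airflow factory would instantiate operators and set dependencies with the same shape. The `medallion_dag` name and bronze→silver→gold stages are illustrative assumptions.

```python
from typing import Dict, List

def medallion_dag(source: str) -> Dict[str, List[str]]:
    """Build the task graph for a standard ingest->validate->silver->gold
    pipeline, keyed per source system.

    Each key is a task ID; its value lists the downstream task IDs.
    An Airflow version of this factory would create one operator per
    task and wire dependencies identically for every source.
    """
    def task(stage: str) -> str:
        return f"{source}_{stage}"

    return {
        task("ingest"):    [task("validate")],
        task("validate"):  [task("to_silver")],
        task("to_silver"): [task("to_gold")],
        task("to_gold"):   [],
    }

# One call per source yields structurally identical, individually
# schedulable pipelines.
print(medallion_dag("orders"))
```

The payoff of the factory pattern is uniformity: retries, alerting, and SLA settings live in one place, and adding a new source is a one-line call rather than a copied DAG file.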