About
Data Engineer with a track record of optimizing ETL processes, leading large-scale cloud migrations, and improving data quality for business-critical operations. Experienced with cloud platforms (AWS, GCP), data warehousing (Redshift, Snowflake), and production data pipelines, delivering measurable improvements in performance, efficiency, and compliance. Seeking a role building high-impact data solutions.
Work
Doggo Onboard Adventures LLC
Data Engineer
Summary
Manages the data platform behind 92K+ daily cruise bookings, optimizing ETL processes and leading cloud migration initiatives to improve data availability and performance.
Highlights
Promoted to Data Engineer after a 2-month internship, now managing a data platform processing 92K+ daily cruise bookings from 5 partners with 94% average uptime.
Led Oracle-to-AWS migration in bi-weekly Agile sprints, consolidating 30+ legacy systems into Redshift and achieving $125K in annual savings.
Improved data freshness from 2-hour batches to near-real-time using Kafka and Kinesis, reaching a consistent 5-minute latency after resolving initial partition skew.
Strengthened data quality practices by implementing dbt (60+ models) and Great Expectations, raising the automated anomaly detection rate from 40% to 85%.
Mentored 2 interns on AWS best practices and code review standards, improving team PR turnaround from 5 days to 2 days.
Doggo Onboard Adventures LLC
Data Engineer Intern
Summary
Developed and optimized Python ETL frameworks and CI/CD pipelines, improving data processing efficiency and reliability for business intelligence and analytics.
Highlights
Built Python ETL framework processing 50K+ daily records into a dual warehouse architecture (Snowflake, Redshift), adding error handling to reduce the initial 15% failure rate.
Migrated business logic from 30+ Oracle PL/SQL procedures to cloud-native SQL, achieving a 36% performance improvement through query optimization.
Established CI/CD pipeline using GitHub Actions for automated testing and deployment across environments, reducing deployment errors by 80%.
Integrated Oracle Fusion Cloud ERP with S3 data lake, implementing CCPA-compliant data masking and solving API authentication challenges.
Created comprehensive test suite with 20+ dbt tests following SDLC best practices, reducing production incidents from weekly to monthly.
Virginia Tech Transportation Institute
Data Analyst
Summary
Analyzed multimodal transportation data, built data pipelines, and created dashboards to support research objectives and improve simulation feedback.
Highlights
Built a data pipeline on GCP, processing 1.2GB/hour of multimodal transportation data (video, GPS, telemetry) and achieving sub-200ms latency.
Implemented a data lakehouse on Databricks with Delta Lake, managing 850GB+ of research data and reducing query time from 8 hours to 3 hours through Spark optimization.
Automated 42% of video annotation tasks using Vertex AI and pre-trained BERT models, saving 120 research hours monthly.
Collaborated with the Tesla ADAS team, analyzing 850GB+ of research data, improving simulation feedback latency by 9.8%, and identifying 3 critical safety edge cases.
Created Tableau dashboards for 15+ researchers, bridging the gap between technical data and research insights through weekly training sessions.
Education
Northeastern University
Master's
Data Science
Courses
Advanced Data Modeling
Machine Learning Algorithms
Cloud Computing for Data Science
Virginia Tech
Bachelor's
Data Science
Courses
Data Structures and Algorithms
Statistical Methods for Data Science
Database Management Systems
Certificates
AWS Certified Data Engineer - Associate
Issued By
AWS
MySQL 8.0 Database Admin Professional
Issued By
Oracle
MySQL 8.0 Database Developer Oracle Certified Professional
Issued By
Oracle
Oracle Fusion Cloud Applications ERP Process Essentials Certified
Issued By
Oracle
Skills
Programming
Python (pandas, NumPy, PySpark, boto3, asyncio), SQL (PostgreSQL, MySQL, PL/SQL), NoSQL, Bash, R, Scala.
Data Engineering
Apache Airflow, Apache Kafka, Apache Spark, dbt, Databricks, Great Expectations, Delta Lake, GitHub Actions.
Databases
Redshift, Snowflake, PostgreSQL, Oracle, MySQL, BigQuery, MongoDB, DynamoDB, Redis.
Cloud Platforms & Services
AWS (S3, Lambda, Kinesis, Glue, Redshift, EMR, CloudWatch), GCP (BigQuery, Dataflow, Vertex AI), Docker, Kubernetes.
Tools
Git, CI/CD (GitHub Actions, Jenkins), Terraform, JIRA, Tableau, Power BI, Jupyter.
Practices
Agile/Scrum, DataOps, CCPA/GDPR Compliance, Data Quality, SDLC.