XIAOYANG FEI

Data Engineer
Boston, MA, USA

About

Data Engineer with a track record of optimizing ETL processes, leading large-scale cloud migrations, and improving data quality for business-critical operations. Experienced with cloud platforms (AWS, GCP), data warehousing (Redshift, Snowflake), and production data pipelines, with measurable gains in performance, efficiency, and compliance. Seeking to apply these skills to high-impact data engineering work.

Work

Doggo Onboard Adventures LLC

Data Engineer

Summary

Manages the data platform behind 92K+ daily cruise bookings, optimizing ETL processes and leading cloud migration initiatives to improve data availability and performance.

Highlights

Promoted to Data Engineer after a 2-month internship, now managing a data platform processing 92K+ daily cruise bookings from 5 partners with 94% average uptime.

Led an Oracle-to-AWS migration delivered through bi-weekly Agile sprints, consolidating 30+ legacy systems into Redshift and achieving $125K in annual savings.

Improved data freshness from 2-hour batches to near-real-time streaming with Kafka and Kinesis, reaching a consistent 5-minute latency after resolving initial partition skew issues.

Strengthened data quality practices by implementing dbt (60+ models) and Great Expectations, raising automated anomaly detection from 40% to 85%.

Mentored 2 interns on AWS best practices and code review standards, improving team PR turnaround from 5 days to 2 days.

Doggo Onboard Adventures LLC

Data Engineer Intern

Summary

Developed and optimized Python ETL frameworks and CI/CD pipelines, improving data processing efficiency and reliability for business intelligence and analytics.

Highlights

Built a Python ETL framework processing 50K+ daily records into a dual-warehouse architecture (Snowflake, Redshift), implementing error handling to reduce the initial 15% failure rate.

Optimized 30+ Oracle PL/SQL procedures by migrating business logic to cloud-native SQL, achieving a 36% performance improvement through query optimization.

Established a CI/CD pipeline using GitHub Actions for automated testing and deployment across environments, reducing deployment errors by 80%.

Integrated Oracle Fusion Cloud ERP with an S3 data lake, implementing CCPA-compliant data masking and resolving API authentication issues.

Created a test suite of 20+ dbt tests following SDLC best practices, reducing production incidents from weekly to monthly.

Virginia Tech Transportation Institute

Data Analyst

Summary

Analyzed multimodal transportation data, built data pipelines, and created dashboards to support research objectives and improve simulation feedback.

Highlights

Built a data pipeline on GCP processing 1.2 GB/hour of multimodal transportation data (video, GPS, telemetry) with sub-200 ms latency.

Implemented a data lakehouse on Databricks with Delta Lake, managing 850+ GB of research data and reducing query time from 8 hours to 3 hours through Spark optimization.

Automated 42% of video annotation tasks using Vertex AI and pre-trained BERT models, saving 120 research hours monthly.

Collaborated with the Tesla ADAS team to analyze 850+ GB of research data, improving simulation feedback latency by 9.8% and identifying 3 critical safety edge cases.

Created Tableau dashboards for 15+ researchers and ran weekly training sessions to bridge the gap between technical data and research insights.

Education

Northeastern University

Master's

Data Science

Courses

Advanced Data Modeling

Machine Learning Algorithms

Cloud Computing for Data Science

Virginia Tech

Bachelor's

Data Science

Courses

Data Structures and Algorithms

Statistical Methods for Data Science

Database Management Systems

Certificates

AWS Certified Data Engineer - Associate

Issued By

AWS

MySQL 8.0 Database Administrator Oracle Certified Professional

Issued By

Oracle

MySQL 8.0 Database Developer Oracle Certified Professional

Issued By

Oracle

Oracle Fusion Cloud Applications ERP Process Essentials Certified

Issued By

Oracle

Skills

Programming

Python (pandas, NumPy, PySpark, boto3, asyncio), SQL (PostgreSQL, MySQL, PL/SQL), NoSQL, Bash, R, Scala.

Data Engineering

Apache Airflow, Apache Kafka, Apache Spark, dbt, Databricks, Great Expectations, Delta Lake, GitHub Actions.

Databases

Redshift, Snowflake, PostgreSQL, Oracle, MySQL, BigQuery, MongoDB, DynamoDB, Redis.

Cloud Platforms & Services

AWS (S3, Lambda, Kinesis, Glue, Redshift, EMR, CloudWatch), GCP (BigQuery, Dataflow, Vertex AI), Docker, Kubernetes.

Tools

Git, CI/CD (GitHub Actions, Jenkins), Terraform, JIRA, Tableau, Power BI, Jupyter.

Practices

Agile/Scrum, DataOps, CCPA/GDPR Compliance, Data Quality, SDLC.