
Hello, I'm

Levan.

Lead Data Engineer & Architect

Freelance Data Engineer & Architect with 10+ years building data warehouses from scratch, automating ETL/ELT pipelines on AWS and Snowflake, and delivering data governance and BI solutions for healthcare, insurance, retail, and sports analytics.

About Me

Turning Data into
Strategic Assets

Data Engineer with 10+ years building and optimizing data warehouses, automated ETL/ELT pipelines, and analytics infrastructure — primarily on AWS and Snowflake. I specialize in setting up data warehouses from scratch, implementing data governance frameworks, and delivering business intelligence solutions that drive decisions. Comfortable owning architecture decisions and working closely with analysts, scientists, and business stakeholders.

Key Focus Areas

  • AWS data platform architecture & automation
  • Data warehouse design & implementation from scratch
  • Data modeling — 3NF, Dimensional, Data Vault 2.0
  • Business intelligence — Power BI, Tableau, Qlik
  • Data governance & quality frameworks
  • Team leadership & Agile/Scrum delivery

Industries

Healthcare · Insurance · Retail · Sports Analytics · Financial Services · Hospitality · Agriculture · International Development
10+ Years Experience
8+ Industries Served
TB-Scale Warehouses Built
Available Remotely Worldwide

What I Do

Services

End-to-end data engineering solutions — from architecture design to production deployment.

Data Engineering

Design and build automated ETL/ELT pipelines using PySpark, AWS Glue, Airflow, Kafka, and dbt — batch and real-time, at any scale.
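
To make that concrete, here's a minimal sketch of a daily pipeline in Airflow's TaskFlow style (Airflow 2.x). The task bodies, names, and data are illustrative stand-ins, not a client implementation:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def daily_orders_pipeline():
    @task
    def extract() -> list[dict]:
        # Stub: in practice, pull yesterday's orders from an API or source DB.
        return [{"order_id": 1, "amount": 120.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Light cleaning/enrichment before the load step.
        return [{**r, "amount_usd": round(r["amount"], 2)} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        # Stub: in practice, COPY into Redshift or Snowflake.
        print(f"loaded {len(rows)} rows")

    load(transform(extract()))


daily_orders_pipeline()
```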

Data Architecture

Design future-proof data architectures — warehouses, lakehouses, and streaming platforms on AWS and Snowflake, built for scale and maintainability.

Data Warehousing

Build data warehouses from scratch — dimensional modeling, Data Vault 2.0, Redshift and Snowflake optimization, automated pipelines, and cost control.

Business Intelligence

Power BI, Tableau, and Qlik implementations — from data modeling and DAX to org-wide self-service BI rollouts, data catalogs, and training.

Data Governance

Data cataloging, lineage, quality frameworks, access controls, PII masking, and ownership models — building trust in your data assets.
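
One concrete masking pattern: hash identifiers one-way so they stay joinable without exposing raw values, and partially redact the rest. A minimal PySpark sketch with made-up column names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pii-masking").getOrCreate()

df = spark.createDataFrame(
    [("alice@example.com", "555-0100", 42.0)],
    ["email", "phone", "order_total"],
)

masked = df.select(
    # One-way hash keeps the column joinable without exposing the raw value.
    F.sha2(F.col("email"), 256).alias("email_hash"),
    # Partial redaction keeps the last four digits for support workflows.
    F.concat(F.lit("***-"), F.substring("phone", -4, 4)).alias("phone_masked"),
    "order_total",
)
masked.show(truncate=False)
```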

AWS Cloud Solutions

Leverage AWS — Redshift, Glue, Athena, Kinesis, Lambda, S3, EMR — to build cost-effective, automated, cloud-native data platforms with Terraform and Terragrunt.

My Process

How I Work

A proven, iterative approach refined over 10+ years of building data platforms.

01

Discovery

Understand your current data landscape, business goals, pain points, and constraints. Audit existing infrastructure, identify quick wins and long-term opportunities.

02

Architecture

Design the target data platform — warehouse schema, pipeline topology, governance model, and tooling choices. Deliver clear documentation and diagrams before writing code.

03

Build

Implement pipelines, warehouse layers, and governance controls iteratively. Ship working increments weekly, with automated tests and CI/CD from day one.
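
Even a first-week increment can carry automated data quality tests. A minimal pytest-style sketch, where fetch_orders is a hypothetical helper standing in for a warehouse query:

```python
def fetch_orders() -> list[dict]:
    # Stub: in practice this would run a SQL query against the warehouse.
    return [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 75.5}]


def test_primary_key_not_null():
    assert all(r["order_id"] is not None for r in fetch_orders())


def test_primary_key_unique():
    ids = [r["order_id"] for r in fetch_orders()]
    assert len(ids) == len(set(ids))


def test_amounts_non_negative():
    assert all(r["amount"] >= 0 for r in fetch_orders())
```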

04

Handoff

Knowledge transfer, runbooks, team training, and ongoing support. Your team owns the platform confidently — no vendor lock-in, no black boxes.

Results

Case Studies

Real projects, real outcomes — here's how I've helped organizations transform their data infrastructure.

Sports Analytics

Sports Analytics Platform

Problem

An international sports analytics company needed a scalable data warehouse to support real-time analytics, risk scoring, and fraud detection across multiple business domains.

Approach

Architected and built a real-time AWS Redshift warehouse using Data Vault 2.0 modeling. Implemented Kafka and Kinesis streaming pipelines for live event ingestion, Apache Iceberg for the lakehouse layer, and dbt + Airflow for orchestrated transformations with full lineage tracking.
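
A central Data Vault 2.0 pattern in a build like this is computing deterministic hash keys on ingest, so hubs and links load idempotently. A simplified PySpark sketch — the entity and column names are illustrative, not the client's schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dv2-hash-keys").getOrCreate()

events = spark.createDataFrame(
    [("match-123", "player-7", "2026-02-19T20:04:11Z")],
    ["match_id", "player_id", "event_ts"],
)

hub_ready = events.select(
    # Hub hash key: deterministic digest of the normalized business key.
    F.sha2(F.upper(F.trim("match_id")), 256).alias("hub_match_hk"),
    # Link hash key: digest of the concatenated business keys.
    F.sha2(
        F.concat_ws("||", F.upper(F.trim("match_id")), F.upper(F.trim("player_id"))),
        256,
    ).alias("lnk_match_player_hk"),
    "match_id",
    "player_id",
    F.current_timestamp().alias("load_dts"),
    F.lit("kinesis.match_events").alias("record_source"),
)
hub_ready.show(truncate=False)
```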

Result

Delivered a production-grade analytics platform processing millions of events daily with sub-second query performance, full data lineage, and automated quality checks across all pipelines.

Healthcare

Healthcare Data Governance

Problem

A global healthcare company had fragmented clinical and operational data across multiple systems, with no unified governance or modeling framework.

Approach

Designed a data governance and modeling framework unifying clinical and operational domains. Built an IaC-based AWS platform (Glue, PySpark, Lambda, Athena) with Terraform, standardized data products with STTM and ownership maps, and piloted a governed Databricks lakehouse with Unity Catalog.
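
For a flavor of the Glue layer, here is a minimal PySpark Glue job skeleton — paths and job arguments are hypothetical:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw extracts, standardize column names, write curated Parquet.
df = spark.read.json(args["source_path"])
curated = df.toDF(*[c.strip().lower().replace(" ", "_") for c in df.columns])
curated.write.mode("overwrite").parquet(args["target_path"])

job.commit()
```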

Result

Unified data across domains with clear ownership, standardized documentation, and a governed lakehouse — reducing data silos and enabling cross-functional analytics for the first time.

Retail

Retail Analytics Warehouse

Problem

A major retail chain had no centralized analytics capability — reporting was manual, slow, and inconsistent across departments.

Approach

Built the analytical data warehouse from scratch with MS SQL and Python, processing billion-row fact tables. Engineered aggregation models for sub-second queries and deployed Power BI organization-wide with self-service BI, a data catalog, and governance policies.
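
The sub-second numbers come from pre-aggregation: summary tables rebuilt on a schedule so dashboards never scan raw facts. A simplified sketch using pyodbc — the connection string and table names are hypothetical:

```python
import pyodbc

# Hypothetical connection string, for illustration only.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=dw;DATABASE=retail;Trusted_Connection=yes;"
)

# Rebuild a small daily summary so dashboards query thousands of rows
# instead of scanning the billion-row fact table.
REBUILD_DAILY_SALES = """
TRUNCATE TABLE agg.daily_store_sales;
INSERT INTO agg.daily_store_sales (sales_date, store_id, revenue, units)
SELECT CAST(order_ts AS DATE), store_id, SUM(amount), SUM(quantity)
FROM fact.sales
GROUP BY CAST(order_ts AS DATE), store_id;
"""

cursor = conn.cursor()
cursor.execute(REBUILD_DAILY_SALES)
conn.commit()
cursor.close()
```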

Result

Transformed the organization from spreadsheet-based reporting to a self-service BI platform. Billion-row queries running in under a second, Power BI adopted across all departments.

Insurance

Insurance Data Platform

Problem

A large insurance technology company needed to consolidate 10+ TB of data from legacy systems into a modern cloud warehouse with reliable, tested transformation pipelines.

Approach

Led the Snowflake warehouse build (10+ TB) and designed multi-cloud ELT pipelines with dbt and CI/CD workflows. Implemented Snowpark with Python UDFs for complex business logic and automated reporting with Power BI and Tableau.
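
The Snowpark pattern, illustrated: business logic registered as a Python UDF and called from dataframes running inside Snowflake. A minimal sketch — the function, formula, and table names are invented for illustration:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import call_udf, col
from snowflake.snowpark.types import FloatType

# Hypothetical connection parameters.
session = Session.builder.configs(
    {"account": "<account>", "user": "<user>", "password": "<password>"}
).create()


def risk_weighted_premium(premium: float, risk_score: float) -> float:
    # Stand-in for the real pricing logic that ran inside the UDF.
    return premium * (1.0 + risk_score / 100.0)


session.udf.register(
    func=risk_weighted_premium,
    name="risk_weighted_premium",
    input_types=[FloatType(), FloatType()],
    return_type=FloatType(),
    replace=True,
)

policies = session.table("policies")  # hypothetical table
policies.select(
    col("policy_id"),
    call_udf("risk_weighted_premium", col("premium"), col("risk_score")).alias(
        "weighted_premium"
    ),
).show()
```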

Result

Delivered a 10+ TB Snowflake warehouse with version-controlled dbt transformations, a 50% improvement in processing efficiency via Snowpark, and automated reporting that reduced manual effort from days to hours.

Tech Stack

Expertise

Tools and technologies I work with to deliver robust data solutions.

AWS Platform

Redshift · Glue · Athena · Kinesis / Firehose · Lambda · S3 · EMR · Step Functions

Data Engineering

Apache Airflow · Dagster · Apache Spark / PySpark · Kafka · dbt · Apache Iceberg · Databricks · SSIS

Warehousing & Modeling

Snowflake · Data Vault 2.0 · Dimensional Modeling · 3NF · STTM

Infrastructure & CI/CD

Terraform · Terragrunt · Docker · Git / GitHub · Jenkins

BI & Visualization

Power BI · DAX · Tableau · Qlik

Governance & Languages

Data Cataloging · Data Lineage · Quality Frameworks · PII Masking · Python · SQL

Career

Experience

Lead Data Engineer

Sports Analytics Platform

Oct 2025 — Present

  • Leading architecture and development of a scalable AWS Redshift data warehouse for an international sports analytics platform
  • Designing and implementing Data Vault 2.0 models to support flexible analytical and reporting workloads across multiple business domains
  • Building real-time streaming pipelines with Kafka and AWS Kinesis for live analytics, risk scoring, and fraud detection
  • Implementing Apache Iceberg for efficient table management, time-travel queries, and schema evolution on the data lakehouse layer (see the time-travel sketch after this list)
  • Orchestrating transformation workflows with dbt, Apache Airflow, and Dagster — ensuring data quality, full lineage tracking, and observable pipelines
  • Building and optimizing ETL/ELT pipelines using PySpark, AWS Glue, Lambda, and Athena for high-volume event and transactional data
  • Implementing infrastructure as code (Terraform, Terragrunt) for reproducible, automated pipeline deployment
  • Establishing data governance standards, quality checks, and monitoring across all pipelines
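
A small illustration of the Iceberg time-travel capability noted above, assuming a Spark session already configured with an Iceberg catalog named `lake` — the table name and snapshot id are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes an Iceberg catalog named `lake` is configured; setup omitted here.
spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Query the table as it existed at a point in time (audits, reproducible backfills).
spark.sql("""
    SELECT COUNT(*) AS events
    FROM lake.analytics.match_events
    TIMESTAMP AS OF '2026-01-01 00:00:00'
""").show()

# Or pin to an exact snapshot id taken from the table's history metadata.
snap = (
    spark.read.format("iceberg")
    .option("snapshot-id", 1234567890123456789)  # hypothetical id
    .load("lake.analytics.match_events")
)
print(snap.count())
```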

Freelance / Contract Data Engineer

Healthcare & Cloud Platforms

Sep 2024 — Sep 2025

  • Designed data governance and modeling framework unifying clinical and operational domains for a global healthcare company
  • Built IaC-based data platform on AWS (Glue, PySpark, Lambda, Athena) with Terraform and Terragrunt
  • Standardized data products and documentation — STTM, ownership maps, runbooks
  • Architected warehouses on Snowflake and PostgreSQL; refactored legacy Python codebases for maintainability
  • Piloted a governed lakehouse on Databricks with Unity Catalog

Senior Data Warehouse Engineer

Global Hospitality Platform

Jul 2024 — Sep 2024

  • Owned end-to-end Redshift infrastructure; tuned queries and distribution keys, improving performance by 3x
  • Maintained real-time ingestion with AWS Kinesis and Firehose, processing millions of events daily
  • Built serverless ETL with AWS Glue and Athena, reducing infrastructure costs by 30%

Data Lead

Insurance Technology (via EPAM Systems)

Jan 2022 — May 2024

  • Led large-scale data warehouse build on Snowflake (10+ TB), designing multi-cloud ELT pipelines
  • Implemented dbt with version control and CI/CD workflows for the transformation layer
  • Set up Snowpark with Python UDFs, improving processing efficiency by 50%
  • Automated reporting with Python, SQL, Power BI, and Tableau — reducing manual effort from days to hours

Lead BI/BA

Logistics Services (via EPAM Systems)

Apr 2021 — Jan 2022

  • Developed data warehouse architecture on BigQuery supporting complex analytical queries
  • Created data stewardship framework defining data products, ownership, and quality standards
  • Automated data validation and reconciliation processes using Python

Big Data Unit Head

Major Retail Chain

Jun 2019 — Apr 2021

  • Built the analytical data warehouse from scratch (MS SQL, Python); processed billion-row fact sets
  • Led a data engineering team using Agile/Scrum; managed sprint planning and deliverables
  • Deployed Power BI across the organization: self-service BI, data catalog, and governance policies
  • Engineered aggregation models enabling sub-second queries on billion-row tables

Policy Advisor

FAO (United Nations)

Jan 2014 — Feb 2019

  • Led the conceptual design and rollout of the national Ministry of Agriculture data warehouse
  • Designed and delivered a Market Information System (price collection, validation, publication)
  • Authored governance artifacts: process models, ownership matrices, data standards

Lead Specialist

National Bank of Georgia

Jul 2009 — Jan 2014

  • Operated and configured the treasury & risk management accounting module
  • Designed and automated daily reconciliations between system modules and the general ledger
  • Core team member for the national treasury management platform implementation

Insights

Blog

Opinions and lessons from building data platforms in the real world.

2026-02-20 · 14 min read

AI Agent Development Protocol: A Practical Guide

AI coding agents are powerful — until they hallucinate a file that doesn't exist, refactor code they weren't asked to touch, or force-push to main. The AI Agent Development Protocol (AADP) is an open-source framework I built to ground agents in real project context, enforce safety handshakes, and maintain engineering standards. This is a step-by-step guide to setting it up.

Read article →
2026-02-19 · 13 min read

Snowflake vs Redshift in 2026: An Honest Comparison

After running both Snowflake and Redshift in production across multiple companies and workloads, I have strong opinions about where each shines and where each falls short. This is not a feature checklist — it is an honest, opinionated comparison from the trenches.

Read article →
2026-02-19 · 15 min read

How to Build a Data Warehouse from Scratch on AWS

Building a data warehouse is one of the most consequential decisions a data team makes. After architecting warehouses from scratch at multiple companies on AWS, here is the battle-tested playbook I follow every time — from requirements gathering through governance — and the mistakes I have learned to avoid.

Read article →
2026-02-19 · 14 min read

DuckDB: The Universal Query Engine Data Engineers Actually Need

DuckDB isn't just another database — it's a universal query engine that lets you JOIN a Postgres table with a Parquet file on S3 and an Iceberg table in a single SQL query. With 25 million monthly PyPI downloads and the new 1.4 LTS release, it's reshaping how data engineers think about infrastructure.

Read article →
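
The cross-source join from that teaser, sketched with DuckDB's Python API — the connection string, bucket, and table names are hypothetical, and S3 credentials are assumed to be configured:

```python
import duckdb

con = duckdb.connect()

# httpfs reads Parquet straight from S3; the postgres extension attaches a
# live Postgres database as a queryable catalog. (An iceberg extension
# covers Iceberg tables the same way.)
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("ATTACH 'dbname=shop host=localhost' AS pg (TYPE postgres)")

rows = con.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM pg.public.customers AS c
    JOIN read_parquet('s3://my-bucket/orders/*.parquet') AS o
      ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
print(rows)
```
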
2026-02-19 · 14 min read

Data Vault 2.0 in Practice: Lessons from Real Implementations

Data Vault 2.0 promises auditability, agility, and scalability for enterprise data warehouses. After implementing it across multiple organizations, I can tell you what the methodology gets right, where it falls short, and the hard-won patterns that make the difference between a clean vault and a mess.

Read article →
2026-02-19 · 12 min read

Data Governance That Actually Works: A Practical Framework

Most data governance programs die within a year — strangled by bureaucracy, ignored by engineers, and abandoned by business stakeholders who never saw value. After implementing governance across multiple data platforms, I've distilled what actually works into a 5-pillar framework that engineers will adopt and business teams will champion.

Read article →
2026-02-19 · 12 min read

Dagster vs Airflow: And Why dbt Belongs with Dagster

After years of running both Airflow and Dagster in production, I've formed a clear opinion: if your stack includes dbt, Dagster isn't just a better orchestrator — it's a fundamentally better fit. Here's why the asset-centric model changes everything.

Read article →


Get in Touch

Let's Build Something Together

Looking for a data engineering partner? Whether you need a data warehouse built from scratch, BI implementation, governance framework, or strategic consulting — I'd love to hear about your project.

Rates & Availability

$50 — $70 / hour (net)

Flexible working hours

Location

Tbilisi, Georgia

Available for remote work worldwide