
Hello, I'm

Levan.

Lead Data Engineer & Architect

Freelance Data Engineer & Architect with 10+ years building data warehouses from scratch, automating ETL/ELT pipelines on AWS and Snowflake, and delivering data governance and BI solutions for healthcare, insurance, retail, and sports analytics.

About Me

Turning Data into
Strategic Assets

Data Engineer with 10+ years building and optimizing data warehouses, automated ETL/ELT pipelines, and analytics infrastructure — primarily on AWS and Snowflake. I specialize in setting up data warehouses from scratch, implementing data governance frameworks, and delivering business intelligence solutions that drive decisions. Comfortable owning architecture decisions and working closely with analysts, scientists, and business stakeholders.

Key Focus Areas

  • AWS data platform architecture & automation
  • Data warehouse design & implementation from scratch
  • Data modeling — 3NF, Dimensional, Data Vault 2.0
  • Business intelligence — Power BI, Tableau, Qlik
  • Data governance & quality frameworks
  • Team leadership & Agile/Scrum delivery

Industries

Healthcare · Insurance · Retail · Sports Analytics · Financial Services · Hospitality · Agriculture · International Development
10+ Years Experience
8+ Industries Served
TB-Scale Warehouses Built
Available Remotely Worldwide

What I Do

Services

End-to-end data engineering solutions — from architecture design to production deployment.

Data Engineering

Design and build automated ETL/ELT pipelines using PySpark, AWS Glue, Airflow, Kafka, and dbt — batch and real-time, at any scale.
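
To make that concrete, here's a minimal sketch of a daily pipeline in Airflow's TaskFlow style (Airflow 2.x). The task bodies, names, and data are illustrative stand-ins, not a client implementation:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def daily_orders_pipeline():
    @task
    def extract() -> list[dict]:
        # Stub: in practice, pull yesterday's orders from an API or source DB.
        return [{"order_id": 1, "amount": 120.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Light cleaning/enrichment before the load step.
        return [{**r, "amount_usd": round(r["amount"], 2)} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        # Stub: in practice, COPY into Redshift or Snowflake.
        print(f"loaded {len(rows)} rows")

    load(transform(extract()))


daily_orders_pipeline()
```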

Data Architecture

Design future-proof data architectures — warehouses, lakehouses, and streaming platforms on AWS and Snowflake, built for scale and maintainability.

Data Warehousing

Build data warehouses from scratch — dimensional modeling, Data Vault 2.0, Redshift and Snowflake optimization, automated pipelines, and cost control.

Business Intelligence

Power BI, Tableau, and Qlik implementations — from data modeling and DAX to org-wide self-service BI rollouts, data catalogs, and training.

Data Governance

Data cataloging, lineage, quality frameworks, access controls, PII masking, and ownership models — building trust in your data assets.
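
One concrete masking pattern: hash identifiers one-way so they stay joinable without exposing raw values, and partially redact the rest. A minimal PySpark sketch with made-up column names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pii-masking").getOrCreate()

df = spark.createDataFrame(
    [("alice@example.com", "555-0100", 42.0)],
    ["email", "phone", "order_total"],
)

masked = df.select(
    # One-way hash keeps the column joinable without exposing the raw value.
    F.sha2(F.col("email"), 256).alias("email_hash"),
    # Partial redaction keeps the last four digits for support workflows.
    F.concat(F.lit("***-"), F.substring("phone", -4, 4)).alias("phone_masked"),
    "order_total",
)
masked.show(truncate=False)
```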

AWS Cloud Solutions

Leverage AWS — Redshift, Glue, Athena, Kinesis, Lambda, S3, EMR — to build cost-effective, automated, cloud-native data platforms with Terraform and Terragrunt.

My Process

How I Work

A proven, iterative approach refined over 10+ years of building data platforms.

01

Discovery

Understand your current data landscape, business goals, pain points, and constraints. Audit existing infrastructure, identify quick wins and long-term opportunities.

02

Architecture

Design the target data platform — warehouse schema, pipeline topology, governance model, and tooling choices. Deliver clear documentation and diagrams before writing code.

03

Build

Implement pipelines, warehouse layers, and governance controls iteratively. Ship working increments weekly, with automated tests and CI/CD from day one.
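
Even a first-week increment can carry automated data quality tests. A minimal pytest-style sketch, where fetch_orders is a hypothetical helper standing in for a warehouse query:

```python
def fetch_orders() -> list[dict]:
    # Stub: in practice this would run a SQL query against the warehouse.
    return [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 75.5}]


def test_primary_key_not_null():
    assert all(r["order_id"] is not None for r in fetch_orders())


def test_primary_key_unique():
    ids = [r["order_id"] for r in fetch_orders()]
    assert len(ids) == len(set(ids))


def test_amounts_non_negative():
    assert all(r["amount"] >= 0 for r in fetch_orders())
```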

04

Handoff

Knowledge transfer, runbooks, team training, and ongoing support. Your team owns the platform confidently — no vendor lock-in, no black boxes.

Results

Case Studies

Real projects, real outcomes — here's how I've helped organizations transform their data infrastructure.

Sports Analytics

Sports Analytics Platform

Problem

An international sports analytics company needed a scalable data warehouse to support real-time analytics, risk scoring, and fraud detection across multiple business domains.

Approach

Architected and built a real-time AWS Redshift warehouse using Data Vault 2.0 modeling. Implemented Kafka and Kinesis streaming pipelines for live event ingestion, Apache Iceberg for the lakehouse layer, and dbt + Airflow for orchestrated transformations with full lineage tracking.
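
A central Data Vault 2.0 pattern in a build like this is computing deterministic hash keys on ingest, so hubs and links load idempotently. A simplified PySpark sketch — the entity and column names are illustrative, not the client's schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dv2-hash-keys").getOrCreate()

events = spark.createDataFrame(
    [("match-123", "player-7", "2026-02-19T20:04:11Z")],
    ["match_id", "player_id", "event_ts"],
)

hub_ready = events.select(
    # Hub hash key: deterministic digest of the normalized business key.
    F.sha2(F.upper(F.trim("match_id")), 256).alias("hub_match_hk"),
    # Link hash key: digest of the concatenated business keys.
    F.sha2(
        F.concat_ws("||", F.upper(F.trim("match_id")), F.upper(F.trim("player_id"))),
        256,
    ).alias("lnk_match_player_hk"),
    "match_id",
    "player_id",
    F.current_timestamp().alias("load_dts"),
    F.lit("kinesis.match_events").alias("record_source"),
)
hub_ready.show(truncate=False)
```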

Result

Delivered a production-grade analytics platform processing millions of events daily with sub-second query performance, full data lineage, and automated quality checks across all pipelines.

Healthcare

Healthcare Data Governance

Problem

A global healthcare company had fragmented clinical and operational data across multiple systems, with no unified governance or modeling framework.

Approach

Designed a data governance and modeling framework unifying clinical and operational domains. Built an IaC-based AWS platform (Glue, PySpark, Lambda, Athena) with Terraform, standardized data products with STTM and ownership maps, and piloted a governed Databricks lakehouse with Unity Catalog.
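
For a flavor of the Glue layer, here is a minimal PySpark Glue job skeleton — paths and job arguments are hypothetical:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw extracts, standardize column names, write curated Parquet.
df = spark.read.json(args["source_path"])
curated = df.toDF(*[c.strip().lower().replace(" ", "_") for c in df.columns])
curated.write.mode("overwrite").parquet(args["target_path"])

job.commit()
```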

Result

Unified data across domains with clear ownership, standardized documentation, and a governed lakehouse — reducing data silos and enabling cross-functional analytics for the first time.

Retail

Retail Analytics Warehouse

Problem

A major retail chain had no centralized analytics capability — reporting was manual, slow, and inconsistent across departments.

Approach

Built the analytical data warehouse from scratch with MS SQL and Python, processing billion-row fact tables. Engineered aggregation models for sub-second queries and deployed Power BI organization-wide with self-service BI, a data catalog, and governance policies.
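
The sub-second numbers come from pre-aggregation: summary tables rebuilt on a schedule so dashboards never scan raw facts. A simplified sketch using pyodbc — the connection string and table names are hypothetical:

```python
import pyodbc

# Hypothetical connection string, for illustration only.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=dw;DATABASE=retail;Trusted_Connection=yes;"
)

# Rebuild a small daily summary so dashboards query thousands of rows
# instead of scanning the billion-row fact table.
REBUILD_DAILY_SALES = """
TRUNCATE TABLE agg.daily_store_sales;
INSERT INTO agg.daily_store_sales (sales_date, store_id, revenue, units)
SELECT CAST(order_ts AS DATE), store_id, SUM(amount), SUM(quantity)
FROM fact.sales
GROUP BY CAST(order_ts AS DATE), store_id;
"""

cursor = conn.cursor()
cursor.execute(REBUILD_DAILY_SALES)
conn.commit()
cursor.close()
```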

Result

Transformed the organization from spreadsheet-based reporting to a self-service BI platform. Billion-row queries running in under a second, Power BI adopted across all departments.

Insurance

Insurance Data Platform

Problem

A large insurance technology company needed to consolidate 10+ TB of data from legacy systems into a modern cloud warehouse with reliable, tested transformation pipelines.

Approach

Led the Snowflake warehouse build (10+ TB) and designed multi-cloud ELT pipelines with dbt and CI/CD workflows. Implemented Snowpark with Python UDFs for complex business logic and automated reporting with Power BI and Tableau.
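
The Snowpark pattern, illustrated: business logic registered as a Python UDF and called from dataframes running inside Snowflake. A minimal sketch — the function, formula, and table names are invented for illustration:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import call_udf, col
from snowflake.snowpark.types import FloatType

# Hypothetical connection parameters.
session = Session.builder.configs(
    {"account": "<account>", "user": "<user>", "password": "<password>"}
).create()


def risk_weighted_premium(premium: float, risk_score: float) -> float:
    # Stand-in for the real pricing logic that ran inside the UDF.
    return premium * (1.0 + risk_score / 100.0)


session.udf.register(
    func=risk_weighted_premium,
    name="risk_weighted_premium",
    input_types=[FloatType(), FloatType()],
    return_type=FloatType(),
    replace=True,
)

policies = session.table("policies")  # hypothetical table
policies.select(
    col("policy_id"),
    call_udf("risk_weighted_premium", col("premium"), col("risk_score")).alias(
        "weighted_premium"
    ),
).show()
```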

Result

Delivered a 10+ TB Snowflake warehouse with version-controlled dbt transformations, a 50% improvement in processing efficiency via Snowpark, and automated reporting that reduced manual effort from days to hours.

Tech Stack

Expertise

Tools and technologies I work with to deliver robust data solutions.

AWS Platform

Redshift · Glue · Athena · Kinesis / Firehose · Lambda · S3 · EMR · Step Functions

Data Engineering

Apache Airflow · Dagster · Apache Spark / PySpark · Kafka · dbt · Apache Iceberg · Databricks · SSIS

Warehousing & Modeling

Snowflake · Data Vault 2.0 · Dimensional Modeling · 3NF · STTM

Infrastructure & CI/CD

Terraform · Terragrunt · Docker · Git / GitHub · Jenkins

BI & Visualization

Power BI · DAX · Tableau · Qlik

Governance & Languages

Data Cataloging · Data Lineage · Quality Frameworks · PII Masking · Python · SQL

Career

Experience

Lead Data Engineer

Sports Analytics Platform

Oct 2025 — Present

  • Leading architecture and development of a scalable AWS Redshift data warehouse for an international sports analytics platform
  • Designing and implementing Data Vault 2.0 models to support flexible analytical and reporting workloads across multiple business domains
  • Building real-time streaming pipelines with Kafka and AWS Kinesis for live analytics, risk scoring, and fraud detection
  • Implementing Apache Iceberg for efficient table management, time-travel queries, and schema evolution on the data lakehouse layer (see the time-travel sketch after this list)
  • Orchestrating transformation workflows with dbt, Apache Airflow, and Dagster — ensuring data quality, full lineage tracking, and observable pipelines
  • Building and optimizing ETL/ELT pipelines using PySpark, AWS Glue, Lambda, and Athena for high-volume event and transactional data
  • Implementing infrastructure as code (Terraform, Terragrunt) for reproducible, automated pipeline deployment
  • Establishing data governance standards, quality checks, and monitoring across all pipelines
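
A small illustration of the Iceberg time-travel capability noted above, assuming a Spark session already configured with an Iceberg catalog named `lake` — the table name and snapshot id are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes an Iceberg catalog named `lake` is configured; setup omitted here.
spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Query the table as it existed at a point in time (audits, reproducible backfills).
spark.sql("""
    SELECT COUNT(*) AS events
    FROM lake.analytics.match_events
    TIMESTAMP AS OF '2026-01-01 00:00:00'
""").show()

# Or pin to an exact snapshot id taken from the table's history metadata.
snap = (
    spark.read.format("iceberg")
    .option("snapshot-id", 1234567890123456789)  # hypothetical id
    .load("lake.analytics.match_events")
)
print(snap.count())
```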

Freelance / Contract Data Engineer

Healthcare & Cloud Platforms

Sep 2024 — Sep 2025

  • Designed data governance and modeling framework unifying clinical and operational domains for a global healthcare company
  • Built IaC-based data platform on AWS (Glue, PySpark, Lambda, Athena) with Terraform and Terragrunt
  • Standardized data products and documentation — STTM, ownership maps, runbooks
  • Architected warehouses on Snowflake and PostgreSQL; refactored legacy Python codebases for maintainability
  • Piloted a governed lakehouse on Databricks with Unity Catalog

Senior Data Warehouse Engineer

Global Hospitality Platform

Jul 2024 — Sep 2024

  • Owned end-to-end Redshift infrastructure; tuned queries and distribution keys, improving performance by 3x
  • Maintained real-time ingestion with AWS Kinesis and Firehose, processing millions of events daily
  • Built serverless ETL with AWS Glue and Athena, reducing infrastructure costs by 30%

Data Lead

Insurance Technology (via EPAM Systems)

Jan 2022 — May 2024

  • Led large-scale data warehouse build on Snowflake (10+ TB), designing multi-cloud ELT pipelines
  • Implemented dbt with version control and CI/CD workflows for the transformation layer
  • Set up Snowpark with Python UDFs, improving processing efficiency by 50%
  • Automated reporting with Python, SQL, Power BI, and Tableau — reducing manual effort from days to hours

Lead BI/BA

Logistics Services (via EPAM Systems)

Apr 2021 — Jan 2022

  • Developed data warehouse architecture on BigQuery supporting complex analytical queries
  • Created data stewardship framework defining data products, ownership, and quality standards
  • Automated data validation and reconciliation processes using Python

Big Data Unit Head

Major Retail Chain

Jun 2019 — Apr 2021

  • Built the analytical data warehouse from scratch (MS SQL, Python); processed billion-row fact sets
  • Led a data engineering team using Agile/Scrum; managed sprint planning and deliverables
  • Deployed Power BI across the organization: self-service BI, data catalog, and governance policies
  • Engineered aggregation models enabling sub-second queries on billion-row tables

Policy Advisor

FAO (United Nations)

Jan 2014 — Feb 2019

  • Led the conceptual design and rollout of the national Ministry of Agriculture data warehouse
  • Designed and delivered a Market Information System (price collection, validation, publication)
  • Authored governance artifacts: process models, ownership matrices, data standards

Lead Specialist

National Bank of Georgia

Jul 2009 — Jan 2014

  • Operated and configured the treasury & risk management accounting module
  • Designed and automated daily reconciliations between system modules and the general ledger
  • Core team member for the national treasury management platform implementation

Insights

Blog

Opinions and lessons from building data platforms in the real world.

2026-02-20 · 14 min read

AI Agent Development Protocol: A Practical Guide

AI coding agents are powerful — until they hallucinate a file that doesn't exist, refactor code they weren't asked to touch, or force-push to main. The AI Agent Development Protocol (AADP) is an open-source framework I built to ground agents in real project context, enforce safety handshakes, and maintain engineering standards. This is a step-by-step guide to setting it up.

Read article →
2026-02-19 · 13 min read

Snowflake vs Redshift in 2026: An Honest Comparison

After running both Snowflake and Redshift in production across multiple companies and workloads, I have strong opinions about where each shines and where each falls short. This is not a feature checklist — it is an honest, opinionated comparison from the trenches.

Read article →
2026-02-19 · 15 min read

How to Build a Data Warehouse from Scratch on AWS

Building a data warehouse is one of the most consequential decisions a data team makes. After architecting warehouses from scratch at multiple companies on AWS, here is the battle-tested playbook I follow every time — from requirements gathering through governance — and the mistakes I have learned to avoid.

Read article →
2026-02-19 · 14 min read

DuckDB: The Universal Query Engine Data Engineers Actually Need

DuckDB isn't just another database — it's a universal query engine that lets you JOIN a Postgres table with a Parquet file on S3 and an Iceberg table in a single SQL query. With 25 million monthly PyPI downloads and the new 1.4 LTS release, it's reshaping how data engineers think about infrastructure.

Read article →
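
The cross-source join from that teaser, sketched with DuckDB's Python API — the connection string, bucket, and table names are hypothetical, and S3 credentials are assumed to be configured:

```python
import duckdb

con = duckdb.connect()

# httpfs reads Parquet straight from S3; the postgres extension attaches a
# live Postgres database as a queryable catalog. (An iceberg extension
# covers Iceberg tables the same way.)
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("ATTACH 'dbname=shop host=localhost' AS pg (TYPE postgres)")

rows = con.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM pg.public.customers AS c
    JOIN read_parquet('s3://my-bucket/orders/*.parquet') AS o
      ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
print(rows)
```
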
2026-02-19 · 14 min read

Data Vault 2.0 in Practice: Lessons from Real Implementations

Data Vault 2.0 promises auditability, agility, and scalability for enterprise data warehouses. After implementing it across multiple organizations, I can tell you what the methodology gets right, where it falls short, and the hard-won patterns that make the difference between a clean vault and a mess.

Read article →
2026-02-19 · 12 min read

Data Governance That Actually Works: A Practical Framework

Most data governance programs die within a year — strangled by bureaucracy, ignored by engineers, and abandoned by business stakeholders who never saw value. After implementing governance across multiple data platforms, I've distilled what actually works into a 5-pillar framework that engineers will adopt and business teams will champion.

Read article →
2026-02-19 · 12 min read

Dagster vs Airflow: And Why dbt Belongs with Dagster

After years of running both Airflow and Dagster in production, I've formed a clear opinion: if your stack includes dbt, Dagster isn't just a better orchestrator — it's a fundamentally better fit. Here's why the asset-centric model changes everything.

Read article →


Get in Touch

Let's Build Something Together

Looking for a data engineering partner? Whether you need a data warehouse built from scratch, BI implementation, governance framework, or strategic consulting — I'd love to hear about your project.

Rates & Availability

$50 — $70 / hour (net)

Flexible working hours

Location

Tbilisi, Georgia

Available for remote work worldwide