Journey through the Cloud Data World

Patrick Favre

Techie Lunch • 2024

Agenda

01 Data Architecture Recap
02 "New Age": Data Meshes and Data Products
03 Solution: Data Lakehouse Architecture in AWS
04 Tidbits & Learnings

Data Architecture: A Recap

1980s: Relational DW

Centralized, monolithic relational databases for analytical (OLAP) workloads. Simple, but hard to scale.

2010: Data Lake

Object storage for unstructured data. Scales storage/compute independently; hard to manage.

2011+: Mashups

Introduction of Data Fabric and Modern Data Warehouse architectures.

2019: Data Mesh

Decentralized, domain-driven (DDD) approach to organizational data.

2020: Lakehouse

Relational abstraction over object stores, enabled by table formats such as Delta Lake and Apache Iceberg.

Data Architectures: Basic Concepts

Data Catalogs

Central store for metadata (types, location, compliance, governance, discoverability).
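To make the idea concrete, here is a minimal sketch of a catalog as an in-memory Python store. It assumes each dataset is described by a name, a location, a column-to-type schema, a compliance label, an owner and free-form tags. All class names, fields and the example dataset are hypothetical. Managed catalogs such as the AWS Glue Data Catalog persist, version and secure this metadata and expose their own APIs, which this sketch does not imitate.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one dataset (hypothetical fields, for illustration only)."""
    name: str
    location: str                     # where the data lives, e.g. an object-store URI
    schema: dict[str, str]            # types: column name -> type name
    classification: str = "internal"  # compliance label, e.g. "pii"
    owner: str = "unknown"            # governance: accountable team
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """Minimal in-memory metadata store (a real catalog persists and secures this)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> CatalogEntry:
        return self._entries[name]

    def search(self, keyword: str) -> list[str]:
        """Discoverability: names of datasets whose name, columns or tags match."""
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or any(kw in col.lower() for col in e.schema)
            or any(kw in tag.lower() for tag in e.tags)
        )


# Usage with a made-up dataset
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="payments_silver",
    location="s3://example-bucket/silver/payments/",
    schema={"payment_id": "string", "amount": "decimal(18,2)", "customer_id": "string"},
    classification="pii",
    owner="payments-team",
    tags={"finance"},
))
print(catalog.search("customer"))  # ['payments_silver']
```

Keeping the compliance label and owner next to the technical metadata is what lets one store serve governance and discoverability at the same time.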

Batch vs Streaming

  • Batch: Scheduled ETL jobs (e.g., nightly) for bulk processing.
  • Streaming: Real-time ingestion and processing of operational changes.
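The difference is mostly about when results are computed. A sketch using a hypothetical account-balance aggregation: the batch version reads a bounded dataset once, while the streaming version keeps running state and emits an updated view after every change event. The event shape `(account_id, amount)` is an assumption, and real streaming systems add durable state, windowing and delivery guarantees, all omitted here.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Change = tuple[str, int]  # (account_id, amount delta); hypothetical event shape


def batch_totals(records: Iterable[Change]) -> dict[str, int]:
    """Batch: run once over a bounded dataset, e.g. in a nightly job."""
    totals: defaultdict[str, int] = defaultdict(int)
    for account, amount in records:
        totals[account] += amount
    return dict(totals)


def streaming_totals(events: Iterable[Change]) -> Iterator[dict[str, int]]:
    """Streaming: update running state per event and emit the current view at once."""
    totals: defaultdict[str, int] = defaultdict(int)
    for account, amount in events:
        totals[account] += amount
        yield dict(totals)  # snapshot after every change


changes = [("acc-1", 100), ("acc-2", 50), ("acc-1", -30)]
print(batch_totals(changes))  # {'acc-1': 70, 'acc-2': 50}
for view in streaming_totals(changes):
    print(view)
```

Both paths end in the same state; streaming simply makes every intermediate state visible as soon as the change arrives.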

Data Lake Layers

Landing (Bronze)

Raw copy from source; unchanged form.

Transformed (Silver)

Normalized, partitioned, and deduplicated.

Curated (Gold)

Business logic applied; ready for consumption.
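A toy walk through the three layers, using plain Python lists in place of object storage and a table format. The records, column names and the "revenue per country" rule are invented for illustration. Deduplication keeps the first record per `id`, which is only one of several possible policies (keeping the latest record is another).

```python
from collections import defaultdict
from decimal import Decimal

# Bronze: raw records as received from the source (strings, mixed case, duplicates).
bronze = [
    {"id": "1", "country": "ch", "amount": "10.50", "date": "2024-01-05"},
    {"id": "2", "country": "DE", "amount": "7.00", "date": "2024-01-05"},
    {"id": "1", "country": "ch", "amount": "10.50", "date": "2024-01-05"},  # re-delivered
    {"id": "3", "country": "CH", "amount": "4.25", "date": "2024-01-06"},
]


def to_silver(rows: list[dict]) -> dict[str, list[dict]]:
    """Silver: normalize types and values, deduplicate on `id`, partition by date."""
    seen: set[str] = set()
    partitions: defaultdict[str, list[dict]] = defaultdict(list)
    for row in rows:
        if row["id"] in seen:  # assumption: keep the first record per key
            continue
        seen.add(row["id"])
        partitions[row["date"]].append({
            "id": int(row["id"]),
            "country": row["country"].upper(),
            "amount": Decimal(row["amount"]),  # exact decimal for money
        })
    return dict(partitions)


def to_gold(silver: dict[str, list[dict]]) -> dict[str, Decimal]:
    """Gold: apply business logic (hypothetical rule: revenue per country)."""
    revenue: defaultdict[str, Decimal] = defaultdict(Decimal)
    for rows in silver.values():
        for row in rows:
            revenue[row["country"]] += row["amount"]
    return dict(revenue)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'CH': Decimal('14.75'), 'DE': Decimal('7.00')}
```

Bronze stays untouched, so Silver and Gold can always be rebuilt from it if a transformation rule changes.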

"Modern" Ab Initio: Ready for the Cloud?

What is it?

  • High Volume: Software for batch & streaming data applications.
  • Full Ecosystem: GUI-based development (GDE, MetaData Hub).
  • "Secretive": Low public profile but used by giants like Sony & Lidl.
  • Old-school: Traditionally lags behind modern dev & infra practices.

Cloud Migration

  • Containerization: Self-contained deployables with CD flow.
  • Kubernetes: Dispatchers implemented for parallel cluster execution.
  • Challenge: Achieving true serverless behavior for on-demand scaling and cost efficiency.

New Age Data: Data Meshes and Data Products

Data Mesh (DDD)

Decentralized approach to data architectures.

  • Domain Ownership
  • Data as a Product
  • Self-Serve Data Platform
  • Federated Computational Governance

The Reality: Low adoption due to organizational complexity and a high technical bar. The term itself often carries negative connotations.

Data Product

The core concept worth extracting from Data Mesh.

"Focus on the value delivered, not just the pipeline."

Recommended Reading

See latest CTO posts on internal portal.

Data Lakehouse in AWS

Tidbits and Learnings

Organization & Skills

BI / Analytics departments often have a lower average level of technical skill, owing to their reliance on off-the-shelf software.

Hyperscaler Ecosystem

Each major hyperscaler offers a massive, complex portfolio of data services.

Governance & Structure

  • Banking data requires strict central management.
  • Data Lakes are notoriously difficult to keep structured.

Future Outlook

A wave of Data Warehouse → "Something Modern" transformations is coming.

Thank You!

Any Questions?