Data

Understanding Data Lakehouse with Apache Iceberg, Apache Hudi & Delta Lake

There is a lot of buzz around the data lakehouse architecture today, which unifies two mainstream data architectures - the data warehouse and the data lake - promising to do more with less. At the same time, all major data warehouse vendors have embraced open table formats, driven by customer demand for the flexibility and openness these formats promise.

Three projects - Apache Iceberg, Apache Hudi, and Delta Lake - are now at the center of the attention and vendor chess moves in this space. These projects are pivotal in forging an open, adaptable foundation for your data, one that lets enterprises choose the compute engine best suited to each workload and so avoid the constraints of proprietary storage formats. However, the terms "open table format" and "open data lakehouse" are increasingly used interchangeably across these projects, which calls for clarification and a deeper understanding.

In this session, we will walk through a technical breakdown of the lakehouse architecture (with code) and examine what actually makes it open.

20 Mar

10:10 am

10:20 am PST

Speaker

Dipankar Mazumdar

Dipankar is currently a Staff Data Advocate at Onehouse, where he focuses on open-source projects such as Apache Hudi and XTable to help engineering teams build and scale robust analytics platforms. Before this, he worked on critical open-source projects such as Apache Iceberg and Arrow. For most of his career, Dipankar has worked at the intersection of Data Engineering and Machine Learning. He is also currently writing the book "Engineering Lakehouses using Open Table Formats". Dipankar has spoken at numerous conferences, including Data+AI, ApacheCon, Scale By the Bay, and Data Day Texas, among others.