AI Ready Data with Apache Iceberg: Unifying, Controlling, and Optimizing Your Data for Effective AI

The effectiveness of Artificial Intelligence (AI) and Machine Learning (ML) models depends heavily on the quality and organization of the underlying data. "AI Ready Data with Apache Iceberg" addresses this challenge and shows how Apache Iceberg helps you unify, govern, and optimize your data so that it is AI ready.

Key Takeaways:
The Data Lakehouse Advantage:
Explain how Apache Iceberg, combined with the lakehouse architecture, provides a unified platform for all types of data, breaking down silos and simplifying data management.

Git-Like Data Governance with Nessie:
Introduce Nessie and demonstrate how its Git-like functionality brings version control, branching, and collaboration to your data, enabling efficient experimentation and ensuring data reproducibility.

Data Contracts for Quality Assurance:
Discuss the concept of data contracts and how they can be used to define and enforce quality standards, ensuring that data meets the necessary criteria for AI/ML workloads.

Iceberg's Optimized Data Structures:
Highlight how Iceberg's optimized data layouts (e.g., columnar file formats and partitioning, including hidden partitioning) improve query performance and resource utilization, leading to faster AI/ML model training and inference.

Real-World Use Cases:
Share examples of how organizations are using Iceberg, Nessie, and data contracts to build robust data pipelines, enhance data quality, and achieve tangible results with their AI initiatives.

20 Mar

10:40 am

10:50 am PST

Speaker

Andrew Madson

Andrew Madson is a Data Analytics, Data Science, and AI Evangelist at Tobiko Data, where he leverages his extensive expertise in data analytics, machine learning, and artificial intelligence to drive innovation and educate the wider community. With a strong academic background, including multiple master's degrees in data analytics and business management, Andrew deeply understands the technical intricacies involved in data-driven decision-making.
Andrew's career is marked by impactful roles at prominent organizations. He served as the Senior Director of Data Analytics & AI at Arizona State University, where he streamlined analytics processes and significantly enhanced team productivity. At LPL Financial, he played a pivotal role in creating a substantial strategic budget and led enterprise-wide initiatives. His tenure at MassMutual involved leading data projects across diverse teams and countries, managing a significant budget, and leading data privacy initiatives.
Andrew's technical ability is further exemplified by his experience at JP Morgan Chase, where he led data analytics, machine learning, and AI projects for Global Wealth Supervision. He spearheaded innovative AI solutions, including a real-time communication monitoring system and predictive models for advisor retention. His ability to automate processes and lead diverse teams of technical experts underscores his leadership and technical acumen.
In addition to his technical expertise, Andrew is a seasoned public speaker and educator. He has served as an adjunct instructor and faculty member at various universities, including Trine University, Grand Canyon University, Southern New Hampshire University, Western Governors University, Maryville University, and Indianapolis University. His ability to communicate complex technical concepts in an accessible manner makes him a sought-after speaker and thought leader in the data science community.
Andrew's current role as an Evangelist at Dremio allows him to combine his passion for data science with his exceptional communication skills. He actively engages with the wider community, sharing his knowledge and insights to empower organizations to harness the full potential of their data.

Related sessions

Utilizing Generative AI to Improve Soft Skills

The role of data analytics and AI in optimizing micromobility operations

Unleashing Gemini: Google Cloud's Generative AI Powerhouse

Register today for move(data)