Talk

Data VS Engineers: a long lasting story

Thursday, May 23

16:20 - 16:50
Room: Focaccia
Language: English
Audience level: Intermediate
Elevator pitch

Large companies often face the same challenges: fragmented data, incompatible formats, and scattered expertise. These “data silos” create chaos, hindering your ability to build robust models.

In this talk, we’ll unveil our solution: a Data Feature Store that serves as a Single Source of Truth, overcoming data silos and ensuring feature consistency.

Abstract

In this talk, we’ll share our experience building and deploying Efesto, a scalable Data Feature Store. Built with Python and PySpark, it lets data scientists and engineers focus on solving real business problems instead of “reinventing the wheel” with repetitive feature engineering tasks. We’ll explore how Efesto leverages Apache Airflow’s orchestration capabilities to manage dependencies and ensure data quality for hundreds of deployed data features.

Here’s a deeper dive into what we’ll cover:

  • Technical Deep Dive: We’ll explore the design and architecture of Efesto, focusing on how we created a Python library for building data features.

  • Airflow as the Standard: We’ll showcase how we built an Airflow-based toolkit to standardize process execution across all teams.

  • Unify and Conquer: Discover strategies for unifying data across silos and enforcing feature consistency within your organization’s data pipelines.

  • Real-World Results: Witness concrete examples of how Efesto empowers data science teams to build better models faster, improve collaboration, and unlock the full potential of their data.

Tags: Analytics, Scaling, Data Structures
Participant

Andrea Purgato

Andrea, a Data Engineer with a strong foundation in software engineering, leads the team managing the Generali Data Platform built on GCP. Having begun his career as a Java Developer, Andrea later shifted his focus to data, applying his software engineering skills to working with and managing it.

He holds a Master’s degree in Computer Science Engineering from Politecnico di Milano and pursued a double degree program in Computer Science at the University of Illinois at Chicago.

Outside of work, Andrea enjoys spending time outdoors tending to his garden, where he cultivates fresh, local produce (KM0 food). He’s also a passionate supporter of his favorite football team, AC Milan.

Participant

Federica Previ

Federica is a senior data engineer with six years of experience and a Bachelor’s degree in Computer Science from the University of Milan. In her role at Generali Italia CDO’s Data Platform team, she focuses on developing advanced frameworks to automate and standardize data pipeline creation, prioritizing security and reliability. Her expertise lies in crafting efficient solutions that enhance data integrity across the organization.