Description

In this session, the speaker will dive deep into Delta Sharing, a Linux Foundation open source solution for sharing massive amounts of data in a cheap, secure, and scalable way.

Delta Sharing reliably accesses data at the bandwidth of modern cloud object stores, such as S3, ADLS, or GCS. The data provider runs a sharing server and decides what data to share. To get you started, a hosted reference sharing service, an open-sourced pre-packaged server, and a Docker image are available for sharing data from your lakehouse.

Under the hood, Delta Sharing uses an open REST protocol, enabling secure data sharing across products and companies for the first time.
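To make the REST protocol mentioned above more concrete, here is a minimal sketch of how a client might build a table query request. It is an illustration under assumptions, not the reference client: the URL layout (`/shares/{share}/schemas/{schema}/tables/{table}/query`), the bearer-token header, and the `predicateHints` / `limitHint` body fields follow the published Delta Sharing protocol as recalled here and should be checked against the protocol document. The function name, endpoint, and token are hypothetical. The request is only constructed, never sent.

```python
import json
import urllib.request


def build_query_request(endpoint, bearer_token, share, schema, table,
                        predicate_hints=None, limit_hint=None):
    """Build (but do not send) a POST request that queries a shared table.

    Assumption: path layout and body field names follow the Delta Sharing
    protocol document; verify them before relying on this sketch.
    """
    url = (f"{endpoint.rstrip('/')}/shares/{share}"
           f"/schemas/{schema}/tables/{table}/query")

    body = {}
    if predicate_hints:
        # Hints on partition columns; the server may use them to skip files.
        body["predicateHints"] = list(predicate_hints)
    if limit_hint is not None:
        body["limitHint"] = limit_hint

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint, token, and table names, for illustration only.
    req = build_query_request(
        "https://sharing.example.com/delta-sharing/", "MY_TOKEN",
        "my_share", "my_schema", "my_table",
        predicate_hints=["date = '2021-01-01'"], limit_hint=100,
    )
    print(req.get_method(), req.full_url)
```

Because the protocol is plain HTTP with JSON bodies, a request like this can be issued from any language or tool that speaks HTTP, which is what makes sharing across products possible.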

Any client supporting pandas, Apache Spark™, or Python can connect to the sharing server. Clients always read the latest version of the data, and they can provide filters on partitioned data to read only a subset of it.
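As a rough sketch of what a pandas client might look like, the snippet below uses the open source `delta-sharing` Python connector. It assumes the connector is installed (`pip install delta-sharing`) and that the provider has handed you a profile file; the file name `config.share` and the share, schema, and table names are hypothetical placeholders. The `limit` argument is assumed to be supported by the installed connector version.

```python
def table_url(profile_path, share, schema, table):
    """Compose the '<profile>#<share>.<schema>.<table>' address that the
    delta-sharing connector expects for a single shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table, limit=None):
    """Load a shared table into a pandas DataFrame.

    The import is local so that table_url() stays usable without the
    connector installed.
    """
    import delta_sharing  # assumption: installed via `pip install delta-sharing`

    url = table_url(profile_path, share, schema, table)
    if limit is None:
        return delta_sharing.load_as_pandas(url)
    # Assumption: the installed connector version accepts `limit`.
    return delta_sharing.load_as_pandas(url, limit=limit)


if __name__ == "__main__":
    # Hypothetical profile file and table names, for illustration only.
    df = load_shared_table("config.share", "my_share", "my_schema", "my_table",
                           limit=10)
    print(df.head())
```

The profile file carries the server endpoint and the access token, so the client code itself never needs to know where the underlying data is stored.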

This talk is built around a number of hands-on demos. We start with a multi-cloud example using Google Colab. Then the speaker will share some raw data of the sampled DNA using Delta Sharing, and we will build a client in pandas. The client will then check for genetic traits such as eye color, coffee metabolism rate, and special nutritional requirements. All data access is read-only, so there will be no harm to the presenter. To conclude, we will compare running your own self-hosted Delta Sharing server with sharing data from a managed cloud service using SQL.


Local ODSC chapter in Berlin, Germany


Instructor's Bio

Dr. Frank Munz

Developer Advocate at Databricks

Dr. Frank Munz authored three computer science books, built up technical evangelism for Amazon Web Services in Germany, Austria, and Switzerland, and once upon a time worked as a data scientist with a group that won a Nobel Prize for linking HPV to cancer.

Frank realized his dream of speaking at top-notch conferences such as re:Invent, Devoxx, KubeCon, and JavaOne on every continent (except Antarctica, because it is too cold there). He holds a PhD in Computer Science from TU Munich.

Webinar

  • 1

    ON-DEMAND WEBINAR: Sharing Large Amounts of Data with Open Source Delta Sharing

    • Ai+ Training

    • Webinar recording

    • Join ODSC West 2021 Training Conference