Description
In this session, the speaker will dive deep into Delta Sharing, a Linux Foundation open-source solution for sharing massive amounts of data in a cheap, secure, and scalable way.
Delta Sharing reliably accesses data at the bandwidth of modern cloud object stores, such as S3, ADLS, or GCS. The data provider runs a sharing server and decides what data to share. To get you started, a hosted reference sharing service, an open-source pre-packaged server, and a Docker image are available for sharing data from your lakehouse.
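A recipient reaches the provider's sharing server through a small JSON "profile file" that holds the server endpoint and a bearer token. The sketch below parses such a file and builds the `<profile-file-path>#<share>.<schema>.<table>` address that the open-source Python connector uses. It is a minimal illustration, not the connector's own code: `SharingProfile`, `parse_profile`, and `table_url` are hypothetical helpers, and the share, schema, and table names in the usage lines are made up.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SharingProfile:
    """Hypothetical container for the fields of a Delta Sharing profile file."""
    share_credentials_version: int
    endpoint: str
    bearer_token: str
    expiration_time: Optional[str] = None  # optional in the profile file


def parse_profile(text: str) -> SharingProfile:
    """Parse the JSON contents of a profile file handed out by a data provider."""
    raw = json.loads(text)
    for key in ("shareCredentialsVersion", "endpoint", "bearerToken"):
        if key not in raw:
            raise ValueError(f"profile file is missing required field {key!r}")
    return SharingProfile(
        share_credentials_version=int(raw["shareCredentialsVersion"]),
        # Drop a trailing slash so REST paths can be appended uniformly later.
        endpoint=raw["endpoint"].rstrip("/"),
        bearer_token=raw["bearerToken"],
        expiration_time=raw.get("expirationTime"),
    )


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address for one shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


if __name__ == "__main__":
    # Made-up profile contents; a real token comes from the data provider.
    example = """{
      "shareCredentialsVersion": 1,
      "endpoint": "https://sharing.example.com/delta-sharing/",
      "bearerToken": "<token>"
    }"""
    profile = parse_profile(example)
    print(profile.endpoint)
    # Hypothetical share/schema/table names, for illustration only.
    print(table_url("config.share", "genomics", "raw", "dna_sample"))
```

With the real `delta-sharing` Python package installed, a table URL of this shape is what you would pass to `delta_sharing.load_as_pandas` to get a pandas DataFrame.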
Under the hood, Delta Sharing uses an open REST protocol, enabling secure data sharing across products and companies for the first time.
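Because the protocol is plain REST over HTTPS, any HTTP client can talk to a sharing server. The sketch below only builds (and does not send) two requests with the standard library: listing the tables in a schema (`GET .../shares/{share}/schemas/{schema}/tables`) and querying one table (`POST .../tables/{table}/query`) with optional `predicateHints` and `limitHint` fields in the JSON body. The helper names are hypothetical, and the endpoint, token, and table names used in the check are placeholders; consult the protocol specification for the full set of endpoints and fields.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional


def _seg(name: str) -> str:
    # URL-encode one path segment so names with spaces or slashes stay intact.
    return urllib.parse.quote(name, safe="")


def list_tables_request(endpoint: str, token: str, share: str, schema: str) -> urllib.request.Request:
    """Build GET <endpoint>/shares/{share}/schemas/{schema}/tables."""
    url = f"{endpoint.rstrip('/')}/shares/{_seg(share)}/schemas/{_seg(schema)}/tables"
    return urllib.request.Request(url, method="GET", headers={"Authorization": f"Bearer {token}"})


def query_table_request(
    endpoint: str,
    token: str,
    share: str,
    schema: str,
    table: str,
    predicate_hints: Optional[List[str]] = None,
    limit_hint: Optional[int] = None,
) -> urllib.request.Request:
    """Build POST <endpoint>/shares/{share}/schemas/{schema}/tables/{table}/query."""
    url = (
        f"{endpoint.rstrip('/')}/shares/{_seg(share)}/schemas/{_seg(schema)}"
        f"/tables/{_seg(table)}/query"
    )
    body = {}
    if predicate_hints:
        body["predicateHints"] = list(predicate_hints)  # SQL-style boolean expressions
    if limit_hint is not None:
        body["limitHint"] = limit_hint
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

Sending either request with `urllib.request.urlopen` would return JSON; for the query endpoint the response is newline-delimited, one action per line.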
Any client that supports pandas, Apache Spark™, or Python can connect to the sharing server. Clients always read the latest version of the data, and they can provide filters on partitioned data to read only a subset of it.
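A table query returns a list of data files, each tagged with its partition values, and filters sent to the server are only hints that it may or may not apply. The sketch below shows how a client could collect the file entries from a newline-delimited response and keep only the files in the requested partitions before downloading anything. The response is made up and heavily simplified (real `metaData` and `file` entries carry more fields), and `parse_file_actions` and `prune_by_partition` are hypothetical helpers, not part of any released connector.

```python
import json
from typing import Dict, Iterable, List

# Made-up, simplified query response: one JSON action per line.
SAMPLE = "\n".join([
    '{"protocol": {"minReaderVersion": 1}}',
    '{"metaData": {"id": "t1", "partitionColumns": ["chromosome"]}}',
    '{"file": {"url": "https://bucket.example/a.parquet", "id": "a", '
    '"partitionValues": {"chromosome": "15"}, "size": 10}}',
    '{"file": {"url": "https://bucket.example/b.parquet", "id": "b", '
    '"partitionValues": {"chromosome": "16"}, "size": 12}}',
])


def parse_file_actions(ndjson: str) -> List[dict]:
    """Collect the 'file' entries from a newline-delimited JSON query response."""
    files = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "file" in action:  # the other lines carry 'protocol' and 'metaData'
            files.append(action["file"])
    return files


def prune_by_partition(files: Iterable[dict], wanted: Dict[str, str]) -> List[dict]:
    """Keep only files whose partition values equal every requested value.

    Because server-side filters are best-effort hints, the client re-checks
    partition values itself; an empty 'wanted' keeps every file.
    """
    return [
        f for f in files
        if all(f.get("partitionValues", {}).get(col) == val for col, val in wanted.items())
    ]


if __name__ == "__main__":
    files = parse_file_actions(SAMPLE)
    for f in prune_by_partition(files, {"chromosome": "15"}):
        print(f["url"])
```

Partition values are compared as strings here because the protocol transmits them as strings; a fuller client would also evaluate non-partition filters after reading the files.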
This talk is built around a number of hands-on demos. We start with a multi-cloud example using Google Colab. Then the speaker will share some raw data from a DNA sample using Delta Sharing, and we will build a client in pandas. The client will then check for genetic traits such as eye color, coffee metabolism rate, and special nutritional requirements. All data access is read-only, so there will be no harm to the presenter. To conclude, we will compare running your own self-hosted Delta Sharing server with sharing data from a managed cloud service using SQL.
Local ODSC chapter in Berlin, Germany
Instructor's Bio
Dr. Frank Munz
Developer Advocate at Databricks
Dr. Frank Munz has authored three computer science books, built up technical evangelism for Amazon Web Services in Germany, Austria, and Switzerland, and once upon a time worked as a data scientist with a group that won a Nobel Prize for linking HPV to cancer.
Frank realized his dream of speaking at top-notch conferences such as re:Invent, Devoxx, KubeCon, and JavaOne on every continent (except Antarctica, because it is too cold there). He holds a PhD in Computer Science from TU Munich.
ON-DEMAND WEBINAR: Sharing Large Amounts of Data with Open Source Delta Sharing