Tuesday, March 20 • 11:15am - 12:05pm
Turning Relational Database Tables into Spark Data Sources
Big Data analytics requires not only unstructured data (a.k.a. Big Data) but also master data, which is generally stored in RDBMS tables.

Apache Spark provides the Spark SQL interface for querying unstructured data using SQL. Through its Datasource API, Spark SQL also offers a JDBC data source for accessing relational databases. How can RDBMS tables be joined with Big Data using Spark SQL without moving the data?
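For context, Spark's built-in JDBC data source can already read a table in parallel when given a numeric partition column plus `lowerBound`, `upperBound` and `numPartitions` (in PySpark, the `column`, `lowerBound`, `upperBound` and `numPartitions` arguments of `DataFrameReader.jdbc`); each Spark task then sends its own range-restricted query. The sketch below approximates that stride-based splitting in plain Python so it can run without Spark or a database. The table `sales`, the column `id` and both function names are hypothetical, and Spark's real internal logic differs in details such as how uneven strides are distributed.

```python
from typing import List


def partition_predicates(column: str, lower: int, upper: int,
                         num_partitions: int) -> List[str]:
    """Split [lower, upper) into equal strides, one WHERE predicate each.

    As with Spark's JDBC source, the bounds only set the stride: rows below
    `lower` (and NULLs) fall into the first partition and rows at or above
    the last stride's start fall into the last one, so nothing is filtered.
    """
    if num_partitions < 1 or upper <= lower:
        raise ValueError("need num_partitions >= 1 and upper > lower")
    # Cap the count so that every stride covers at least one integer value.
    num_partitions = min(num_partitions, upper - lower)
    if num_partitions == 1:
        return []  # a single partition reads the whole table, no predicate
    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


def partition_queries(table: str, column: str, lower: int, upper: int,
                      num_partitions: int) -> List[str]:
    """Build the per-partition SELECT each Spark task would send to the RDBMS."""
    predicates = partition_predicates(column, lower, upper, num_partitions)
    if not predicates:
        return [f"SELECT * FROM {table}"]
    return [f"SELECT * FROM {table} WHERE {p}" for p in predicates]


if __name__ == "__main__":
    for query in partition_queries("sales", "id", 0, 1000, 4):
        print(query)
```

With bounds 0 and 1000 and four partitions, the stride is 250, so the second query is `SELECT * FROM sales WHERE id >= 250 AND id < 500`. Because each query touches a disjoint slice, the database can serve the slices over separate connections at the same time.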

This session describes an implementation of the Spark Datasource API and explains the optimizations that (i) allow parallel, direct access to the RDBMS (with the option of controlling the number of concurrent connections); (ii) introspect the RDBMS table, generate partitions of Spark JDBCRDDs based on the split pattern, and rewrite Spark SQL queries into the RDBMS's SQL dialect; (iii) use hooks in the JDBC driver for faster type conversions; and (iv) push predicates down to the RDBMS and prune partitions based on the WHERE clause, reducing the amount of data returned from the RDBMS to Spark.
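Optimizations (i) and (iv) can be illustrated with a minimal sketch. It assumes integer range partitions on a hypothetical `id` column of a hypothetical `sales` table, and filters represented as `(column, operator, literal)` tuples. Partitions whose range cannot satisfy a filter are pruned, translatable filters are appended to each remaining query, and untranslatable ones are left for Spark to apply. `run_queries` caps concurrency with a thread pool, and its `execute` argument is a stand-in for whatever runs one query over its own JDBC connection. None of these names come from the session's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A filter is (column, operator, literal), e.g. ("id", ">=", 600).
Filter = Tuple[str, str, object]
# Operators this sketch knows how to translate into RDBMS SQL.
PUSHABLE_OPS = {"=", "<", "<=", ">", ">="}


@dataclass
class Partition:
    index: int
    lo: int  # inclusive lower bound on the partition column
    hi: int  # exclusive upper bound on the partition column


def sql_literal(value: object) -> str:
    """Render a literal for SQL, doubling single quotes inside strings."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def split_filters(filters: List[Filter]) -> Tuple[List[Filter], List[Filter]]:
    """Separate filters the RDBMS can evaluate from those Spark must apply."""
    pushed = [f for f in filters if f[1] in PUSHABLE_OPS]
    residual = [f for f in filters if f[1] not in PUSHABLE_OPS]
    return pushed, residual


def may_match(part: Partition, column: str, filters: List[Filter]) -> bool:
    """Return False only when a filter rules out every row of the partition."""
    last = part.hi - 1  # largest integer value the partition can hold
    for col, op, value in filters:
        if col != column or not isinstance(value, int):
            continue  # only integer filters on the partition column can prune
        if op == "=" and not (part.lo <= value <= last):
            return False
        if op == "<" and not (part.lo < value):
            return False
        if op == "<=" and not (part.lo <= value):
            return False
        if op == ">" and not (last > value):
            return False
        if op == ">=" and not (last >= value):
            return False
    return True


def build_queries(table: str, column: str, partitions: List[Partition],
                  filters: List[Filter]):
    """Prune partitions, then push the remaining filters into each query."""
    pushed, residual = split_filters(filters)
    queries = []
    for part in partitions:
        if not may_match(part, column, pushed):
            continue  # pruned: no query is sent for this partition
        conditions = [f"{column} >= {part.lo}", f"{column} < {part.hi}"]
        conditions += [f"{c} {op} {sql_literal(v)}" for c, op, v in pushed]
        sql = f"SELECT * FROM {table} WHERE " + " AND ".join(conditions)
        queries.append((part.index, sql))
    return queries, residual


def run_queries(queries, execute: Callable[[str], object],
                max_connections: int) -> list:
    """Run the partition queries with at most `max_connections` in flight."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(execute, [sql for _, sql in queries]))


if __name__ == "__main__":
    parts = [Partition(i, i * 250, (i + 1) * 250) for i in range(4)]
    # "custom_udf" is a hypothetical operator standing in for a Spark UDF.
    filters = [("id", ">=", 600), ("region", "=", "EMEA"),
               ("score", "custom_udf", 0)]
    queries, residual = build_queries("sales", "id", parts, filters)
    for index, sql in queries:
        print(index, sql)
    print("applied in Spark:", residual)
```

Here `id >= 600` prunes the first two partitions entirely, so only two queries reach the database, and each already carries `region = 'EMEA'`. Pruning is conservative: a filter it cannot reason about never removes a partition, it is simply pushed down or left to Spark.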

Kuassi Mensah

Director Product Management, Oracle Corporation
Kuassi is Director of Product Management at Oracle. He looks after the following product areas: (i) Java connectivity to the DB (cloud and on-premises) and in-place processing with the DB-embedded JVM; (ii) microservices and DB connectivity, and related topics (data and transaction models, Kubernetes, SAGAs...

Tuesday March 20, 2018 11:15am - 12:05pm PDT
5-Rm 105