Tuesday, March 20 • 11:15am - 12:05pm
Turning Relational Database Tables into Spark Data Sources

Big data analytics draws on unstructured data (a.k.a. Big Data) as well as master data, which is generally stored in RDBMS tables.

Apache Spark provides the Spark SQL interface for querying unstructured data using SQL. Spark SQL exposes the Data Source API, which includes a JDBC data source for accessing relational databases. How can you join RDBMS tables with Big Data using Spark SQL without moving the data over?
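As a minimal sketch of the kind of join the question above refers to, the following uses Spark's built-in generic JDBC data source, not the implementation presented in this session. The table names (CUSTOMERS, events), column names, connection URL, credentials, and both helper functions are hypothetical; the option keys are the ones documented for Spark's JDBC source. The `spark` argument is assumed to be an existing SparkSession with a suitable JDBC driver on its classpath.

```python
def jdbc_read_options(url, table, user, password,
                      partition_column, lower_bound, upper_bound,
                      max_connections):
    """Build options for Spark's generic JDBC source (helper name is hypothetical).

    numPartitions caps the number of partitions and therefore the number of
    concurrent JDBC connections; Spark requires partitionColumn, lowerBound
    and upperBound to be given together for a parallel read.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(max_connections),
    }


def join_events_with_customers(spark, events_path, opts):
    """Join JSON event data with an RDBMS table through Spark SQL.

    `spark` is an existing SparkSession; the view and column names below
    are illustrative assumptions, not taken from the session.
    """
    customers = spark.read.format("jdbc").options(**opts).load()
    customers.createOrReplaceTempView("customers")
    spark.read.json(events_path).createOrReplaceTempView("events")
    return spark.sql(
        "SELECT c.name, COUNT(*) AS n_events "
        "FROM events e JOIN customers c ON e.customer_id = c.id "
        "GROUP BY c.name"
    )
```

The RDBMS table is read in place through JDBC at query time, so no separate export or copy step is needed before the join.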

This session describes an implementation of the Spark Data Source API and explains the optimizations that (i) allow parallel and direct access to the RDBMS (with the option of controlling the number of concurrent connections); (ii) introspect the RDBMS table, generate partitions of Spark JDBCRDDs based on the split pattern, and rewrite Spark SQL queries into RDBMS SQL; (iii) use hooks in the JDBC driver for faster type conversions; and (iv) push down predicates to the RDBMS and prune partitions based on the WHERE clause to reduce the amount of data returned by the RDBMS.
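The partitioning, pruning, and query-rewriting ideas in (ii) and (iv) can be illustrated with a small, Spark-free sketch. It assumes a simple numeric range split on one column, which is only one possible split pattern and not necessarily the one used by the implementation described here. All names (`Partition`, `split_range`, `prune`, `partition_query`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    index: int
    lo: Optional[int]  # inclusive lower bound, None means unbounded
    hi: Optional[int]  # exclusive upper bound, None means unbounded

    def where(self, column: str) -> str:
        """Render this partition's range as a SQL condition."""
        parts = []
        if self.lo is not None:
            parts.append(f"{column} >= {self.lo}")
        if self.hi is not None:
            parts.append(f"{column} < {self.hi}")
        # NULL values of the column are not covered unless there is
        # a single unbounded partition; a real splitter must handle them.
        return " AND ".join(parts) if parts else "1 = 1"


def split_range(lower: int, upper: int, num_partitions: int) -> List[Partition]:
    """Split [lower, upper) into contiguous ranges; the outer ones are open-ended."""
    if num_partitions < 1 or upper <= lower:
        raise ValueError("need num_partitions >= 1 and upper > lower")
    num_partitions = min(num_partitions, upper - lower)
    stride = (upper - lower) // num_partitions
    partitions = []
    for i in range(num_partitions):
        lo = None if i == 0 else lower + i * stride
        hi = None if i == num_partitions - 1 else lower + (i + 1) * stride
        partitions.append(Partition(i, lo, hi))
    return partitions


def prune(partitions: List[Partition], pred_lo: int, pred_hi: int) -> List[Partition]:
    """Keep only partitions that can hold rows with pred_lo <= col < pred_hi."""
    kept = []
    for p in partitions:
        entirely_below = p.hi is not None and p.hi <= pred_lo
        entirely_above = p.lo is not None and p.lo >= pred_hi
        if not (entirely_below or entirely_above):
            kept.append(p)
    return kept


def partition_query(table: str, column: str, p: Partition,
                    pred_lo: int, pred_hi: int) -> str:
    """Rewrite the filter into one RDBMS query per surviving partition."""
    return (f"SELECT * FROM {table} WHERE ({p.where(column)}) "
            f"AND {column} >= {pred_lo} AND {column} < {pred_hi}")


if __name__ == "__main__":
    parts = split_range(0, 100, 4)
    for p in prune(parts, 30, 60):
        print(partition_query("ORDERS", "ID", p, 30, 60))
```

Each surviving partition becomes one query on its own connection, so the partition count bounds the concurrency, and partitions whose range cannot satisfy the filter never reach the database.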

Speakers
Kuassi Mensah

Director Product Management, Oracle
Kuassi Mensah is Director of Product Management at Oracle; his scope includes: (i) Java performance, scalability, HA, and security with the Oracle database; (ii) Hadoop and Spark integration with the Oracle database; (iii) Java & JavaScript integration with the Oracle database (OJVM, Na...


Tuesday March 20, 2018 11:15am - 12:05pm
5-Rm 105