Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers Pat Patterson
Transcription
Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers Pat Patterson
Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers Pat Patterson Principal Developer Evangelist [email protected] @metadaddy Agenda Foreign Data Wrappers Writing FDW’s in C Multicorn Database.com FDW for PostgreSQL FDW in action Why Foreign Data Wrappers? External data sources look like local tables! – Other SQL database • MySQL, Oracle, SQL Server, etc – NoSQL database • CouchDB, Redis, etc – File – LDAP – Web services • Twitter! Why Foreign Data Wrappers? Make the database do the work – SELECT syntax • DISTINCT, ORDER BY etc – Functions • COUNT(), MIN(), MAX() etc – JOIN external data to internal tables – Use standard apps, libraries for data analysis, reporting Foreign Data Wrappers 2003 - SQL Management of External Data (SQL/MED) 2011 – PostgreSQL 9.1 implementation – Read-only – SELECT-clause optimization – WHERE-clause push-down • Minimize data requested from external source Future Improvements – JOIN push-down • Where two foreign tables are in the same server – Support cursors FDW’s in PostgreSQL ‘Compiled language’ (C) interface Implement a set of callbacks typedef struct FdwRoutine { NodeTag type; /* These functions are required. */ GetForeignRelSize_function GetForeignRelSize; GetForeignPaths_function GetForeignPaths; GetForeignPlan_function GetForeignPlan; ExplainForeignScan_function ExplainForeignScan; BeginForeignScan_function BeginForeignScan; IterateForeignScan_function IterateForeignScan; ReScanForeignScan_function ReScanForeignScan; EndForeignScan_function EndForeignScan; /* These functions are optional. */ AnalyzeForeignTable_function AnalyzeForeignTable; } FdwRoutine; FDW’s in PostgreSQL Much work! • CouchDB FDW • https://github.com/ZhengYang/couchdb_fdw/ • couchdb_fdw.c > 1700 LoC Multicorn http://multicorn.org/ PostgreSQL 9.1+ extension Python framework for FDW’s Implement two methods… Multicorn from multicorn import ForeignDataWrapper class ConstantForeignDataWrapper(ForeignDataWrapper): def __init__(self, options, columns): super(ConstantForeignDataWrapper, self).__init__(options, columns) self.columns = columns def execute(self, quals, columns): for index in range(20): line = {} for column_name in self.columns: line[column_name] = '%s %s' % (column_name, index) yield line Database.com FDW for PostgreSQL OAuth login to Database.com / Force.com – Refresh on token expiry Force.com REST API – SOQL query • SELECT firstname, lastname FROM Contact Request thread puts records in Queue, execute() method gets them from Queue JSON parsing – skip embedded metadat < 250 lines code Demo Conclusion Foreign Data Wrappers make the whole world look like tables! Writing FDW’s in C is hard! – Or, at least, time consuming! Writing FDW’s in Python via Multicorn is easy! – Or, at least, quick! Try it for yourself! Resources http://wiki.postgresql.org/wiki/SQL/MED http://wiki.postgresql.org/wiki/Foreign_data_wrappers http://multicorn.org/ https://github.com/metadaddy-sfdc/Database.comFDW-for-PostgreSQL