Loading tables into a SQL database from a diverse database system is common in the BI world. Not every client is using SQL server and I had a few challenges when trying to bulk load tables from a PostGres database.
There are expensive OLEDB native providers out there but here’s a way to keep the cost low.
You will need to visit http://www.postgresql.org to grab the ODBC driver for PostgreSQL and set it up using PostgreSQL Unicode.
Connecting to Postgres
Once you have installed the driver you need to setup the ODBC data source connection to the Postgres database.
- Control Panel > Administrative Tools > Data Sources (ODBC).
- Click the ‘System DSN’ tab and click the Add button.
All details should be provided from the Postgres DBA if that is not yourself.
I have read that SQL server linked server aren’t the best to use for this as I had problems myself (explained later) but this did eventually work with the help of SSIS.
Set up your linked server using ‘Microsoft OLE DB provider of ODBC Drivers’ with the Data source being the name of the source from your ODBC connection. Tip: linked server also uses a provider settings and this can be found in the ‘providers’ subfolder for extra options
You will then have access to the Postgres database. This will be found in a subfolder of the Catalogs e.g.
...\Server Objects\ Linked Servers\ [Postgres linked server name]\ Catalogs\ [Postgres Database]
You will get a list of views and tables which you can use an OPENQUERY statement from and this will support PostgreSQL.
SELECT * FROM OPENQUERY (PG_LINKEDSERVER, ‘SELECT * FROM [Postgres table name] limit 1;’)
Tip: I used the PostgreSQL ‘LIMIT’ this does the same as T-SQL ‘TOP’ expression.
If the tables in your Postgres database are fairly small you may never experience this.
I ran an OPENQUERY statement to count a large table in the Postgres database and it returned this error .
“Msg 7357, Level 16, State 2, Line 1Cannot process the object “SELECT COUNT(*) FROM [name of large table];”. The OLE DB provider “MSDASQL” for linked server “PG_LINKEDSERVER” indicates that either the object has no columns or the current user does not have permissions on that object.“
Well I knew this was untrue as I could run the same query on the Postgres version of management studio PGAdmin.
To add to the further frustration I then couldn’t run any query after this as the linked server timed out for about 3 minutes and didn’t realise that at first.
I believe that this is due to the size of this table and it writing a full insert to Temp DB so this is where linked servers are slightly flawed.
I could do SELECT or COUNT statements but only on tables fewer than 425,000 records and anything over will cause the linked server to fail.
But it’s not the end of the world if you want to load large tables into your SQL Server database as you can use the OFFSET and LIMIT arguments to the SELECT clause (see below).
Ideally I wanted to run a simple
SELECT * into table from OPENQUERY (PG_LINKEDSERVER ….
But timeouts prevents me from doing this so setting up an OLE DB task in SSIS allows me to run my openquery.
I used a ‘For loop container’ that will run a select statement to insert all the records from Postgres database.
Now your wondering how you would be able to loop through and on the round trip obtain the correct next set of records.
PostgreSQL allows you to use the OFFSET statement e.g.
SELECT * FROM OPENQUERY(PG_LINKEDSERVER , ‘SELECT FROM [table_name] order by [column_id]OFFSET 50000 LIMIT 50000 ;’)
A variable of SQL statement needs to be set up to allow the offset to dynamically update the round trip to get the correct record each time.
Once this is all running correctly you should find this as a quick and efficient way to load the tables you require.
Tips – Data types
You may have to tweak the data types columns from Postgres as sometimes SQL Server gets this wrong and applies an incorrect datatype against a column.
SSIS also handles varchar4000 and varcharMAX different so look at changing some datatypes to either varcharMax and in some cases Nvarchar.
I’m still awaiting an answer around the timeout issue in SQL Server so any comments would be helpful but for now this works for me and gives me the control of a quick load using linked server and SSIS.