Mahmud's blog

Desired to do the new things, the new way...

August 2009 - Posts

Query XL file from SqlServer 2005 management studio using distributed query

Backgroud:

Number of times in my life I worked in dataloading from XL file to sql server database. Whenever you get this type of work in hand, the first thing you need to do is analyzing the data of the xl file to check 1. How many rows already exists in the database 2. How many rows are new 3. How many rows has corrupted data 4. How many rows do not satisfy the business rule etc. There could be diffrent check points in different scenaio. To check a big XL file manually is almost impossible. So, what I used to do is, I exported those data to a database table and then query to it. Lot of times I had to struggle and face hassle to do this DTS based export from XL to table. I always felt, if I could have a way to directly query the xl file, that would make the life lot easier.

Querying XL file from SQLServer:

This time when I have got yet another XL file for data upgrade and synchronisation purpose I decided to find the way to query the xl file directly from management studio.

There are basically 2 ways I found to do this. One option is to use linked server and another one is to use distributed query. Linked server is aprropriate if you need this from your applicaion frequently. but if you just need to query the xl file for data analyzing, distrbuted query is the best option. Here is the way to do it:

1. You need to enable Ad Hoc Distributed Queries option by using sql server surface area configuration or by using the following query:

     sp_configure 'show advanced options', 1

     reconfigure

     sp_configure 'Ad Hoc Distributed Queries', 1

     reconfigure

2. Now here is the query:

select * from OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0','Data Source=E:\Public_Folder\Users.xls;Extended Properties=Excel 8.0')...[sheet1$]

3. You can join this to any table of your database to analyze the data as follows:

SELECT

                xl.serial

FROM

                Contact con

INNER JOIN

                users u

ON

                u.contact_id = con.contact_id

INNER JOIN

                OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0','Data Source=E:\Public_Folder\Users.xls;Extended Properties=Excel 8.0')...sheet1$ xl

ON

xl.Last_Name = con.Last_name

AND xl.First_Name = con.First_name

So, happy dataload from now on J

 

 

Posted: 08-06-2009 10:13 AM by Mahmud | with no comments
Filed under: