recent

Titulo

Converting Excel data into a Flat file for Loading

I WISH, the requester sends the data file in the format we want and then the load is done in few minutes. Hey, it does not hurt to ask the requester back in a format that you need to make your load process easier but you may or may not get what you have requested. The most frequently used file type in most organization today is Excel document.There are various issues with loading data from Excel as most database may not support the functionality directly where you may need to use some third party utility. Most database loads data from a flat file with fields separated by comma. We will discuss on converting excel file into a file that is database load ready and the problem we may encounter the csv file.

Converting Excel into CSV (Comma Separated Value): This is an easy process which anyone can do it with few clicks. If you don't, you just need to save your excel document file "save as type: CVS(Comma Delimited)(*.csv). The new file *.csv file is created which you can open using your favorite text editor like notepad++ to view it as comma separated file. You can run your loader against this flat file to load your data.

Excel:
CSV: (Opened with Notepad++)
Using SQL Loader in Oracle to load data and the data load was successfully without any rejects and the life is GOOD.

Life may not be good always which we all know that. Recently, I was doing a data load file from Excel document where I followed the instruction exactly what was stated above. Only 2855 out of 3000 loaded successfully using SQL loader. What was wrong? To find out, I reviewed the bad.log files and further investigate the data that didn't load. The data that didn't load had comma on some field. When you have comma within data and the delimiter is comma as well, the data do not align with the columns and the database load will throw error for Invalid data type or data too long. Let's go over an example of data with comma and how we tackle this problem?

Excel:
CSV:
Do you see the problem here? We have 5 columns and the data file has eight data for 5 columns. Always, the number of columns should match the number of commas -1 in the data file. If not there is a problem of not loading or loading data into wrong column name.

Resolution: Change the comma delimiter into pipe delimiter file. When I tried to replace the comma with pipe in notpad++, it replace all comma which was no good. There are two
solutions I came across and use the one you like the most. The idea is to convert .CSV file into pipe delimited or use a special character not used within a data file.

Solution 1:
 Open .CSV file and save it as Tab delimiter File.
 Open tab delimited file using notepad++ and replace tab with pipe
 Save it.

There are 5 columns and the data file has 5 data as well. How do you like that?
Solution 2: 
  1. Make sure Excel is closed
  2. Navigate to control panel
  3. Select ‘Region and Language’
  4. Click the ‘Additional Settings’ button
  5. Find the List separator and change it from a comma to your preferred delimiter such as a pipe (|).
  6. Click OK
  7. Click OK
  8. Exit Control panel
  9. Open the Excel file you want to export to a pipe delimited file
  10. Select File, Save As
  11. Change the ‘Save as type’ to ‘CSV (Comma delimited)(*.csv)’
  12. Change the name and file extension if you want, by default stays as csv even though a different delimiter
  13. Click Save
  14. Click OK
  15. Click Yes

It produces the same result as the above. This solution #2 is what I found online when I wasn't convinced with my solution #1 and I realized there got to be a better way and I found Barry Stevens Solution and it works well.

Excel documents are not the best file type for loading data therefore, I would always request file into a flat file which your loader loves if not I would always convert into pipe delimited file NOT comma separated value. You will run into the issue like I did and sometime you may not notice where the load is successful but the data loaded incorrectly. Always check that data file and ensure that it does not contain the delimiter that you are going to use in your SQLLDR or other loader utility that your DBMS supports. Comma is most likely to be used in a data than pipe, check and verify before using a Delimiter. Do not use .csv, use Pipe separated value or use something that isn't used in a file.

Interested in working with me? I can be reached at pbaniya04[at]gmail.com for any questions, consulting opportunities or you may drop a line to say HELLO. Thank your again for visiting my blog and looking forward to serving you more.

Have a Database-ious Day!

No comments

Powered by Blogger.