Aug 10, 2009

Oracle SQL Loader: high speed data loading tool

Oracle SQL*Loader is a utility provided by Oracle to load bulk data into database tables. It's an integral feature of the databases, available in all configurations.

We had a situation where we need to build new tables with lots of records calculated from other tables. We simplified our access control structure and that needed significant migration effort. Initially we tried to write java programs which inserted records to the new tables. But it were taking too much time (about 2 hours to load some 80 million records) to load data into the new tables. We could not afford this amount of downtime since we have to be up and running on 24/7 basis. So, we had to device a way to minimize the downtime.

We tried to generate csv files using java programs which we had to load into database ultimately to build the new access control tables. Our new approach worked like a silver bullet! We were finally able to get all these done in just few minutes. The java program generated flat csv files (values in comma separated) in about 4 minutes and we loaded the csv data into tables using sql loader in just few seconds!

It is so fast - I loaded over 80 million data in just 25 sec!!

Three things needed in most cases for loading data using sql loader -

a) The control file
b) The external data file (might be a csv file)
c) The command to initiate loading

Here are the samples of those -

a) Control file: summary.ctl

UNRECOVERABLE LOAD DATA
INFILE 'summary.csv'
APPEND INTO TABLE tac_form_summary
fields terminated by "," optionally enclosed by '"'
(
UFID INTEGER EXTERNAL,
FORM_TYPE INTEGER EXTERNAL,
PROV_ID INTEGER EXTERNAL,
PGM_ID INTEGER EXTERNAL,
CLIENT_ID INTEGER EXTERNAL,
CREATED_BY_ID INTEGER EXTERNAL,
CREATED_DATE "to_timestamp(:CREATED_DATE, 'YYYY-MM-DD HH24:MI:SS.FF3')",
NOTIF_LEVEL INTEGER EXTERNAL,
TEST_MODE INTEGER EXTERNAL,
FORM_ID CHAR,
TITLE CHAR,
SUMMARY CHAR
)

b) Data file: summary.csv

2,1,100,199,0,892,"2005-05-26 09:42:52.000",2,0,"XEBC","Comm Log","Type: [N]"

c) The command - how to run

From command line, issue the command -

#sqlldr user/pass control=summary.ctl PARALLE=true DIRECT=true

Also, changing the default settings, you could try -

#sqlldr user/pass control=summary.ctl BINDSIZE=20971520 ROWS=10000 READSIZE=20971520


Loading options: Conventional vs Direct Path

With conventional loading (if DIRECT=TRUE not mentioned), SQL*Loader loads data into a bind array and passes it to the database engine to process with an INSERT statement and full UNDO and REDO will be generated for this.

With direct path loading, SQL*Loader reads data, pass the data to the database via direct path API. The API calls format it directly into data blocks in memory and then flushes these blocks directly to the datafiles using multi block I/O. Bypassing the buffer cache, UNDO and REDO makes it super fast!

Other setting which helps to improve performance in conventional loading -

* Default settings -

Bind array: 64 rows, maximum of 256000 bytes
Space allocated for bind array: 198144 bytes(64 rows)
Read buffer bytes: 1048576

Elapsed time was: 00:01:10.58
CPU time was: 00:00:03.58

* Improved settings -

Bind array: 10000 rows, maximum of 20971520 bytes
value used for ROWS parameter changed from 10000 to 6773
Space allocated for bind array: 20969208 bytes(6773 rows)
Read buffer bytes:20971520

Elapsed time was: 00:00:11.26
CPU time was: 00:00:03.98


Ref: Simple examples are shown here http://www.orafaq.com/wiki/SQL*Loader_FAQ

No comments: