Aug 10, 2009

Oracle SQL*Loader: high-speed data loading tool

Oracle SQL*Loader is a utility provided by Oracle for loading bulk data into database tables. It's an integral feature of the database, available in all configurations.

We had a situation where we needed to build new tables with lots of records calculated from other tables. We had simplified our access control structure, and that required a significant migration effort. Initially we wrote Java programs that inserted records into the new tables, but that was taking too much time (about 2 hours to load some 80 million records). We could not afford that much downtime, since we have to be up and running on a 24/7 basis. So, we had to devise a way to minimize the downtime.

We then changed the Java programs to generate CSV files, which we would load into the database to build the new access control tables. Our new approach worked like a silver bullet! We were finally able to get all this done in just a few minutes: the Java program generated flat CSV (comma-separated values) files in about 4 minutes, and we loaded the CSV data into the tables using SQL*Loader in just a few seconds!

It is so fast - I loaded over 80 million rows in just 25 seconds!!

Three things are needed in most cases for loading data using SQL*Loader -

a) The control file
b) The external data file (might be a csv file)
c) The command to initiate loading

Here are samples of each -

a) Control file: summary.ctl

UNRECOVERABLE LOAD DATA
INFILE 'summary.csv'
APPEND INTO TABLE tac_form_summary
fields terminated by "," optionally enclosed by '"'
(
UFID INTEGER EXTERNAL,
FORM_TYPE INTEGER EXTERNAL,
PROV_ID INTEGER EXTERNAL,
PGM_ID INTEGER EXTERNAL,
CLIENT_ID INTEGER EXTERNAL,
CREATED_BY_ID INTEGER EXTERNAL,
CREATED_DATE "to_timestamp(:CREATED_DATE, 'YYYY-MM-DD HH24:MI:SS.FF3')",
NOTIF_LEVEL INTEGER EXTERNAL,
TEST_MODE INTEGER EXTERNAL,
FORM_ID CHAR,
TITLE CHAR,
SUMMARY CHAR
)
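
For reference, here is what the target table might look like (a sketch only - the datatypes and lengths are my assumptions based on the control file fields above, not the actual schema):

CREATE TABLE tac_form_summary (
    ufid           NUMBER,
    form_type      NUMBER,
    prov_id        NUMBER,
    pgm_id         NUMBER,
    client_id      NUMBER,
    created_by_id  NUMBER,
    created_date   TIMESTAMP(3),
    notif_level    NUMBER,
    test_mode      NUMBER,
    form_id        VARCHAR2(30),
    title          VARCHAR2(100),
    summary        VARCHAR2(4000)
);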

b) Data file: summary.csv

2,1,100,199,0,892,"2005-05-26 09:42:52.000",2,0,"XEBC","Comm Log","Type: [N]"

c) The command - how to run

From the command line, issue the command -

#sqlldr user/pass control=summary.ctl PARALLEL=true DIRECT=true

Also, changing the default conventional path settings, you could try -

#sqlldr user/pass control=summary.ctl BINDSIZE=20971520 ROWS=10000 READSIZE=20971520


Loading options: Conventional vs Direct Path

With conventional loading (when DIRECT=true is not specified), SQL*Loader loads data into a bind array and passes it to the database engine to process with an INSERT statement; full UNDO and REDO are generated for this.

With direct path loading, SQL*Loader reads the data and passes it to the database via the direct path API. The API calls format it directly into data blocks in memory and then flush these blocks directly to the datafiles using multiblock I/O. Bypassing the buffer cache, UNDO, and REDO is what makes it super fast!
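
One way to verify this on a quiet test box is to snapshot the instance-wide redo generation before and after each kind of load (a minimal sketch; v$sysstat is instance-wide, so other activity will skew the numbers):

SQL> SELECT name, value FROM v$sysstat WHERE name = 'redo size';

-- run the conventional or direct path load here, then check again

SQL> SELECT name, value FROM v$sysstat WHERE name = 'redo size';

With DIRECT=true (and UNRECOVERABLE in the control file), the difference between the two values should be a tiny fraction of what the conventional load generates.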

Other settings that help improve performance in conventional loading -

* Default settings -

Bind array: 64 rows, maximum of 256000 bytes
Space allocated for bind array: 198144 bytes(64 rows)
Read buffer bytes: 1048576

Elapsed time was: 00:01:10.58
CPU time was: 00:00:03.58

* Improved settings -

Bind array: 10000 rows, maximum of 20971520 bytes
value used for ROWS parameter changed from 10000 to 6773
Space allocated for bind array: 20969208 bytes(6773 rows)
Read buffer bytes:20971520

Elapsed time was: 00:00:11.26
CPU time was: 00:00:03.98
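
A quick sanity check of those numbers: this record layout binds as 3096 bytes per row (198144 / 64 in the default run). BINDSIZE=20971520 is 20 MB, and 20971520 / 3096 ≈ 6773 - which is exactly why SQL*Loader reduced ROWS from 10000 to 6773 (6773 * 3096 = 20969208 bytes allocated).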


Ref: Simple examples are shown at http://www.orafaq.com/wiki/SQL*Loader_FAQ

Aug 8, 2009

Managing TEMP Tablespace

In an Oracle database, the temporary tablespace (which holds the temp segments) and the UNDO tablespace (which holds the undo segments) are two very important things.

As I've decided to write down, from time to time, important stuff I do with the database, it's time to note a few important things about the TEMP tablespace. Today, I had to revisit TEMP tablespace matters again - one of my friends was facing a problem with BLOB updates in long-running transactions in his system.

There is hardly a database person who has never seen this error message - ORA-1652: unable to extend temp segment by ... That was the case with my friend. He is not a database guy - so here I go. There was not enough temp space to support his long-running transactions, which were storing LOB data. So I made the temp file auto-extensible.



What is TEMP Tablespace?


It is a special tablespace the Oracle database needs in order to store temporary data/segments in support of other operations. A database has one default temporary tablespace at a time (individual users can be assigned other temporary tablespaces), and of course a temp tablespace can have one or more temp files, like other tablespaces.
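
To see which temporary tablespaces exist in a database, a quick check -

SQL> SELECT tablespace_name, contents
FROM dba_tablespaces
WHERE contents = 'TEMPORARY';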

Temporary segments are created in the temp tablespace during -

* Large sort operations, when the sort can't be accommodated within the PGA (PGA_AGGREGATE_TARGET). These include ORDER BY, GROUP BY, DISTINCT, MERGE JOIN, HASH JOIN or CREATE INDEX.

* Use of (global) temporary tables

* Storage of temporary LOB data
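
To see how much of the temp tablespace is allocated and actually in use at any moment, v$sort_segment helps (blocks multiplied by the db block size give bytes) -

SQL> SELECT tablespace_name, total_blocks, used_blocks, free_blocks
FROM v$sort_segment;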

How to increase temp size

SQL> SELECT tablespace_name, file_name, bytes/1024/1024 MB, AUTOEXTENSIBLE
FROM dba_temp_files WHERE tablespace_name = 'TEMP';

TABLESPACE_NAME   FILE_NAME                          MB   AUT
---------------   ---------------------------   -------   ---
TEMP              /data/orcl/temp01.dbf             100   NO


SQL> ALTER DATABASE TEMPFILE '/data/orcl/temp01.dbf' RESIZE 500M;

If the file is not auto-extensible and we have enough disk space, we could turn autoextend on with unlimited max size for it.

SQL> ALTER DATABASE TEMPFILE '/data/orcl/temp01.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED;


How to shrink/reclaim space

There might be cases where one long-running query or LOB segment increases the temp tablespace size significantly, but we don't want to keep the temp tablespace stuck at that size, since we may not need such a large file in general.

The idea is to create another, interim temp tablespace (say TEMP2) and temporarily assign it to the database. Then recreate the desired temp tablespace at its normal size, reassign it, and drop the interim one.


SQL> CREATE TEMPORARY TABLESPACE TEMP2
TEMPFILE '/data/orcl/temp2.dbf' SIZE 5M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED;


SQL> ALTER DATABASE DEFAULT TEMPORARY TABLESPACE TEMP2;


SQL> DROP TABLESPACE TEMP INCLUDING CONTENTS AND DATAFILES;


SQL> CREATE TEMPORARY TABLESPACE TEMP
TEMPFILE '/data/orcl/temp01.dbf' SIZE 100M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 10M;

SQL> ALTER DATABASE DEFAULT TEMPORARY TABLESPACE TEMP;

SQL> DROP TABLESPACE TEMP2 INCLUDING CONTENTS AND DATAFILES;
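
At any point during this switch-over, we can confirm which tablespace is currently the database default temp -

SQL> SELECT property_value
FROM database_properties
WHERE property_name = 'DEFAULT_TEMP_TABLESPACE';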

If the database is not being accessed by users, we could take the following shortcut instead -

SQL> ALTER DATABASE TEMPFILE '/data/orcl/temp01.dbf' DROP INCLUDING DATAFILES;


SQL> ALTER TABLESPACE TEMP ADD TEMPFILE '/data/orcl/temp01.dbf' SIZE 500M
AUTOEXTEND ON NEXT 250M MAXSIZE UNLIMITED;


Monitoring TEMP segments

I use this script to identify 'who' is eating up temp space and why -

set linesize 200
col sid_serial format a10
col sql_text format a40
col username format a10
col program format a25


SELECT sq.sql_text, ROUND(((b.blocks*p.value)/1024/1024),2)||'MB' "SIZE",
a.sid||','||a.serial# SID_SERIAL,
a.username,
a.program
FROM sys.v_$session a,
sys.v_$sort_usage b,
sys.v_$parameter p,
v$sql sq
WHERE a.saddr = b.session_addr
AND b.sql_id = sq.sql_id
AND p.name = 'db_block_size'
AND b.tablespace = 'TEMP'
ORDER BY b.tablespace, b.blocks;

Aug 7, 2009

Understanding JDBC connection pool settings in WebLogic

A few days ago, I was looking at the database and found that about 30-40 connections/sessions were being established per minute on average. This should not be the usual case with connection pooling in place.

Then I looked at the database listener.log to see which web servers were requesting connections and at what rate. I found a pattern: about every 10 minutes, there was a flood of connection requests from all the web servers. Now it was time to look at the connection pool settings - what's in there!
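
From the database side, the logon times of the pooled sessions tell the same story - right after a flood, most sessions show a very recent logon_time. A quick sketch (assuming the application connects as the user PROD):

SQL> SELECT TO_CHAR(logon_time, 'HH24:MI') logon_minute, COUNT(*)
FROM v$session
WHERE username = 'PROD'
GROUP BY TO_CHAR(logon_time, 'HH24:MI')
ORDER BY 1;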

WebLogic Connection Pool Settings

<jdbc-connection-pool-params>
    <initial-capacity>10</initial-capacity>
    <max-capacity>150</max-capacity>
    <capacity-increment>10</capacity-increment>
    <shrink-frequency-seconds>600</shrink-frequency-seconds>
    <highest-num-waiters>300</highest-num-waiters>
    <connection-creation-retry-frequency-seconds>3</connection-creation-retry-frequency-seconds>
    <connection-reserve-timeout-seconds>30</connection-reserve-timeout-seconds>
    <test-connections-on-reserve>true</test-connections-on-reserve>
    <inactive-connection-timeout-seconds>1800</inactive-connection-timeout-seconds>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
</jdbc-connection-pool-params>


We set the shrink frequency to 600 seconds - that is, 10 minutes.

What does "ShrinkFrequencySeconds" parameter mean?

"The number of seconds to wait before shrinking a connection pool that has incrementally increased to meet demand. ShrinkingEnabled must be set to true for a connection pool to shrink."

So, every 10 minutes, idle connections were being returned to the database. Just after that, a bunch of HTTP requests would come into play on the web servers, and in most cases serving a request needs a database connection. Since these requests needed more connections than the post-shrink pool held (back down to the initial capacity of 10), JDBC had to establish a bunch of new connections to serve them. That's how we got a flood of new connection requests roughly every 10 minutes. Raising the initial capacity (or the shrink frequency) so the pool carries more connections through the quiet periods should smooth out these spikes.


Here we can find the description of all the connection pool parameters -

http://e-docs.bea.com/wls/docs81/config_xml/JDBCConnectionPool.html

Aug 6, 2009

Hibernate second level caching to reduce query executions

There are situations in web applications where some queries get executed on almost every HTTP request. If those queries return fairly static data, a good approach is caching, to reduce the number of database execution calls. Fewer executions mean less CPU usage - so this relates directly to database performance.

Recently, we discovered that one of these static-ish queries was being parsed and executed quite frequently in the database. The parse count had increased because we experimentally disabled "Statement Caching" in the WebLogic connection pool.

AWR output: 11:30 AM - 12:00 PM; parse calls in 30 minutes -

Parse Calls   Executions   % Total Parses   SQL Id          SQL Text
-----------   ----------   --------------   -------------   --------------------------------
68,185        68,187       20.38            dxyfqwzsn9727   select title0_.id as id27_0_, ...


The above query returns fairly static information that changes infrequently. The query is generated by a Hibernate call on that db object, so we could easily introduce second-level caching for it - the second-level cache stores the individual property values of an entity, not object instances.

Here is a good article on Hibernate caching -
http://www.javalobby.org/java/forums/t48846.html

What do we need to do to enable second-level caching?

What I just did -

1. Added the 'Cache' annotation to the db class (Title.java). We are using Hibernate 3 with annotations, so there is no hbm.xml now.
2. For that db class, I then added a cache configuration in ehcache.xml. Ehcache is a widely used distributed Java cache for general-purpose caching in Java EE applications. (This assumes the second-level cache and the Ehcache provider are already enabled in the Hibernate configuration.)


#Title.java (the db class)

@Entity
@Table(name = "title")
@SequenceGenerator(name = "Seq_Title",sequenceName = "title_seq")
@Cache(usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE)
public class Title extends Persistent {

private int id;

.................
.................

}

#ehcache.xml (ehcache configuration file)

<ehcache>
.................
.................

    <cache name="net.service.common.db.Title"
        maxElementsInMemory="1000"
        eternal="false"
        overflowToDisk="false"
        timeToIdleSeconds="900"
        timeToLiveSeconds="900" />

</ehcache>


To test whether it works, I took one test database box and flushed the shared pool to remove old stats. Then, connected to that database, I ran the application in a way that invokes the query heavily.

It worked according to my expectation - new titles are loaded (by invoking the query) only every 15 minutes, matching the timeToLiveSeconds setting!

col sql_text format a40;
set linesize 200;

select sq.SQL_TEXT, st.PARSE_CALLS, st.EXECUTIONS,
round( 100*(1-( st.PARSE_CALLS / st.EXECUTIONS )),2) execute_to_parse
from v$sqlstats st, v$sql sq
where st.sql_id = sq.sql_id
and sq.PARSING_SCHEMA_NAME = 'PROD'
and sq.sql_text like 'select title0_.id%'
and st.EXECUTIONS <> 0
order by execute_to_parse;

SQL_TEXT     PARSE_CALLS EXECUTIONS EXECUTE_TO_PARSE
---------------------------------------- ----------- ---------- -------
select title0_.id as id27_0_, title0_.cr     3   140   97.86
eated as created27_0_, title0_.updated a
s updated27_0_, title0_.version as versi
on27_0_, title0_.description as descript
5_27_0_, title0_.prov_id as prov6_27_0_,
title0_.title as title27_0_ from title
title0_ where title0_.id=:1


3 parse calls, 140 executions in 20 minutes!! I was requesting many titles, but they were actually being served from the cache. So there will be far fewer executions of this query from now on :-). Also fewer parse calls, since we are using "Prepared Statement Caching" in the WebLogic connection pool.

Aug 5, 2009

Low 'Execute to Parse' ratio analysis

Suddenly, I got a call from one of the Unix team members: one of our side applications was not responding properly.

I looked at the database to see if something wrong was going on there, and found something very interesting -

* Queries were taking longer than before (longer execution times than two weeks earlier)

* More CPU was being used.

* The Execute to Parse ratio had gone down to 3%-5%. Earlier, this ratio was around 52%-65%. This could be an indication of a major problem.

* AWR showed that a high number of sessions was being established during the problem hours - about 30 connections/sessions per minute, against our normal range of 7-10 per minute. We do use connection pooling in WebLogic.

AWR Report showed -

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %:              100.00    Redo NoWait %:      100.00
Buffer Hit %:                  99.53    In-memory Sort %:   100.00
Library Hit %:                 99.64    Soft Parse %:        99.83
Execute to Parse %:             4.18    Latch Hit %:         99.98
Parse CPU to Parse Elapsd %:   91.48    % Non-Parse CPU:     99.42


The 'Execute to Parse' thing drew my attention first.

What is Execute to Parse ratio? What does it tell?

Every query needs to be parsed before it gets executed. If some queries are used quite frequently, a good design will reuse the parsed query. That means the same query is not parsed again (provided we are using bind variables); rather, the existing parsed query is reused and then executed. In an ideal world, executes should far outnumber parses. More parsing requires more CPU - that's why the concept of query reuse came into play.

When executes outnumber parses, the figure is positive. The ratio goes down when there are more parses relative to executes - some queries are parsed but rarely or never executed!
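
The ratio itself is just 100 * (1 - parse calls / executions). For example, 10,000 executions served by only 500 parses gives 100 * (1 - 500/10000) = 95%, while 10,000 executions needing 9,582 parses gives 100 * (1 - 9582/10000) = 4.18% - the kind of number we were seeing.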

What might be the possible reasons for such a low ratio?

This "Execute to Parse" ratio actually tells many things and points to many areas to look into for further information. Causes could be one of the following, or something else -

* There might be no "Prepared Statement Caching" in the Java layer or in the JDBC connection pool
* Cursors might be getting closed before the queries are re-executed
* There might not be enough "session cached cursors" to hold the queries for reuse (see the quick check after this list)
* Brand-new queries might be coming into play quite frequently
* There might be connection interruptions
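
For the cursor cache point, a quick check looks like this (a sketch - the 'good' values depend entirely on the workload):

SQL> show parameter session_cached_cursors

SQL> SELECT name, value
FROM v$sysstat
WHERE name IN ('session cursor cache hits', 'parse count (total)');

If the cache hits are negligible compared to the total parses, increasing session_cached_cursors may be worth trying.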

I had to find the queries that were being parsed but executed little during the problem hours -

set linesize 200;
set pagesize 1000;
col sql_text format a40;

SELECT sq.sql_text, st.executions_total, st.parse_calls_total
, round( 100*(1-( st.parse_calls_total / st.executions_total )),2) execute_to_parse
, st.executions_delta, st.parse_calls_delta
, round( 100*(1-( st.parse_calls_delta / st.executions_delta )),2) delta_ratio
FROM DBA_HIST_SQLSTAT st
, DBA_HIST_SQLTEXT sq
, DBA_HIST_SNAPSHOT s
WHERE s.snap_id = st.snap_id
AND s.begin_interval_time >= to_date('2009-08-04 11:30 am','YYYY-MM-DD HH:MI am')
AND s.end_interval_time <= to_date('2009-08-04 12:01 pm','YYYY-MM-DD HH:MI pm')
AND st.sql_id = sq.sql_id
AND st.parsing_schema_name = 'PROD'
AND st.executions_total != 0
AND st.executions_delta != 0
ORDER BY delta_ratio;


The delta ratio is the ratio for just that snapshot interval. So now I have a list of queries that were parsed but executed little or not at all - I can look at the application code for a better understanding.


Tough time ahead!

Aug 3, 2009

Understanding Oracle Histograms

Before I forget, I want to write a few words, as I recently did some experiments with histogram statistics and our complex queries.

Let me do it in a question-and-answer style.

What is Histogram in Oracle?

A histogram is an option for database statistics collection (gathered automatically by default from 10g onward). It contains information about how the data of a table column is distributed. A histogram helps the Oracle optimizer determine whether certain values occur frequently, rarely, or not at all, and based on that information it can select an optimum access path for the query. For a big database, I have found histogram stats to be really, really important.

If there are no histograms, the optimizer assumes that the column values are uniformly distributed over the table's rows.


What does it do? What's the use?

I am aware of the two direct benefits -

* Histograms help the optimizer improve the table join order: for example, if we join several tables with millions of rows where the final result set will be only a few rows, Oracle tries to join the tables in such an order as to make the result set cardinality (rows returned) of the first joins as small as possible. This has a huge performance impact on the rest of the operations, since they will have to deal with fewer rows.


* Histograms help the optimizer find the optimum access path - full table scan vs access via index: for example, say there is an index on "user_type" (assume we have 5% admins, 15% managers, 80% support-type users). If we are looking for most of the support-type users and the query returns 60% of the table's rows, it is more efficient to do a full table scan than to access the 'support' users via the "user_type_i" index. Histograms hold the statistics needed to make that choice - see the sketch below.
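
To give the optimizer exactly this information for a skewed column, we can gather a histogram on just that column (a sketch - the schema/table names and the 254-bucket size here are illustrative):

SQL> exec dbms_stats.gather_table_stats(ownname=>'PROD'
,tabname=> 'USER_INFO'
,method_opt=> 'FOR COLUMNS user_type SIZE 254'
,cascade=>true);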


What are different types of Histogram?

There are three options - two basic types, plus none.

* Height-balanced histograms: in a height-balanced histogram, the column values are divided into bands (buckets) so that each band contains approximately the same number of rows. The useful information the histogram provides is where in the range of values the endpoints fall.

A height-balanced histogram is created when the column has more distinct values than the number of buckets available.

The maximum number of histogram buckets is 254.

* Frequency histograms: for this type, each distinct value of the column corresponds to its own bucket, and each bucket holds the number of occurrences of that single value. A frequency histogram is created when the number of distinct values in the column is less than or equal to the number of histogram buckets specified.

* None, meaning no histogram statistics for the column


Two important views to find information about the histograms in database -

- USER_TAB_COL_STATISTICS
- USER_TAB_HISTOGRAMS
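
For example, to see what kind of histogram (if any) exists on a column and where its bucket endpoints fall (USER_INFO and USER_TYPE are just illustrative names) -

SQL> SELECT column_name, histogram, num_buckets
FROM user_tab_col_statistics
WHERE table_name = 'USER_INFO';

SQL> SELECT endpoint_number, endpoint_value
FROM user_tab_histograms
WHERE table_name = 'USER_INFO'
AND column_name = 'USER_TYPE';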


In our big system, which runs over 2,000 distinct queries, I found that Oracle had automatically collected histogram stats for a good number of columns -

SQL> select count(column_name)
from user_tab_col_statistics
where histogram <> 'NONE';

COUNT(COLUMN_NAME)
------------------
1247

Some queries run much faster without any histogram information. For those tables, I collected statistics with a single bucket (SIZE 1), which effectively means no histogram -

SQL> exec dbms_stats.gather_table_stats(ownname=>'PROD'
,tabname=> 'USER_INFO'
,method_opt=> 'FOR ALL COLUMNS SIZE 1'
,estimate_percent=>50, cascade=>true);


Also, there are a good number of queries that perform better when histogram stats are available. For those kinds of queries, I collected stats letting Oracle decide the number of buckets by specifying 'AUTO' -


SQL> exec dbms_stats.gather_table_stats(ownname=>'PROD'
,tabname=> 'USER_INFO'
,method_opt=> 'FOR ALL COLUMNS SIZE AUTO'
,estimate_percent=>50, cascade=>true);


Statistics are vital for database performance. If there are problem queries, one should analyze the execution plans thoroughly and try out the statistics options suitable for them - case by case.