Oct 30, 2011

Monitoring long-running SQLs

When we moved to the new box after the hardware upgrade, we started seeing some queries take longer than normal to finish. Sometimes they ended up in a blocking state.

By the time the ops team noticed high JDBC usage and informed me, it was too late - the customers had been suffering well before we were notified. The first incident happened around 5 pm; JDBC usage showed up high on Cacti and Nagios a bit later, and I got the call around 5:30 pm (3:30 am my time), by which point the system was almost bogged down.

Anyway, to get an early notification, I wrote a shell script and put it in a cron job that runs every 5 minutes. It checks for long-running queries (running for more than three minutes) and mails the ops team so that we can take precautions and prevent a possible system failure.

Here is the SQL script I used in my shell script -

set linesize 200
set pagesize 200
col "SQL Text" format a40


SELECT a.username "User", a.sid "SID", a.serial# "Serial"
, b.sql_text "SQL Text", a.last_call_et "Elapsed Time"
FROM v$session a, v$sql b
WHERE a.sql_id = b.sql_id
AND a.status = 'ACTIVE'
AND a.last_call_et >= 180  -- active call running for 3+ minutes
AND a.username IN ('PROTS','STEE');
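
Since blocking was part of the symptom, a companion check can flag blocked sessions as well. This is just a sketch, assuming Oracle 10g or later where v$session exposes the BLOCKING_SESSION column -

SELECT a.username "Blocked User", a.sid "SID", a.serial# "Serial"
, a.blocking_session "Blocking SID", a.seconds_in_wait "Wait (s)"
FROM v$session a
WHERE a.blocking_session IS NOT NULL
AND a.username IN ('PROTS','STEE');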


***

Disabling an application feature with database trigger

Interesting things have been happening around here, but I have not written anything down for quite some time.

I remember the incident - I got a call around 3 am from the ops team when they noticed very high JDBC connection usage. I found that one of our delete features was taking too much time in the database and also causing other sessions to be blocked.

This 'delete' is not such an important feature, and we had been thinking of getting rid of the permanent delete from the database - that is, never deleting any user permanently - because we are no longer running a small system. We have FK references in over a thousand places in the database, and a permanent delete checks all the FK references in the child tables, which takes a lot of time even though the FK references are indexed. Unfortunately, up to that point we had never found a suitable window to do a release with the fix. The alternative was marking the user as 'deleted' with a status flag.

Anyway, the flag-change approach required some programming time and a release schedule, which was not going to happen in a day or two even though this delete was locking us up that night. Users were complaining about slowness...
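
For reference, the flag-based alternative is little more than an update plus a filter in every application query - a rough sketch, assuming a status column on app_user (the column and bind names are illustrative) -

-- mark the user as deleted instead of removing the row
UPDATE app_user
SET status = 'DELETED'
WHERE user_id = :user_id;

-- application reads then exclude flagged users
SELECT *
FROM app_user
WHERE status <> 'DELETED';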

I thought, as a quick fix, I could put in a trigger to prevent users from using that feature - that is, to stop them from deleting records from that table.


CREATE OR REPLACE TRIGGER prevent_user_delete
BEFORE DELETE ON app_user
FOR EACH ROW
BEGIN
  -- fail every delete attempt on this table with a clear message
  RAISE_APPLICATION_ERROR(-20005, 'User delete is temporarily unavailable');
END;
/


This simple trigger was a lifesaver that night. The few users who tried to delete an application user permanently got an application error page, but the database never locked up in that cascading way again. It bought us some time to improve the permanent delete feature by moving to the status flag.
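
Once the flag-based delete finally shipped, the trigger could simply be disabled or dropped -

ALTER TRIGGER prevent_user_delete DISABLE;
-- or remove it for good:
DROP TRIGGER prevent_user_delete;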

***

Oct 12, 2011

Flashback database feature speeds up migration script testing

Usually, people use the flashback database technique to recover quickly from data loss or corruption. It is a much faster approach than traditional point-in-time recovery, which has to restore from backup and apply redo and archive logs.

We found it just as useful for testing migration scripts during a big application release. We had to test on real data, and there were huge data changes/manipulations in several tables across different schemas. If anything was discovered to be wrong later in the testing phase, we had to start the whole thing over with a fresh schema. The schema loading time was huge, whereas the flashback technique rewound the whole database in just 30 minutes.

Here are a few quick steps to configure the flashback database feature -

1. Setting initialization parameters -

alter system set db_recovery_file_dest_size=100G;
alter system set db_recovery_file_dest='/oradata/flashback';
alter system set db_flashback_retention_target=4320;  -- in minutes (3 days)

2. Making sure the DB is in archive log mode -

select log_mode, flashback_on from v$database;
alter database archivelog;  -- requires the database to be in MOUNT state (see the next step)


3. Enabling flashback logging -

shutdown immediate
startup mount
alter database flashback on;
alter database open;


4. Checking status -

select flashback_on from v$database;
select * from v$flashback_database_log;

 
5. Creating restore point before running migration scripts -

create restore point before_migration_91;
select systimestamp from dual;
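
A variation worth noting: a guaranteed restore point keeps all the flashback logs needed for the rewind regardless of db_flashback_retention_target, at the cost of holding that space in the recovery area until the restore point is dropped -

create restore point before_migration_91 guarantee flashback database;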


6. Running the migration/DML scripts, Java programs, etc.

When needed, we rewound the database to a fresh copy using the commands below.

The already created restore points can be looked up first -

SELECT NAME, SCN, TIME, DATABASE_INCARNATION#,
GUARANTEE_FLASHBACK_DATABASE,STORAGE_SIZE
FROM V$RESTORE_POINT;

Then, putting the database in MOUNT state and performing the rewind -

shutdown immediate
startup mount
flashback database to restore point before_migration_91;
alter database open resetlogs;
 
The commands below could be used as well - 

flashback database to timestamp to_timestamp('22-JUN-11 10:30:00','DD-MON-YY HH24:MI:SS');
flashback database to scn scn_no;
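
Once the testing cycle is finished for good, the restore point can be dropped so the space held by its flashback logs is released -

drop restore point before_migration_91;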


That made the whole migration testing a lot easier and faster whenever we had to start over from the beginning with fresh schemas.
***