Jan 3, 2010

Oracle best practices for programmers

These are the things I suggest developers to follow in developing applications where Oracle database is being used. Some of these I learned from my experience as a developer, my own and others mistakes, and the rest, I learned as a dba!


1. Proper and Consistent Naming:

*  Keywords in UPPER case
*  Provide prefix for table name to group tables (for example according to functionality/module)
*  Provide constraint names, index names
*  Honor the company specific guidelines and best practices

2. Normalization of Database:

*  Normalize up to 4th Normal Form
*  I would say, first design ideally then you may denormalize according to needs

If you don't remember what you learned in university, here is a good link to remember by example -
http://support.microsoft.com/kb/100139

3. Primary/Unique key:

*  Every table should have a primary and/or unique key
*  Helpful for replication product
*  Key data should not be visible
*  Do not make a string field as primary key

4. Bind Variable:

*  Use bind variables instead of literals. This is specially true for all OLTP system or a mixed system (OLTP+heavy reporting) like ours.

For example – 

SELECT * FROM users WHERE id = 1009;

Instead, use this in jdbc template –

SELECT * FROM users WHERE id = ?;

And then pass your bind parameters.

5. Nested Query: Avoid nesting queries.

For example, I would avoid the following types of queries -

SELECT login_name FROM user WHERE prov_id IN
            (SELECT id FROM agency WHERE id IN
                (SELECT prov_id FROM message WHERE ....)


6. Data Segmentation:

*  Add column by which you can group/segment data for big tables

For example – add agency id in user forms although it may not be needed for the application.

7. Bulk Operation in Batch:

* Frequent insert/update/deletes in batch.

8. Data Sorting:

*  Avoid unnecessary ORDER BY
*  Avoid unnecessary DISTINCT

9. Database Call:

*  Do not execute queries inside a LOOP
*  Do not open connections inside a LOOP

10. IN and EXISTS:

*  Use IN if the sub-query has the selective WHERE clause. For example -

SELECT * FROM employee_details ed
WHERE ed.emp_id IN (SELECT id FROM employee WHERE ssn = ‘2990-030-998’) ;

*  If opposite or the parent query contains selective WHERE, use the EXISTS. For example -

SELECT * FROM employee_details  ed
WHERE EXISTS (SELECT e.id FROM employee e WHERE ed.emp_id = e.id
AND e.city=’Dallas’) ; 



 11. Data Access:

*  Avoid frequent visit to table data – Caching could be a good solution
*  select/update values at a single step for multiple columns. For example -

--Non recommended
UPDATE user SET first_name =’John’ WHERE id = 11;
UPDATE user SET last_name =’Doe’ WHERE id = 11;
UPDATE user SET ssn=’9900-8987-009’ WHERE id = 11;

--Recommended
UPDATE user SET first_name=’John’, last_name=’Doe’, ssn=’ 9900-8987-009’ WHERE id = 11;

12. JOIN:


* Use equi JOIN.
* Use LEFT JOIN where you really need to use, otherwise it could be misleading.
* Join in the order that returns least number of rows to the parent step.
*  Perform filtering option early in the inner sql that makes a join. Now a days, Oracle optimizer tries to make the join efficiently, but may not do so efficiently all the time

13. CASE:

* Use CASE for multiple aggregates. For example -

--Non Recommended
SELECT a.name "Agency", PENDING.cnt "Pending Approval", APPROVED.cnt "Approved"
FROM agency a
LEFT JOIN
(SELECT p.id, p.name, g.status, COUNT(distinct g.form_id) cnt
FROM agency p
LEFT JOIN user_forms g ON (g.prov_id=p.id)
WHERE g.event_date >= :param1 and g.event_date -1 < :param2
GROUP BY p.id,p.name, g.status
ORDER BY p.name, g.status) PENDING ON (prov.id = PENDING.id)
LEFT JOIN
(SELECT p.id, p.name, g.status,
COUNT(distinct g.form_id) cnt
FROM agency p
LEFT JOIN user_forms g ON (g.prov_id=p.id)
WHERE g.event_date >= :param1 and g.event_date -1 < :param2
GROUP BY p.id,p.name, g.status
ORDER BY p.name, g.status) APPROVED ON (prov.id = APPROVED.id)
WHERE a.status=1
ORDER BY a.name;


--Recommended
SELECT p.name "Agency", B.PENDING "Pending Approval", B.APPROVED "Approved"
FROM agency A
LEFT JOIN
(SELECT a.id
, COUNT(CASE when  g.status=2 then g.form_id end) "PENDING"
, COUNT(CASE when  g.status=3 then g.form_id end) "APPROVED"
FROM user_forms g, agency a
WHERE g.prov_id=a.id
AND g.event_date >=   TO_DATE('01-may-2009', 'dd-mon-yyyy')
AND g.event_date -1 < TO_DATE('01-aug-2009', 'dd-mon-yyyy')
group by p.id) B ON A.id=B.id
ORDER BY A.name;


14. Use of Function:

*  Be careful of using functions (SUBSTR, TO_CHAR, UPPER etc ) in WHERE/ORDER BY clause –will fail to use an existing index on that column.
*  Create function based index for these cases if really needed.

For example -

WHERE UPPER(first_name) like 'A%'  - this will not use an index on column "first_name" unless it is function based.

15. Indexing: In general –

*  Index all predicates in WHERE clause
*  Join columns
*  Important FKs.
*  For large table (and no heavy updates), for column having low cardinality data, use bitmap index.

For example – “status” (having a range of values 1 to 5) on “form_summary” table is a good candidate for bitmap indexing.

There should have tradeoffs if there are too many insert/updates/deletes. A dba could help more on index tuning.

16. Integer representation for group values:

*  Use number instead of string for a small domain of values. This will be helpful for Bitmap indexing and filter operation.

For example - for different status, use corresponding number representation instead of string values.
Working =1, Submitted=2, Approved=3, Deleted=4

Now queries could be used like –
SELECT * FROM user_forms WHERE status > 1;

17. Keeping summary of child table: 

* In parent table, keeping information about detail table could be a good trick to improve performance.

Table: USER_FORMS (FORM_ID, CONTENT)
Table: ARCHIVE_DATA (FORM_ID, CONTENT)


--Non Recommended
     SELECT uf.form_id, uf.content,
     (SELECT COUNT(*)  FROM archive_data WHERE form_id = ‘AXB-IU78-Y6TRQ’) AS cnt
     FROM user_forms uf
    WHERE uf.form_id = ‘AXB-IU78-Y6TRQ’;
 

--Recommended
Table: USER_FORMS (FORM_ID, CONTENT,  ARCHIVE_COUNT) // add an extra column which will keep archive count.

      SELECT uf.form_id, uf.content, uf.archive_count
      FROM user_forms uf
     WHERE uf.form_id = ‘AXB-IU78-Y6TRQ’;

18. Avoid OR and IN clauses:

*  Too many OR clauses in queries could be an indication of bad data model! Try to find out alternatives – now!

These are what I remember at the moment.

***

Jan 1, 2010

Flashback Query - the life saver!

Flashback features are really a very useful thing if we have such a situation. Depending on the undo retention settings, we could travel past in time!

One manager was looking for an old snapshot of records before it was being modified by the user. This is how I was able to retrieve the thing -

* Created a temp table with the data just before the unwanted changes -

CREATE TABLE tmp_users AS
(SELECT * FROM users
AS OF TIMESTAMP TO_TIMESTAMP('27-DEC-09 12:18:00','DD-MON-YY HH24: MI: SS')
WHERE user_id = '7EF3D2SP');

* There is another cool thing is 10g and onwards - one could flashback a whole table to some point in time.

FLASHBACK TABLE users TO TIMESTAMP TO_TIMESTAMP('27-DEC-09 12:18:00','DD-MON-YY HH24: MI: SS');

* Even, if a table is accidentally dropped, nothing to worry about :-)

FLASHBACK TABLE users TO BEFORE DROP;


I know, there is another cool thing called Flashback Database to rewind the whole database to some period in past! This is much simpler than the traditional point in time recovery with archived redo logs - not posting anything on that today.

***