Jan 3, 2010

Oracle best practices for programmers

These are the things I suggest developers to follow in developing applications where Oracle database is being used. Some of these I learned from my experience as a developer, my own and others mistakes, and the rest, I learned as a dba!


1. Proper and Consistent Naming:

*  Keywords in UPPER case
*  Provide prefix for table name to group tables (for example according to functionality/module)
*  Provide constraint names, index names
*  Honor the company specific guidelines and best practices

2. Normalization of Database:

*  Normalize up to 4th Normal Form
*  I would say, first design ideally then you may denormalize according to needs

If you don't remember what you learned in university, here is a good link to remember by example -
http://support.microsoft.com/kb/100139

3. Primary/Unique key:

*  Every table should have a primary and/or unique key
*  Helpful for replication product
*  Key data should not be visible
*  Do not make a string field as primary key

4. Bind Variable:

*  Use bind variables instead of literals. This is specially true for all OLTP system or a mixed system (OLTP+heavy reporting) like ours.

For example – 

SELECT * FROM users WHERE id = 1009;

Instead, use this in jdbc template –

SELECT * FROM users WHERE id = ?;

And then pass your bind parameters.

5. Nested Query: Avoid nesting queries.

For example, I would avoid the following types of queries -

SELECT login_name FROM user WHERE prov_id IN
            (SELECT id FROM agency WHERE id IN
                (SELECT prov_id FROM message WHERE ....)


6. Data Segmentation:

*  Add column by which you can group/segment data for big tables

For example – add agency id in user forms although it may not be needed for the application.

7. Bulk Operation in Batch:

* Frequent insert/update/deletes in batch.

8. Data Sorting:

*  Avoid unnecessary ORDER BY
*  Avoid unnecessary DISTINCT

9. Database Call:

*  Do not execute queries inside a LOOP
*  Do not open connections inside a LOOP

10. IN and EXISTS:

*  Use IN if the sub-query has the selective WHERE clause. For example -

SELECT * FROM employee_details ed
WHERE ed.emp_id IN (SELECT id FROM employee WHERE ssn = ‘2990-030-998’) ;

*  If opposite or the parent query contains selective WHERE, use the EXISTS. For example -

SELECT * FROM employee_details  ed
WHERE EXISTS (SELECT e.id FROM employee e WHERE ed.emp_id = e.id
AND e.city=’Dallas’) ; 



 11. Data Access:

*  Avoid frequent visit to table data – Caching could be a good solution
*  select/update values at a single step for multiple columns. For example -

--Non recommended
UPDATE user SET first_name =’John’ WHERE id = 11;
UPDATE user SET last_name =’Doe’ WHERE id = 11;
UPDATE user SET ssn=’9900-8987-009’ WHERE id = 11;

--Recommended
UPDATE user SET first_name=’John’, last_name=’Doe’, ssn=’ 9900-8987-009’ WHERE id = 11;

12. JOIN:


* Use equi JOIN.
* Use LEFT JOIN where you really need to use, otherwise it could be misleading.
* Join in the order that returns least number of rows to the parent step.
*  Perform filtering option early in the inner sql that makes a join. Now a days, Oracle optimizer tries to make the join efficiently, but may not do so efficiently all the time

13. CASE:

* Use CASE for multiple aggregates. For example -

--Non Recommended
SELECT a.name "Agency", PENDING.cnt "Pending Approval", APPROVED.cnt "Approved"
FROM agency a
LEFT JOIN
(SELECT p.id, p.name, g.status, COUNT(distinct g.form_id) cnt
FROM agency p
LEFT JOIN user_forms g ON (g.prov_id=p.id)
WHERE g.event_date >= :param1 and g.event_date -1 < :param2
GROUP BY p.id,p.name, g.status
ORDER BY p.name, g.status) PENDING ON (prov.id = PENDING.id)
LEFT JOIN
(SELECT p.id, p.name, g.status,
COUNT(distinct g.form_id) cnt
FROM agency p
LEFT JOIN user_forms g ON (g.prov_id=p.id)
WHERE g.event_date >= :param1 and g.event_date -1 < :param2
GROUP BY p.id,p.name, g.status
ORDER BY p.name, g.status) APPROVED ON (prov.id = APPROVED.id)
WHERE a.status=1
ORDER BY a.name;


--Recommended
SELECT p.name "Agency", B.PENDING "Pending Approval", B.APPROVED "Approved"
FROM agency A
LEFT JOIN
(SELECT a.id
, COUNT(CASE when  g.status=2 then g.form_id end) "PENDING"
, COUNT(CASE when  g.status=3 then g.form_id end) "APPROVED"
FROM user_forms g, agency a
WHERE g.prov_id=a.id
AND g.event_date >=   TO_DATE('01-may-2009', 'dd-mon-yyyy')
AND g.event_date -1 < TO_DATE('01-aug-2009', 'dd-mon-yyyy')
group by p.id) B ON A.id=B.id
ORDER BY A.name;


14. Use of Function:

*  Be careful of using functions (SUBSTR, TO_CHAR, UPPER etc ) in WHERE/ORDER BY clause –will fail to use an existing index on that column.
*  Create function based index for these cases if really needed.

For example -

WHERE UPPER(first_name) like 'A%'  - this will not use an index on column "first_name" unless it is function based.

15. Indexing: In general –

*  Index all predicates in WHERE clause
*  Join columns
*  Important FKs.
*  For large table (and no heavy updates), for column having low cardinality data, use bitmap index.

For example – “status” (having a range of values 1 to 5) on “form_summary” table is a good candidate for bitmap indexing.

There should have tradeoffs if there are too many insert/updates/deletes. A dba could help more on index tuning.

16. Integer representation for group values:

*  Use number instead of string for a small domain of values. This will be helpful for Bitmap indexing and filter operation.

For example - for different status, use corresponding number representation instead of string values.
Working =1, Submitted=2, Approved=3, Deleted=4

Now queries could be used like –
SELECT * FROM user_forms WHERE status > 1;

17. Keeping summary of child table: 

* In parent table, keeping information about detail table could be a good trick to improve performance.

Table: USER_FORMS (FORM_ID, CONTENT)
Table: ARCHIVE_DATA (FORM_ID, CONTENT)


--Non Recommended
     SELECT uf.form_id, uf.content,
     (SELECT COUNT(*)  FROM archive_data WHERE form_id = ‘AXB-IU78-Y6TRQ’) AS cnt
     FROM user_forms uf
    WHERE uf.form_id = ‘AXB-IU78-Y6TRQ’;
 

--Recommended
Table: USER_FORMS (FORM_ID, CONTENT,  ARCHIVE_COUNT) // add an extra column which will keep archive count.

      SELECT uf.form_id, uf.content, uf.archive_count
      FROM user_forms uf
     WHERE uf.form_id = ‘AXB-IU78-Y6TRQ’;

18. Avoid OR and IN clauses:

*  Too many OR clauses in queries could be an indication of bad data model! Try to find out alternatives – now!

These are what I remember at the moment.

***

1 comment:

Anonymous said...

Your blog keeps getting better and better! Your older articles are not as good as newer ones you have a lot more creativity and originality now keep it up!