Nov 29, 2010

Oracle Text Search - Index tuning for handling HTML content

We used Oracle Text Search in our web application's secure messaging module years ago. Recently, when we added rich text formatting to that module, we discovered that the Text Search feature was no longer working correctly because of the HTML tags inside the CLOB content.

So we had to fix it to make search workable again. Basically, we added a few parameters while creating the index to handle HTML properly.


--old
CREATE INDEX "MSG_CONTENT_TEXT_I" ON "MESSAGE_CONTENT" ("CONTENT") INDEXTYPE IS "CTXSYS"."CONTEXT";

--corrected one
CREATE INDEX "MSG_CONTENT_TEXT_I" ON "MESSAGE_CONTENT" ("CONTENT") INDEXTYPE IS "CTXSYS"."CONTEXT" PARAMETERS ('FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.HTML_SECTION_GROUP');


Where,
* NULL_FILTER: No filtering required. Use for indexing plain text, HTML, or XML documents.
* HTML_SECTION_GROUP: Use this group type for indexing HTML documents and for defining sections in HTML documents.
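
A quick way to verify the fix (a hedged sketch - the search term is just an example): with HTML_SECTION_GROUP in place, markup is no longer indexed, so a CONTAINS search for a word that appears in the data only as a tag name should return no rows.

SELECT COUNT(*)
FROM message_content
WHERE CONTAINS(content, 'strong') > 0;  -- 'strong' appears only as an HTML tag here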

Life is good again!

Here is a good document for reference-
http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/cdatadic.htm

Oct 30, 2010

Investigation of Oracle datafile corruption

We were suspecting some corruption in a data file. Here are a few quick ways to check for corruption and make sure everything is alright -


1. Using Oracle DBVERIFY

# dbv file=/oradata2/DATAFILES_INDX_20100318/indx01_01.dbf

The output looks like this -

DBVERIFY - Verification complete

Total Pages Examined         : 281344
Total Pages Processed (Data) : 217134
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 44048
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 11358
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 8804
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 204553469 (0.204553469)


2. Using the dictionary view -

SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;
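
Note that this view is populated by RMAN operations such as BACKUP VALIDATE. If it does report a corrupt block, a lookup like the following identifies the segment that owns it (a sketch - the file and block numbers are hypothetical):

SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = 5  -- hypothetical file#
AND 12345 BETWEEN block_id AND block_id + blocks - 1;  -- hypothetical block#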


3. Running queries against the data file (tablespace)


I analyzed the schema whose tables were all on the single tablespace (and data file) we suspected. If there is any corrupted block, Oracle should fail while reading it during the process -


execute dbms_stats.gather_schema_stats(ownname => 'ASSISTDB', cascade=>true);
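
Another option along the same lines (a sketch - the table name here is hypothetical) is to validate the physical structure of a suspect table and its indexes block by block:

ANALYZE TABLE assistdb.form_data VALIDATE STRUCTURE CASCADE;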


***

Oct 17, 2010

MySQL to Oracle data migration - resolving the mystery of 3 times more space usage in Oracle!

We have been migrating one big MySQL table (about 10 TB) into Oracle 11g R2. The table contains biometric data in 18 BLOB columns - it's a kind of storage table for all biometric information in the form of BLOBs. The average row length is about 140KB. We had a rough estimate - it might take about 14 TB of disk space after migrating into Oracle. We planned to put this table in a 32K block size tablespace. The table contains about 85 million records.

Partway through loading the data, we discovered that about 30 million records had already consumed more than 11 TB of disk space! At this rate, we might end up consuming 33 TB of disk space after loading all 85 million records!

When I came back from a nice vacation, I was assigned to solve the mystery. Ahh.. the situation was not good! I was kind of clueless at the beginning, looking for the right approach to debug the problem.


After playing with some real data, I found that, out of the 18 BLOB columns -

* Some are < 4K
* Some are > 4K but < 7K
* Some are > 8K but < 14K
* Some are > 20K
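
A query along these lines can produce that kind of breakdown (a sketch - wsq is one of the BLOB columns, and the bucket boundaries are illustrative):

SELECT size_bucket, COUNT(*) AS cnt
FROM (
  SELECT CASE
           WHEN DBMS_LOB.GETLENGTH(wsq) < 4096  THEN '< 4K'
           WHEN DBMS_LOB.GETLENGTH(wsq) < 8192  THEN '4K - 8K'
           WHEN DBMS_LOB.GETLENGTH(wsq) < 16384 THEN '8K - 16K'
           ELSE '>= 16K'
         END AS size_bucket
  FROM biometric
)
GROUP BY size_bucket
ORDER BY size_bucket;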

Then, I reorganized the table and moved the BLOB columns into different block size (8K, 16K, 32K) tablespaces based on their data size. This is where the mystery resolved: an out-of-line LOB occupies space in whole chunks (multiples of the block size), so a 5K BLOB stored in a 32K-chunk tablespace wastes most of its chunk. Earlier, the entire table was in a 32K block size tablespace. Before the reorganization, the size was over 11.5 TB; it is now 6.6 TB! It took about 7 days (nights as well :-) ) to move all the LOB columns to their proper tablespaces.

Now, the total size of the database should stay within 19 TB - not 33 TB!


Some important scripts I used -

SELECT l.column_name, s.bytes/1024 KB, s.blocks, s.extents
FROM user_segments s, user_lobs l
WHERE l.segment_name = s.segment_name
AND l.table_name = 'BIOMETRIC';


ALTER TABLE biometric MOVE LOB (wsq) STORE AS (tablespace BMDATA_16K CHUNK 16K pctversion 3);
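
One thing worth remembering with the MOVE above (a sketch - the cache size and datafile path are illustrative): a tablespace with a non-default block size requires the matching buffer cache to be configured first.

ALTER SYSTEM SET db_16k_cache_size = 256M;

CREATE TABLESPACE bmdata_16k
DATAFILE '/oradata2/bmdata_16k_01.dbf' SIZE 10G AUTOEXTEND ON
BLOCKSIZE 16K;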


Good lessons here -

* We should have tested with some real data before the actual data loading

* We should have observed disk usage a day after initiating the data load



---

Jul 16, 2010

20 Ways to Screw Up Your Management Career

I had been reading this old one - which most of us realize (more or less) from a management point of view. Still good to revise :-)


Apr 4, 2010

Sun Oracle Exadata Version 2

Last week I got an opportunity to attend a workshop on Oracle Exadata. About two years ago, when Exadata was first released, I attended an introductory session presented by Oracle Asia experts. At that time I was not very hopeful about the product, as Exadata was focused specifically on very big Data Warehouse solutions. I was expecting something that would fit our OLTP + mixed kind of systems and might benefit us.

Now, with Exadata version 2, it seems they are probably going to impress everyone in terms of performance and scalability. I found it very exciting this time!

Exadata version 1 was marketed for Data Warehouse type solutions as a very big system - an 8-node Database Server Grid and a 14-node Storage Grid (which they now call a Full Rack). For a small to medium system, this was neither a perfect fit nor cost-efficient, I believe. After acquiring Sun, Oracle came up with Exadata version 2 on Sun hardware (version 1 was on HP), with emphasis given to OLTP systems as well. More importantly, they came up with different packaging options like the Half Rack and Quarter Rack, which are the kind of thing I was interested in.

The cool features of Exadata are -

* Exadata Query Offloading: The Exadata solution offloads data-intensive query processing from the Oracle Database servers and does the query processing closer to the data storage. The result is faster parallel data processing and less data movement over higher-bandwidth connections. This massively parallel architecture also offers linear scalability.

* Smart Flash Cache: A transparent extension of the database buffer cache. A lot more I/O throughput and a huge performance improvement. 1 GB costs only $50, so we would be able to cache almost the whole database. They say, "This will break the random I/O bottleneck by increasing I/Os per second by 20X and doubling user data scan bandwidth".

* Hybrid Columnar Compression (HCC): They say, "Efficient compression increases effective storage capacity and increases user data scan bandwidths by a factor of 10X". The online archiving compression feature will reduce our historical data size by about 90%. This will help us manage old data in one place - no need for separate historical data servers on cheaper disks. Old data partitions will be compressed in online archive mode (see the sketch after this list).

* Exadata Storage Indexes: Eliminate unnecessary I/Os to disk and contribute to the overall performance improvement. These are not actually indexes; they are a kind of filter that keeps summaries (such as min/max values) of the data in each storage region, so irrelevant regions can be skipped. The result is a big performance improvement - even with full table scans!
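
For the HCC point above, archive compression of an old partition looks roughly like this in 11g R2 (a hedged sketch - the table and partition names are invented):

ALTER TABLE form_summary MOVE PARTITION p_2008 COMPRESS FOR ARCHIVE HIGH;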


I made a presentation for my other team members to give an overview on Exadata. Here is the summary -

Oracle defines Exadata like this -

* Massively Parallel Grid, the Architecture of the Future

* Extreme Performance: 20X for OLTP and 10X - 100X for data warehouse

* Linear Data Scalability: Same performance for 1 TB and 5 TB of data

* Enterprise Ready: Packaged, readily available. Single support from Oracle for hardware and software

* All Standard: Works transparently with all existing applications

Improvements:

* Millions of transactions per minute
* 10 million queries per minute
* Billions of rows per minute
* 1 million I/Os per second
* 50 GB/sec flash bandwidth
* 21 GB/sec disk bandwidth
* InfiniBand network: 40 Gb per second throughput

Different Exadata Appliances

* Full Rack: 8 Database servers and 14 Storage cell servers
* Half Rack: 4 Database servers and 7 Storage servers
* Quarter Rack: 2 Database servers and 3 Storage servers
* Single Server: 1 Database server and 1 Storage server

Exadata Quarter Rack

* 2 Sun Fire™ X4170 Oracle Database servers
* 3 Exadata Storage Servers (all SAS or all SATA)
* User Data: 6 TB SAS or 21 TB SATA
* 1.1 TB Flash Cache (4 x 96GB PCI Express Flash Cards on each)
* 2 Sun Datacenter InfiniBand Switches (36-port Managed QDR, 40Gb/s)
* 1 “Admin” Cisco Ethernet switch
* Keyboard, Video, Mouse (KVM) hardware
* Redundant PDUs
* Single Point of Support from Oracle
* 3 year, 24 x 7, 4 Hr On-site response


Performance Benchmarks
Some of the performance benchmarks shown -

* 1 TB tablespace creation time was about 9 minutes! On other hardware platforms it would take about 2 hours!
* On a full rack, a test query with 'query offloading' turned off (meaning plain RAC) took more than 4 minutes to execute, whereas with query offloading turned on, the same query took about 5 seconds!
* Another data loading example was shown - 49 GB of data loaded from an external file in just 1.5 minutes!


***

Mar 16, 2010

Developer's preliminary checklist before code submission for testing

To give our developers more responsibility, I made a checklist. This is basically a quick checklist we suggest they follow before pushing their code for testing. Undoubtedly, this has improved application quality and reduced some load on the testing guys!

Some of these are very specific to our web application and may not be applicable to others -


1. Working features: Make sure all the features you have just developed work correctly - and that one feature did not break another.

2. Field length: Be aware of the database field length for corresponding UI fields. For text fields and text boxes, the allowed input length must be smaller than the database field length (about 10 characters smaller for every 100 characters in the DB). This is needed to deal with multi-byte character sets.

3. Field validation: Required field and other business validation

4. Required field indication: Must have red asterisk

5. Labels: Labels and messages come from the Application Resource properties file

6. Spelling: Use correct spelling. No typos.

7. Alignment: Fields and labels should be left aligned.

8. Blank space: No space before a colon or comma - use a space after a comma.

9. Default sort order: For any list, there should be a default sort order

10. URL tampering: Access checking on each page, or something similar, to make sure the user is allowed to see the page he is requesting.

11. HTML and JavaScript escaping

12. JavaScript error: Check for any JavaScript error in the page.

13. Buttons: Button placement. Proper use of 'Back' and 'Cancel'.

14. Date: Proper formatting of date

15. Time Zone: Apply the time zone wherever a date is displayed

16. Activity Tracking: For important actions, ensure 'Activity Tracking' to log user activities

17. Placement of Access Link: Place access links in proper places for users

18. Clear user friendly messages: All confirmation, alert and warning messages

19. Access Control/Privilege Checking: For important business information that depends on user privileges, make sure the lists a user sees are based on their access privileges

20. SQL injection: Use bind variables to avoid SQL injection

21. Double submission problems: Handle properly what happens when a user re-submits the request.

22. Invalid Request: Handle bookmarked or otherwise invalid requests

23. Cross browser compatibility: Check for at least two browsers - Firefox 3.5 and IE7


***

Jan 3, 2010

Oracle best practices for programmers

These are the things I suggest developers follow when developing applications that use an Oracle database. Some of these I learned from my experience as a developer - from my own and others' mistakes - and the rest I learned as a DBA!


1. Proper and Consistent Naming:

*  Keywords in UPPER case
*  Provide prefix for table name to group tables (for example according to functionality/module)
*  Provide constraint names, index names
*  Honor the company specific guidelines and best practices
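
A small DDL sketch of these naming points (the names are invented, and the prefix convention is just an example):

-- 'msg_' prefix groups the messaging module's tables;
-- constraints and indexes get explicit names
CREATE TABLE msg_thread (
  id      NUMBER              NOT NULL,
  subject VARCHAR2(200),
  status  NUMBER(1) DEFAULT 1 NOT NULL,
  CONSTRAINT msg_thread_pk PRIMARY KEY (id),
  CONSTRAINT msg_thread_status_ck CHECK (status BETWEEN 1 AND 5)
);

CREATE INDEX msg_thread_status_i ON msg_thread (status);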

2. Normalization of Database:

*  Normalize up to 4th Normal Form
*  I would say, first design ideally; then you may denormalize according to your needs

If you don't remember what you learned in university, here is a good link to remember by example -
http://support.microsoft.com/kb/100139

3. Primary/Unique key:

*  Every table should have a primary and/or unique key
*  Helpful for replication products
*  Key data should not be visible
*  Do not use a string field as the primary key

4. Bind Variable:

*  Use bind variables instead of literals. This is especially true for an OLTP system or a mixed system (OLTP + heavy reporting) like ours.

For example – 

SELECT * FROM users WHERE id = 1009;

Instead, use this in your JDBC template -

SELECT * FROM users WHERE id = ?;

And then pass your bind parameters.
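
In SQL*Plus, the same idea looks like this (a sketch - the id value is just an example):

VARIABLE user_id NUMBER
EXECUTE :user_id := 1009

SELECT * FROM users WHERE id = :user_id;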

5. Nested Query: Avoid nesting queries.

For example, I would avoid the following types of queries -

SELECT login_name FROM users WHERE prov_id IN
            (SELECT id FROM agency WHERE id IN
                (SELECT prov_id FROM message WHERE ....));
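
The same result can usually be written as a flat join (a sketch - the elided predicate stays elided; depending on the data you may need DISTINCT, since joins can return duplicates where IN did not):

SELECT u.login_name
FROM users u
JOIN agency a ON a.id = u.prov_id
JOIN message m ON m.prov_id = a.id
WHERE ....;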


6. Data Segmentation:

*  Add column by which you can group/segment data for big tables

For example - add an agency id column to user forms even though the application itself may not need it.

7. Bulk Operation in Batch:

* Perform frequent inserts/updates/deletes in batches.
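
In PL/SQL, FORALL is one way to batch DML (a hedged sketch - the column and id values are invented):

DECLARE
  TYPE t_id_tab IS TABLE OF NUMBER;
  v_ids t_id_tab := t_id_tab(101, 102, 103);
BEGIN
  -- one round trip to the SQL engine for all rows,
  -- instead of a loop of single-row updates
  FORALL i IN 1 .. v_ids.COUNT
    UPDATE users SET status = 2 WHERE id = v_ids(i);
  COMMIT;
END;
/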

8. Data Sorting:

*  Avoid unnecessary ORDER BY
*  Avoid unnecessary DISTINCT

9. Database Call:

*  Do not execute queries inside a LOOP
*  Do not open connections inside a LOOP

10. IN and EXISTS:

*  Use IN when the sub-query has the selective WHERE clause. For example -

SELECT * FROM employee_details ed
WHERE ed.emp_id IN (SELECT id FROM employee WHERE ssn = '2990-030-998');

*  If it is the opposite - the parent query contains the selective WHERE - use EXISTS. For example -

SELECT * FROM employee_details ed
WHERE EXISTS (SELECT e.id FROM employee e WHERE ed.emp_id = e.id
AND e.city = 'Dallas');



11. Data Access:

*  Avoid frequent visits to table data - caching could be a good solution
*  Select/update multiple columns in a single statement. For example -

--Non recommended
UPDATE users SET first_name = 'John' WHERE id = 11;
UPDATE users SET last_name = 'Doe' WHERE id = 11;
UPDATE users SET ssn = '9900-8987-009' WHERE id = 11;

--Recommended
UPDATE users SET first_name = 'John', last_name = 'Doe', ssn = '9900-8987-009' WHERE id = 11;

12. JOIN:


* Use equi-joins.
* Use LEFT JOIN only where you really need it; otherwise it can be misleading.
* Join in the order that returns the fewest rows to the parent step.
* Perform filtering early, in the inner SQL that makes up a join. These days the Oracle optimizer tries to arrange joins efficiently, but it may not manage to do so every time.

13. CASE:

* Use CASE for multiple aggregates. For example -

--Non Recommended
SELECT a.name "Agency", PENDING.cnt "Pending Approval", APPROVED.cnt "Approved"
FROM agency a
LEFT JOIN
(SELECT p.id, p.name, COUNT(DISTINCT g.form_id) cnt
FROM agency p
LEFT JOIN user_forms g ON (g.prov_id = p.id)
WHERE g.status = 2
AND g.event_date >= :param1 AND g.event_date - 1 < :param2
GROUP BY p.id, p.name) PENDING ON (a.id = PENDING.id)
LEFT JOIN
(SELECT p.id, p.name, COUNT(DISTINCT g.form_id) cnt
FROM agency p
LEFT JOIN user_forms g ON (g.prov_id = p.id)
WHERE g.status = 3
AND g.event_date >= :param1 AND g.event_date - 1 < :param2
GROUP BY p.id, p.name) APPROVED ON (a.id = APPROVED.id)
WHERE a.status = 1
ORDER BY a.name;


--Recommended
SELECT A.name "Agency", B.PENDING "Pending Approval", B.APPROVED "Approved"
FROM agency A
LEFT JOIN
(SELECT a.id
, COUNT(CASE WHEN g.status = 2 THEN g.form_id END) "PENDING"
, COUNT(CASE WHEN g.status = 3 THEN g.form_id END) "APPROVED"
FROM user_forms g, agency a
WHERE g.prov_id = a.id
AND g.event_date >= :param1
AND g.event_date - 1 < :param2
GROUP BY a.id) B ON A.id = B.id
WHERE A.status = 1
ORDER BY A.name;


14. Use of Function:

*  Be careful about using functions (SUBSTR, TO_CHAR, UPPER, etc.) in the WHERE/ORDER BY clause - the query will fail to use an existing index on that column.
*  Create a function-based index for these cases if really needed.

For example -

WHERE UPPER(first_name) LIKE 'A%' - this will not use an index on the column "first_name" unless the index is function-based.
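
The function-based index for that case would look something like this (a sketch - the index name is invented):

CREATE INDEX users_first_name_upper_i ON users (UPPER(first_name));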

15. Indexing: In general –

*  Index the columns used as predicates in the WHERE clause
*  Index join columns
*  Index important FKs
*  For a large table (with no heavy updates), use a bitmap index on columns holding low-cardinality data

For example - "status" (having a range of values 1 to 5) on the "form_summary" table is a good candidate for bitmap indexing.
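
A minimal sketch of that bitmap index (the index name is invented):

CREATE BITMAP INDEX form_summary_status_bi ON form_summary (status);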

There are tradeoffs if there are too many inserts/updates/deletes. A DBA could help more with index tuning.

16. Integer representation for group values:

*  Use numbers instead of strings for a small domain of values. This helps bitmap indexing and filter operations.

For example - for the different statuses, use the corresponding number representation instead of string values:
Working = 1, Submitted = 2, Approved = 3, Deleted = 4

Now queries can be written like -
SELECT * FROM user_forms WHERE status > 1;

17. Keeping a summary of the child table:

* In the parent table, keeping summary information about the detail table can be a good trick to improve performance.

Table: USER_FORMS (FORM_ID, CONTENT)
Table: ARCHIVE_DATA (FORM_ID, CONTENT)


--Non Recommended
SELECT uf.form_id, uf.content,
     (SELECT COUNT(*) FROM archive_data ad WHERE ad.form_id = uf.form_id) AS cnt
FROM user_forms uf
WHERE uf.form_id = 'AXB-IU78-Y6TRQ';
 

--Recommended
Table: USER_FORMS (FORM_ID, CONTENT, ARCHIVE_COUNT) -- add an extra column which keeps the archive count

SELECT uf.form_id, uf.content, uf.archive_count
FROM user_forms uf
WHERE uf.form_id = 'AXB-IU78-Y6TRQ';
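
The extra column has to be kept in sync, of course. One hedged sketch of doing that with a trigger (the trigger name is invented; concurrency and error handling are not considered here):

CREATE OR REPLACE TRIGGER archive_data_cnt_trg
AFTER INSERT OR DELETE ON archive_data
FOR EACH ROW
BEGIN
  IF INSERTING THEN
    UPDATE user_forms SET archive_count = archive_count + 1
    WHERE form_id = :NEW.form_id;
  ELSIF DELETING THEN
    UPDATE user_forms SET archive_count = archive_count - 1
    WHERE form_id = :OLD.form_id;
  END IF;
END;
/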

18. Avoid OR and IN clauses:

*  Too many OR clauses in queries can be an indication of a bad data model! Try to find alternatives - now!

These are what I remember at the moment.

***

Jan 1, 2010

Flashback Query - the life saver!

Flashback features are really useful when you land in a situation like the one below. Depending on the undo retention settings, we can travel back in time!

A manager was looking for an old snapshot of some records from before they were modified by a user. This is how I was able to retrieve it -

* Created a temp table with the data just before the unwanted changes -

CREATE TABLE tmp_users AS
(SELECT * FROM users
AS OF TIMESTAMP TO_TIMESTAMP('27-DEC-09 12:18:00','DD-MON-YY HH24:MI:SS')
WHERE user_id = '7EF3D2SP');

* There is another cool thing in 10g and onwards - one can flash back a whole table to some point in time.

FLASHBACK TABLE users TO TIMESTAMP TO_TIMESTAMP('27-DEC-09 12:18:00','DD-MON-YY HH24:MI:SS');
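
One prerequisite worth remembering here: FLASHBACK TABLE ... TO TIMESTAMP needs row movement enabled on the table first.

ALTER TABLE users ENABLE ROW MOVEMENT;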

* Even if a table is accidentally dropped, there's nothing to worry about :-)

FLASHBACK TABLE users TO BEFORE DROP;


I know there is another cool thing called Flashback Database, which rewinds the whole database to some point in the past! This is much simpler than traditional point-in-time recovery with archived redo logs - but I'm not posting anything on that today.

***