Sep 19, 2009

Database performance problems - some common areas to look at!

Where there is a head, there is a headache - similarly, where there is a database, there are performance issues! I accepted that truth long ago :-)

I have experienced performance problems several times with our ever-growing systems and our busy database, which is still evolving at this stage.

Over the years, I have seen that there are a few areas that cause performance problems most of the time. I thought I would write a few things on that, especially for my own reference. Those common areas are probably -

1. Problem queries: Most of the time, I have seen a handful of queries eating up the system, taking a significant amount of database time and resources. Tuning those top 3-4 queries often solves the problem.
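
For example, a quick way to spot those queries is the standard v$sql view (a minimal sketch; the top-5 cutoff is arbitrary):

    -- Top 5 statements by total elapsed time since they were loaded
    -- (v$sql time columns are in microseconds)
    SELECT *
      FROM (SELECT sql_id,
                   executions,
                   ROUND(elapsed_time / 1e6, 1) AS elapsed_sec,
                   ROUND(cpu_time / 1e6, 1)     AS cpu_sec,
                   SUBSTR(sql_text, 1, 60)      AS sql_text
              FROM v$sql
             ORDER BY elapsed_time DESC)
     WHERE ROWNUM <= 5;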

2. Outdated/missing statistics: Up-to-date statistics are vital for the Oracle cost-based optimizer. Stale statistics might cause the database to crawl and lead to death!
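
A sketch of how to check for stale statistics and refresh them with the standard DBMS_STATS package (the schema and table names are placeholders):

    -- Find tables whose statistics Oracle has flagged as stale
    SELECT owner, table_name, last_analyzed, stale_stats
      FROM dba_tab_statistics
     WHERE stale_stats = 'YES';

    -- Re-gather statistics for one of them
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'APP_OWNER',   -- placeholder schema
        tabname          => 'ORDERS',      -- placeholder table
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
        cascade          => TRUE);         -- include the indexes too
    END;
    /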

3. Wait events: Waits are a big threat to database health. There could be row lock waits, network waits, I/O waits, concurrency waits and so on, which hold resources and can lead to a dead system in the end. We need to figure out how the database is spending most of its time.
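
To see where the instance is spending its wait time, something like this against v$system_event works (a sketch; the wait_class column used to filter idle events is available from 10g):

    -- Top non-idle wait events since instance startup
    SELECT *
      FROM (SELECT event,
                   wait_class,
                   total_waits,
                   ROUND(time_waited_micro / 1e6) AS seconds_waited
              FROM v$system_event
             WHERE wait_class <> 'Idle'
             ORDER BY time_waited_micro DESC)
     WHERE ROWNUM <= 10;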

4. Full scans on big tables/missing indexes: There could be cases where queries fetch too many rows via full table scans, producing huge "consistent gets" and burning extra CPU cycles. It might be that we missed an index or need to repartition tables.
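
EXPLAIN PLAN makes these full scans easy to spot (a sketch; the table and predicate are hypothetical):

    -- Check how Oracle plans to execute a suspect query
    EXPLAIN PLAN FOR
      SELECT * FROM orders WHERE customer_id = 42;

    -- Look for "TABLE ACCESS FULL" on big tables in the output
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);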

5. Hard parsing: Developers (especially freshers) sometimes code as if there were no database behind the application. They can easily forget to use bind variables, and that's why the database experiences hard parses (look at the hard parse %). So this area should not be overlooked.
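
The difference is easy to see (a sketch; the customers table is hypothetical). Each new literal forces a hard parse, while the bind version is parsed once and reused; the parse counters come from the standard v$sysstat view:

    -- Bad: a new literal means a new hard parse each time
    SELECT name FROM customers WHERE id = 1001;
    SELECT name FROM customers WHERE id = 1002;

    -- Good: one shared cursor, reused for every value
    SELECT name FROM customers WHERE id = :cust_id;

    -- Check the hard parse ratio
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('parse count (total)', 'parse count (hard)');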

6. Bad data model: An overly rigid normalized model and too many 'OR' clauses in the logic might also slow systems down. We should be careful about what we are doing and why.
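
One common fix for the 'OR' problem is rewriting it as a UNION ALL so each branch can use its own index (a sketch with hypothetical table/columns; only valid when the branches are kept disjoint):

    -- OR across different columns often defeats index access
    SELECT * FROM orders WHERE customer_id = :c OR salesman_id = :s;

    -- Rewritten so each branch can use its own index
    -- (second branch excludes the first branch's rows to avoid
    -- duplicates; assumes customer_id is NOT NULL)
    SELECT * FROM orders WHERE customer_id = :c
    UNION ALL
    SELECT * FROM orders WHERE salesman_id = :s AND customer_id <> :c;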

7. Poor application/process design: The application design and flows could be bad, causing unnecessary query executions and database access. Reviewing those areas and correcting things helps a lot.

For example, due to poor process flows, users may be forced to run extra queries - say, landing on a dashboard (which fires a lot of queries) after each save operation - which could be eliminated. Also, we may execute fairly static queries (queries that return the same result most of the time) quite frequently; those are good candidates for caching, and things like that.
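
One low-effort way to cache such static queries inside Oracle 11g is the result cache hint (a sketch; the lookup table is hypothetical):

    -- The result set is cached server-side and reused until
    -- the underlying data changes
    SELECT /*+ RESULT_CACHE */ code, description
      FROM country_codes;   -- hypothetical, rarely-changing lookup table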

8. Poor connection management: Frequent connection establishment has a negative impact on the database, as it is a resource-intensive process. Efficient and correct use of connection pooling is very important for a stateless application system.
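
A quick check for connection churn using the standard logon counters in v$sysstat:

    -- 'logons cumulative' rising fast against a steady 'logons current'
    -- usually means the application is opening and closing connections
    -- instead of pooling them
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('logons cumulative', 'logons current');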

9. No data pruning: If we don't have partitioning and/or a data pruning mechanism in the system, and we deal with a huge number of records that has grown over the years, that is not a good practice. Dealing with big unpartitioned tables has always been a bad technique and causes performance problems for select queries, index updates, unique constraints and maintenance.
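
A minimal sketch of what that could look like - range partitioning by date, which also makes pruning old data a cheap metadata operation (the table name, columns and ranges are all illustrative):

    -- Queries filtering on order_date only touch the relevant
    -- partitions, and old data can be dropped partition by partition
    CREATE TABLE orders_hist (
      order_id    NUMBER,
      order_date  DATE,
      amount      NUMBER
    )
    PARTITION BY RANGE (order_date) (
      PARTITION p2008 VALUES LESS THAN (DATE '2009-01-01'),
      PARTITION p2009 VALUES LESS THAN (DATE '2010-01-01'),
      PARTITION pmax  VALUES LESS THAN (MAXVALUE)
    );

    -- Pruning a year of history without a huge DELETE
    ALTER TABLE orders_hist DROP PARTITION p2008;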

Anything else? I don't remember right at the moment!

5 comments:

Fahd Mirza said...

Thanks for the tip, dear.

So how big does a table have to be before it becomes a candidate for partitioning?

Another thing I would add: the first step in a performance tuning exercise should be to think about where the database is spending its time.

Omar Faruq said...

Yeah, the time spent in the database is very important - that's what I tried to point out under 'Wait Events' :-)


About the partitioning rule - good question. In my opinion, it primarily depends on the system and the business model -

* Response time requirements - how fast the queries on those big tables need to be

* Disk management policies - how much disk is being used by the table's data

* The business model - what the definition of current data is (3 months/6 months/1 year, etc.)

You know, there could be tables holding 10 million rows that still take little disk space (2-3 columns of integer data types). It might also be the case that the queries are very straightforward and index access is fast enough - in that case, I may not be interested in partitioning.

What I usually do: if a table grows past 4-5 million rows, and/or queries on it take more than 200 ms response time (we have set this benchmark), and/or its disk size is more than 4-5 GB, I consider it for further analysis and testing with partitioning - after understanding the business model first, of course.

You may have other, better experiences with these things.

Thanks.

Fahd Mirza said...

Well, I agree with you that decision-making regarding partitioning varies from shop to shop (as is the case with many other database options).

At my site, I invariably use partitioning for the data warehouse, and only look for partitioning candidates in OLTP if the queries take more than 500 ms and the table holds more than 10 million rows.

Pardon my asking questions upon questions, but I have a couple more:

1- What is your performance tuning strategy?
For example, when you deploy a new database, what proactive measures do you take at the database level w.r.t. performance tuning?

And once the database is in operation and a user comes running to you saying, "The database is responding slowly," how do you go about that complaint?

Thanks and best regards

Omar Faruq said...

Okay, let me try to address your questions -

1. Yes, most of the time, for my applications and databases, the tuning is reactive. I am a big fan of the first rule of optimization, which is - "Do not optimize!" :-)

Our user base is growing every day. It's hard to predict exact system behavior for the web applications we develop and the agile/sprint model we follow.

What I do for a new copy of a database (I mean, a replica of an existing system) is try to configure it with the same settings, keeping the system parameters (Oracle & OS) identical, and also ensure the same kind of statistics.

2. Basically, I try to find the answers to the points I mentioned in my post. For that, I use a couple of things at the initial stage of an investigation -

* On Unix, find out which processes take the most memory and CPU.

* Pull AWR and ASH reports for the problem hours and look at a few things - load profile, instance efficiency, top timed events, top wait classes, queries that took the most CPU and elapsed time, most executed queries, most parsed queries, top database objects and so on. Sometimes I run my own queries as needed (see the sketch after this list).

* I may look at various Oracle parameters based on the outcome of the above.

* If needed, I also look at application logs to analyze response times by URL/module and to find other important clues. I was a developer and am still involved in system architecture work, and I know the domains/requirements, so this makes sense for me. It could be a nightmare for another DBA, I would say.
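
For example, here is a sketch of the kind of ad-hoc AWR query mentioned above - top SQL by elapsed time between two snapshots (the snapshot ids are placeholders; pick real ones from dba_hist_snapshot first):

    -- Top 10 statements by elapsed time across a problem window
    SELECT *
      FROM (SELECT sql_id,
                   SUM(executions_delta)                AS execs,
                   ROUND(SUM(elapsed_time_delta) / 1e6) AS elapsed_sec
              FROM dba_hist_sqlstat
             WHERE snap_id BETWEEN 1234 AND 1240   -- placeholder range
             GROUP BY sql_id
             ORDER BY SUM(elapsed_time_delta) DESC)
     WHERE ROWNUM <= 10;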

I believe tuning is an art - sometimes it's really hard to tell in advance what I will want to do :-)

You may have other views on that.

Fahd Mirza said...

Thanks for the valuable feedback. I have written up an outline of the approach I follow w.r.t. Oracle performance tuning here:

http://fahdmirza.blogspot.com/2009/09/oracle-performance-tuning-approach.html

I would appreciate your feedback.

regards