Apr 13, 2008

Handling BIG Database Table "UserActivity"

Let's start with the stats -

We now have over 70 million records in the user_activity table. As the name implies, user_activity keeps a record of users' actions in our applications.

The growth during 2007 looked like this (in millions) -

1st quarter: 6
2nd quarter: 8.5
3rd quarter: 10.5
4th quarter: 12

This huge number of records has been causing a lot more pain than expected. The SharePlex tool cannot sync such a big table between databases. This one table occupies a third of the whole database, and handling dumps requires extra time and space. We could go for Oracle partitioning, but it would not serve our purpose here.

I thought we should handle it ourselves. I decided to split the user_activity table into a number of tables according to date range. For example, all 2004 data would reside in one table, 2005 data would go to another table, and the same for 2006. I am keeping each of those years' data together because we had less data back then. From 2007 onwards, I split the data quarterly. Again, this is not fixed; in 2009, when we might have 100K system users, we may need to split monthly :)

Now, the user_activity table is divided into a number of tables like these -

UA_200401_200412
UA_200501_200512
UA_200601_200612
UA_200701_200703
UA_200704_200706
UA_200707_200709
UA_200710_200712
USER_ACTIVITY

At some point after each quarter of 2008, I will run the dynamic partitioning script, which will split user_activity again and create a new table. For example, the first-quarter table will be UA_200801_200803.
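In spirit, the script is just a create-table-as-select plus a cleanup. A rough sketch, assuming an activity_date column (a placeholder name; the real script also has to take care of indexes and grants) -

-- Hypothetical sketch of one quarterly split. activity_date is a
-- placeholder column name; indexes and grants are handled separately.
CREATE TABLE ua_200801_200803 AS
  SELECT * FROM user_activity
   WHERE activity_date < TO_DATE('04-01-2008','mm-dd-yyyy');

DELETE FROM user_activity
 WHERE activity_date < TO_DATE('04-01-2008','mm-dd-yyyy');

COMMIT;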

This gives us a programming challenge: what if the user selects a date range that requires fetching data from multiple tables? How do we formulate clean logic for that?

Okay, we need some generic mechanism so that the code remains clean and scalable. I created a table, USER_ACTIVITY_SPLIT_INFO, which tells us which table(s) user activity data should be looked up from now that the original table is broken up. It contains information like the following -

TABLE_NAME         BEGIN_DATE                    END_DATE
-----------------  ----------------------------  ----------------------------
UA_200401_200412   25-OCT-03 02.23.22.000000 PM  31-DEC-04 03.50.56.000000 PM
UA_200501_200512   01-JAN-05 05.09.38.000000 AM  31-DEC-05 11.58.52.000000 PM
UA_200601_200612   01-JAN-06 12.01.23.000000 AM  31-DEC-06 11.59.55.956000 PM
UA_200701_200703   01-JAN-07 12.00.27.530000 AM  31-MAR-07 11.59.59.344000 PM
UA_200704_200706   01-APR-07 12.00.00.786000 AM  30-JUN-07 11.59.58.917000 PM
UA_200707_200709   01-JUL-07 12.00.00.055000 AM  30-SEP-07 11.59.57.929000 PM
UA_200710_200712   01-OCT-07 12.00.00.811000 AM  31-DEC-07 11.59.57.893000 PM
USER_ACTIVITY      01-JAN-08 12.00.02.140000 AM
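For reference, a minimal definition of this lookup table could be something like the following (the actual datatypes and constraints may differ) -

-- Minimal sketch; the real table may carry extra columns or constraints.
CREATE TABLE user_activity_split_info (
  table_name VARCHAR2(30) NOT NULL,  -- table holding this date range
  begin_date TIMESTAMP    NOT NULL,  -- earliest activity timestamp in it
  end_date   TIMESTAMP               -- NULL for the live USER_ACTIVITY table
);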

Now suppose a user is seeking data for the date range 11-01-2007 to 04-01-2008. The following query returns the name(s) of the table(s) from which the data needs to be fetched. (A simple BETWEEN check on BEGIN_DATE or END_DATE alone would miss a requested range that falls entirely inside one table's span, so the query uses the standard interval-overlap test, which also handles the open-ended USER_ACTIVITY row.)

-- A table is relevant if its range overlaps the requested range; the live
-- USER_ACTIVITY row has a NULL end_date and is treated as open-ended.
SELECT x.table_name
FROM user_activity_split_info x
WHERE x.begin_date <= TO_DATE('04-01-2008','mm-dd-yyyy')
AND (x.end_date >= TO_DATE('11-01-2007','mm-dd-yyyy') OR x.end_date IS NULL);

Output:

TABLE_NAME
------------------
UA_200710_200712
USER_ACTIVITY

So it just returned two table names where the data would be found.

In our application code, a method returns the name(s) of the table(s) based on the user's date input; we then fetch data directly from those tables and combine the results. The programmer does not have to know how many tables are involved!
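A minimal sketch of such a method in PL/SQL could look like this (the function name, the activity_date column, and the bind names are my own placeholders, not our actual code; the caller binds the same from/to dates once per branch of the UNION ALL) -

-- Hypothetical sketch: build one UNION ALL statement spanning every
-- table that overlaps the requested range.
CREATE OR REPLACE FUNCTION build_activity_query(
  p_from DATE,
  p_to   DATE
) RETURN VARCHAR2
IS
  v_sql VARCHAR2(4000);
BEGIN
  FOR t IN (SELECT x.table_name
              FROM user_activity_split_info x
             WHERE x.begin_date <= p_to
               AND (x.end_date >= p_from OR x.end_date IS NULL))
  LOOP
    IF v_sql IS NOT NULL THEN
      v_sql := v_sql || ' UNION ALL ';
    END IF;
    v_sql := v_sql || 'SELECT * FROM ' || t.table_name ||
             ' WHERE activity_date BETWEEN :from_dt AND :to_dt';
  END LOOP;
  RETURN v_sql;
END;
/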

One more thing: we no longer have to worry about the sequence number crossing its limit after a certain number of years, because we can now reset the user_activity sequence at any point.
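On Oracle releases of this era there is no single ALTER statement to rewind a sequence, so resetting means dropping and recreating it; a sketch, assuming the sequence is named user_activity_seq (a placeholder) -

-- user_activity_seq is a placeholder name. Grants and synonyms on the
-- sequence have to be re-created after the drop.
DROP SEQUENCE user_activity_seq;
CREATE SEQUENCE user_activity_seq START WITH 1 INCREMENT BY 1;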

I also came up with another approach that would allow even simpler programming - almost nothing to change in the code - but it requires duplicating user_activity data by overlapping each table's three months of data! That is not feasible for a table like user_activity, but it was an option.

Let's see how things work!

Apr 9, 2008

Data transfers through SQL*Net

We need to check how much data is going back and forth between the database servers in order to make some system engineering decisions. To find the flow rate for current sessions, I issued the following SQL -

/** Current Sessions **/

-- Per-session SQL*Net statistics, summed per user and shown in MB.
SELECT n.name, ss.username, ROUND((SUM(st.value))/1024/1024) MB
FROM v$sesstat st, v$session ss, v$statname n
WHERE ss.sid = st.sid
AND n.statistic# = st.statistic#
AND n.name LIKE '%SQL*Net%'
GROUP BY n.name, ss.username
HAVING ROUND((SUM(st.value))/1024/1024) > 0
ORDER BY ss.username, n.name;

NAME                                    USERNAME    MB
--------------------------------------  --------  ----
SQL*Net roundtrips to/from client       DBSNM        1
bytes received via SQL*Net from client  DBSNM      182
bytes sent via SQL*Net to client        DBSNM      154
bytes received via SQL*Net from client  PROD        68
bytes sent via SQL*Net to client        PROD       141


Well, how do I know the historical figures for the whole instance? The answer could be as follows -

/** Since Instance Startup **/

-- Instance-wide SQL*Net statistics accumulated since startup, in MB.
SELECT name, ROUND(value/1024/1024) MB
FROM v$sysstat
WHERE name LIKE '%SQL*Net%'
AND ROUND(value/1024/1024) > 0
ORDER BY name;

NAME                                       MB
----------------------------------------  -----
SQL*Net roundtrips to/from client           380
bytes received via SQL*Net from client    46331
bytes sent via SQL*Net to client          92414

There are other ways to do this using trace files. If I enable tracing by adding the following lines to my listener.ora and reloading the listener, I will find what I need in the resulting trace file.

-- listener.ora

TRACE_FILE_LISTENER = netstat-info-20080409.trc
TRACE_DIRECTORY_LISTENER = /export/home/oracle
TRACE_LEVEL_LISTENER = SUPPORT

From the command line, as the oracle user -
# lsnrctl reload

Now it's time to check the trace after a certain period -

# trcasst -s /export/home/oracle/netstat-info-20080409.trc

trcasst came up with nicely formatted output - cool!


*************************************************************************
* Trace Assistant *
*************************************************************************

----------------------
Trace File Statistics:
----------------------
Start Timestamp : 09-APR-2008 13:03:10:531
End Timestamp : 09-APR-2008 19:54:36:028
Total number of Sessions: 1172

DATABASE:
Operation Count: 0 OPENS, 0 PARSES, 0 EXECUTES, 0 FETCHES


ORACLE NET SERVICES:
Total Calls : 1593 sent, 1427 received, 0 oci
Total Bytes : 292573 sent, 255785 received
Average Bytes: 183 sent per packet, 179 received per packet
Maximum Bytes: 2011 sent, 2034 received

Grand Total Packets: 1593 sent, 1427 received


*************************************************************************
* Trace Assistant has completed *
*************************************************************************
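One caveat: SUPPORT-level tracing grows the trace file very quickly, so once the measurement window is over, the trace level should go back to off in listener.ora and the listener reloaded -

-- listener.ora
TRACE_LEVEL_LISTENER = OFF

# lsnrctl reload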

Apr 3, 2008

Stop user from accidental database damage!

People who play with databases cannot keep themselves from doing crazy stuff now and then. We are often logged in to multiple production servers and test servers simultaneously, with the sessions open in multiple console tabs. It can happen that we mean to run an operation on a test server but accidentally run it on a production box. Just imagine: I want to run a TRUNCATE TABLE on a test box but actually run it on production - oops! What a disaster!

To avoid this type of blunder, I have introduced the following trigger to prevent any kind of unintended DDL on the production boxes. The trigger is stored in the appropriate schema, and if any DDL is issued while the trigger is in place, it simply slaps us :P

-- Reject any DDL (ALTER / CREATE / DROP / TRUNCATE) issued in this schema.
CREATE OR REPLACE TRIGGER ddl_restrict_trigger
BEFORE ALTER OR CREATE OR DROP OR TRUNCATE
ON SCHEMA
BEGIN
  RAISE_APPLICATION_ERROR(-20001,' You are not authorized to perform DDL. Please contact DBA team.');
END;
/
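With the trigger in place, an accidental DDL statement fails immediately. The result looks something like this (the table name is only an example, and the exact error stack may vary) -

SQL> TRUNCATE TABLE some_table;

ORA-00604: error occurred at recursive SQL level 1
ORA-20001:  You are not authorized to perform DDL. Please contact DBA team.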

Read-only database user

Sometimes the application developers need to debug data in the production database, which demands giving them access to the live schema. But how can I trust that a developer won't accidentally modify things in the database and blunder everything!

I need a read-only user through which developers can issue only SELECT statements. For now, that will serve my purpose.

What I did: I created a user, created a role with the SELECT ANY TABLE privilege, and granted that role to the user. Now the developers are restricted to SELECT commands only - cool!

There are two ways of doing this; note that SELECT ANY TABLE in option 1 is a database-wide privilege (it allows reading tables in every schema), while option 2 limits access to the prod schema's own tables. Here are the scripts -

** option-1 **

> sqlplus system/*******
> create user devels identified by xyz;

> create role SELECT_ANY_TABLE;
> grant select any table to SELECT_ANY_TABLE;

> grant connect to devels;
> grant SELECT_ANY_TABLE to devels;
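Before moving on, a quick sanity check with the new user (user_activity here is just an example table; the DELETE should fail with an insufficient-privileges error) -

> sqlplus devels/xyz
> select count(*) from prod.user_activity;  -- works
> delete from prod.user_activity;           -- fails with ORA-01031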

** option-2 **

> sqlplus system/******
> create user devels identified by xyz;
> sqlplus prod/***********

Now we have to run the following piece of PL/SQL -

set serveroutput on
DECLARE
  sql_txt VARCHAR2(300);
  -- Every table owned by the schema we are connected as (prod)
  CURSOR tables_cur IS
    SELECT x.table_name FROM user_tables x;
BEGIN
  dbms_output.enable(10000000);
  -- Grant SELECT on each table, one by one, to the read-only user
  FOR tables IN tables_cur LOOP
    sql_txt := 'GRANT SELECT ON '||tables.table_name||' TO devels';
    EXECUTE IMMEDIATE sql_txt;
  END LOOP;
END;
/

The second option has a drawback - each time we add new tables to the schema, we have to run it again.
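If re-running it by hand becomes tedious, one option would be to wrap the loop above in a procedure and schedule it; a sketch, assuming the procedure is named refresh_devels_grants (my placeholder) -

-- Hypothetical: run the grant loop nightly so that newly created
-- tables are picked up without manual intervention.
DECLARE
  v_job BINARY_INTEGER;
BEGIN
  DBMS_JOB.SUBMIT(
    job       => v_job,
    what      => 'refresh_devels_grants;',
    next_date => SYSDATE,
    interval  => 'TRUNC(SYSDATE) + 1'  -- once a day, at midnight
  );
  COMMIT;
END;
/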