AWR - historical execution plan investigation

Sometimes you will receive complains from users that something that has been running ok for ages on a production server, suddenly is running very very slowly. In 10g, AWR gives you the ability to go and check and possibly compare the execution path of a query at different times in the past.

select DBID, SQL_ID, SQL_TEXT from dba_hist_sqltext where sql_text like ‘%your query%’;

DBID      | SQL_ID      | SQL_TEXT |

———-+————-+—————————+-

1234567890|12345xxxxxxxx|UPDATE STAGING SET BLAH    |

We know that yesterday it was running fine, but today it’s not, and users have confirmed that data volumes are identical between the two dates

select s.begin_interval_time, s.end_interval_time , q.snap_id, q.dbid, q.sql_id, q.plan_hash_value, q.optimizer_cost, q.optimizer_mode

from dba_hist_sqlstat q, dba_hist_snapshot s

where q.dbid = 1234567890 and q.sql_id = ‘12345xxxxxxxx’

and q.snap_id = s.snap_id

and s.begin_interval_time between sysdate-2 and sysdate

order by s.snap_id desc;

When you find this, take note of the PLAN_HASH_VALUE from the 2 different times that you want to compare

Actually the 3 parameters that you need are:

·         DBID

·         SQL_ID

·         PLAN_HASH_VALUE

 

And to check the execution plan, run the following query:

 

select id, operation, options, object_name, cost

from dba_hist_sql_plan

where dbid = 1234567890 and sql_id = ‘12345xxxxxxxx’

and plan_hash_value = <plan_hash_value1>

compare that with the second one

select id, operation, options, object_name, cost

from dba_hist_sql_plan

where dbid = 1234567890 and sql_id = ‘12345xxxxxxxx’

and plan_hash_value = <plan_hash_value2>

 

 

Saturday, December 13th, 2008 at 17:46

Checking Values of Hidden Parameters - Examples of how to use ESCAPE

Hidden parameters are parameters that normally start with an underscore (_%)

 

To find out all hidden parameters within an instance, login as / as sysdba and run the following sql

 

SELECT inst_id, parno_kspvld_values, name_kspvld_values , ordinal_kspvld_values, value_kspvld_values, isdefault_kspvld_values FROM x$kspvld_values

WHERE name_kspvld_values LIKE ‘\_%’ ESCAPE ‘\’

AND inst_id = USERENV (’Instance’);

 

The bonus, this will also give you an example of how to search a string with an underscore in it.

 

 

Saturday, December 13th, 2008 at 17:45

Retaining your best IT contractors

For those who are not familiar with the terminology, an IT contractor is an IT professional who chooses to offer their expertise for a client as an independent contractor. IT contractors are normally hired on a fixed time basis, e.g. 3 months, 6 months or 1 year.

 

With the current credit climate, a lot of financial institutions have slimmed down their IT departments to cut cost, by reducing staff, or persuading some contractors to take on permanent employment.

 

Some IT contractors that are retained are normally some of the best talents within the department.  With the reduction of staff, workload level seems to go up and contributing to employee burnouts.

 

There was a tendency for managers to turn to these superstars for help more and more. It is natural. They excel in what they do. And because of that, more and more tasks are sent their way, from the flagship projects, to the annoying niggling operational issues which are costing quite a lot of overtime callouts.

 

Oh, and we have some angry business users demanding overdue responses from the IT department. Again, superstars sent to the rescue.

 

Managers forget to push tasks to the rest of the team, as it is easier to get results from the superstars. Instead of demanding more from the average performers, managers demand more from their superstars. When superstars are reassigned to look at some critical issues, existing tasks that are not as technically demanding are not reassigned and redistributed to the some of the more average performers within the team.

 

At the end of the day, IT contractors are only human, and they burnout too, physically or mentally. And then they would leave, leaving managers realizing (often too late) how the superstar actually has been taking on the workload of two or three average performers. Your best talents will not easily be put off with fierce market competition. Cream will always rise to the top.

 

Managers, your challenge is to spot these talents and hang on to them.

 

 

Wednesday, December 10th, 2008 at 21:03

Specific Aggregation Market Risk System - data volume challenge - part 5

5) Mapping Table

I am reviewing the mapping table that is used to enrich the staging tables. The main structure of the mapping table is quite straightforward, can be simplified as follows:

 

SRC_FIELD

SRC_VALUE

DEST_FIELD

DEST_VALUE

AMENDED_BY

LAST_UPDATE_TIME

 

Here is a simple example of how the mapping tables will be populated

 

SRC_FIELD

SRC_VALUE

DEST_FIELD

DEST_VALUE

CNTRY

GB

COUNTRY

UK

CNTRY

England

COUNTRY

UK

 

The update for enrichment will be done as follows:

 

UPDATE STAGING

SET  COUNTRY = (select dest_value from mapping_table m

                  where m.src_field_name = ‘CNTRY’

                and m.dest_field_name=’COUNTRY’

                and rownum < 2)

where data_feed_id=12345;

 

What I am questioning here is why we need the rownum < 2?

Surely the data should be unique.

 

And unfortunately it is not.

 

select count(*), src_field, src_value, dest_field

from mapping

group by src_field, src_value, dest_field

having count(*) > 1;

 

The above query should return no rows, and unfortunately in this case, it returns quite a lot of rows.

 

In this case, developers were having a problem with the update statement, and decided to work around it by using rownum < 2. This is unfortunately is a lazy workaround, and is not helpful. This needs to be fixed, even though it could be painful and requires interaction with users to find out which rows are actually correct.

 

Considering there are AMENDED_BY and LAST_UPDATE_TIME fields, it seemed to me there was a half hearted attempt to perform an audit trail of changes in the mapping table, however it is incomplete. An IS_ACTIVE flag would help here, there the update can simply be modified as follows.

 

UPDATE STAGING

SET  COUNTRY = (select dest_value from mapping_table m

                  where m.src_field_name = ‘CNTRY’

                and m.dest_field_name=’COUNTRY’

                and isactive=’Y’)

where data_feed_id=12345;

 

Not only this means that we can now create a useful unique index against the mapping table, which would help efficiency, this also means that we will actually be enriching correctly!

 

The be continued …

 

Tuesday, December 9th, 2008 at 15:51

Specific Aggregation Market Risk System - data volume challenge - part 4

4) High volume - multi column enrichments

The staging tables consist of 100-150 columns. About 90 columns are populated during data loading, and the rest of the columns are updated using several phases of mapping.  The mapping processes will consult various lookup tables. Mappings are done using stored procedures. Within each phase, various columns that can be enriched independent of one another, are updated sequentially.

The current code can be simplified as follows (please excuse the syntax error, as I am typing this straight from my head)

cursor cur_columns is 

select column_name, column_value from lookup_tables where mapping_phase=5;

 

for i in cur_columns then

loop

v_sql := ‘update staging set ‘||i.column_name||’=’||i.column_value

||’ where data_feed_id=’||p_data_feed_id

||’ and blah ‘;

execute immediate v_sql;

commit;

end loop;

The final sql that is updated for each enriched column would look like:

update staging set industry=(inline select)  where data_feed_id=12345 and  exists (some subquery);

update staging set region=(inline select) where data_feed_id_12345 and exists  (some subquery);

Now remember, the target table partition for this data feed

  • has 1.2 million records
  • is 150 column wide
  • indexing all columns which will be used in subquery could be counter-productive, as there are so many different ones which are needed. Even if they are not counter-productive, data loading could be slowed down tremendously.

Now, what if …

  • within the same mapping phase, we do all the updates as follows:

update staging

set column1 = (inline select),

column2 = (inline select),

column3 = (inline select),

column8 = (inline select)

where data_feed_id=12345;

  • what if we just let Oracle scan the whole partition with the 1.2 million row, and let all the columns be updated in one sweep.
  • I believe that the improvement factor could be as much as number_of_columns_to_update/2 or better.
  • to improve it further, if we have implemented subpartitioning, perhaps parallel DML can be used here.

update /*+ parallel (staging,8) */ staging

set column1 = (inline select),

column2 = (inline select),

column3 = (inline select),

column8 = (inline select)

where data_feed_id=12345;

  • another thing to consider is to have a target table, as it is much much quicker to do an insert select (to include enrichment work), rather than an update. I have tested this with the following results –updating 5 columns in 1 sweep for 1.2 million records too 16 minutes, doing an insert select as below took a mere 3 minutes.

 

insert into staging_2

select staging.*,

(inline select) as newcolumn1,

(inline select) as newcolumn2,

(inline select) as newcolumn3,

(inline select) as newcolumn8

from staging

where data_feed_id=12345;

 

 

 

 

 

 

 

 

 

To be continued ..

 

Saturday, December 6th, 2008 at 19:35

Specific Aggregation Market Risk System - data volume challenge - part 3

3) Which method? Parallelization in oracle or distributed computing?

In past experiences, I have seen many many cases, where bringing processing into Oracle brings impressive significant performance gain. Oracle is built to do bulk processing. Why not? At the same time, we eliminate network overhead by doing everything inside the database server.

I always have to resist the temptation to enforce this mantra onto the development team, which seem to favour parallelization of processing through distributed computing.

Which way is the right way to go? I suppose, for some cases, the answer is, who is your best people? If you have a very talented java developer in your team, and just an average Oracle developer, you are likely to end up with a better implementation of parallelization via java. On the other hand, if the position is reversed, you are likely to end up with a better implementation of parallel processing within Oracle.

However for some other extreme cases, neither would suffice. An Oracle database and code can be tuned to death, and it will still not be good enough. Or distributed computing on it’s own would not be good enough if the database structure does not support it.

Remember, we tested the full size feed of 1.2 million records, and processing took 16 hours.

A similar test of 600k records took 4.5 hours.

One of the developers in the project team has taken an initiative to test a 300k record, broke it down to 5 separate feeds (60k each), and without any changes to the code, the whole processing took 30 minutes.  Great stuff, we are getting there.

Now let’s analyze this further.

  • I would put the 16 hr processing of 1.2 million record and the 4.5 hr processing of 600k record into the same category.  Both are unacceptable. The system has reached its breaking point, and performance degraded exponentially.
  • The parallelization to 5 streams shows that 60k records/stream seem to be handled well by the system. The 30 min processing time shows that we are on the right track here, and that the network overhead of the distributed computing does not seem to matter much. If this test has not been conducted by the developer, I would definitely have insisted on the breakdown to be done inside the database, and which would require more development time. However, because this is showing promising result, and allows reuse of existing code.
A few options can be explored here
  • Option 1) can we stay 5 streams (using 5 application servers), and process 240k records per streams, and whether it could then scale to processing time of 2 hours
  • Option 2) or can we stay with 5 streams (using 5 application servers), stay with 60k record per streams and implement pipeline method and whether it could still scale to processing time of 2 hours
  • Option 3) or should be increase the number of streams (potentially could mean adding more hardware to the application server domain)
Further questions at this point:
  • When we are flooding the database, normally the same table with loading requests, updates, etc, can the structure hold? If not what changes are needed?
  • What wiring issues will arise to the splitting of feed to multiple streams? At the moment each data feed will carry a unique data_feed_id which is currently set as the partition key to the main tables.  How do we disperse the single feed to multiple streams then collate everything back to a single feed again?

To be continued …

Saturday, December 6th, 2008 at 14:48

Understanding Business User Perspectives

During the life cycle of financial projects, sometimes we do not pay quite enough attention to the target end users of the application that we are building. 

IT  focus a lot more on project champions, those business users who are more technically savvy, and therefore considered to be more on the same wavelength. At a personal level, these champions are normally those who IT find to be easier to engage with.

 

Sometimes in the midst of our excitement of delivering management goals and the enthusiasm to build a state of the art application, the end users which are not project champions are neglected. IT need to listen more to these users as sometimes they are the people who really understand what is important to make an application more efficient, more usable. These members of the infantry matter, we should recognise that and encourage their informed participation.

 

Sometimes this boils down to cultural issue where IT and end users sit in different part of the world, sometimes it is simply the fact that end users and IT do not share the same vocabulary. Business users do not live in the world of technology, and sometimes have difficulties explaining what they need to IT. On the other hand, IT do not live in the world of business and have difficulties grasping business requirement.

 

User: “I don’t know what I want, but I will know it when I see it”. 

We  should not give up trying to listen to our users, however painful it may seem.

Tuesday, December 2nd, 2008 at 19:23

Specifications for a typical Oracle Expert engagement project

People often ask me, “what is it do you do”? Are you a DBA? Are you a data architect? or Are you a developer?

 

Below is an example of specifications of a typical engagement that I am involved with.

 

Objective of Role

  • Provide Oracle consultancy to the Risk IT team to ensure that the business is getting the best value out of their Oracle investment.
  • Actively assist in the identification of problems, performance tuning and implementation of improvements to maximise the use of oracle system across the Risk IT
  • Work closely with the development team to ensure best practices are followed and implemented, as well as provide tutoring to the development staff
  • Work closely with the strategic and operational leaders in the Risk IT team to develop programs of work and business improvement.

 

Required Skills

  • At least 10 years of experience with Oracle  administration and development (with in-depth experience with Oracle 9i and 10g) , preferably with Oracle Certification (OCP)
  • Whilst the candidate will be an Oracle expert, it is desired that the candidate will have had exposure to various risk project implementations for large financial institutions
  • Experience in business analysis and process architecting experience
  • In-depth experience with VLDB, including knowledge on data warehousing design, data modelling and implementation, familiarity with Kimball or Inmon methods is desirable.
  • In-depth experience with Oracle development to include PL/SQL and implementation using Java and C#
  • A sound knowledge of both database and system performance tuning, not only from DBA perspective, but also from application design perspective
  • Good understanding of Risk business
  • Experience in coaching, leading and supporting diverse project teams, both functionally and culturally

 

 

 

Monday, December 1st, 2008 at 22:23

Specific Aggregation Market Risk System - data volume challenge - part 2

Following the initial overview, here is my further recommendation to the application team.

1) Change table structure

Partitioning needs to be change from partition by range only to either range-hash partitioning, or list-hash partitioning. I have picked hash as the sub partitioning method in this recommendation document. However if there is a better way to sub partition the data, that method could be chosen over hashing.

 The main objective of the change is to allow parallel processing to be implemented, in this case allowing large amount of data to be distributed across multiple oracle processing to shorten processing time.

 Optimal amount of sub partitions created for a single feed needs to be determined through testing. I have tested with 8 sub partitions with fairly satisfactory results. We can try to increase this to 12 or 16, however there is one thing to consider, i.e. how heavy processing on this application could potentially impact another application which resides in the same database.

Statistics Maintenance

This procedure needs to be modified to allow this to work with tables with subpartitions. For large feeds, statistics generation with ESTIMATE_PERCENT value of 1% is sufficient. For tables with sub partitions, DEGREE parameter can be specified to allow faster statistics generation.

In conjunction with the partitioning maintenance work, mainly to replace SPLIT partition with ADD partition, an additional logic is needed, i.e. if known large data is going to be loaded, then instead a standard single partition, sub partitions need to be created.

Partition maintenance

In conjunction with the partitioning maintenance work, mainly to replace SPLIT partition with ADD partition, an additional logic is needed, i.e. if known large data is going to be loaded, then instead a standard single partition, sub partitions need to be created.

AN ORA-26002 error was reported during testing against one of the staging table. I believe that this particular staging table does not need an index after subpartition is implemented. Removal of index will eliminate this error without compromising performance.

 

2) Data Loading via 3rd party ETL application

Most of sqlldr job is done via this 3rd party ETL application. Loading  1.2 million rows of trade data took almost 29 minutes (about 100 column wide). Currently direct path loading is already used. In conjunction with sub partitioning implementation, I would like to see the use of parallel=true argument so increase throughput further.

As a benchmark, as I can not reveal the table structure here, the csv file is about 700MB in size. This shows the amount of data which needs to be loaded.

First, I need to know how slow 29 minutes is.

So I set myself a baseline

  • created a new version of the table using range-hash partition (8 buckets)
  • perform an insert test into this table to see how fast this can go

                  insert /*+ append parallel (a,8) */ into staging_table a select * from source;

  • I can load 1.2 million rows in about 70 seconds. Not bad.

This exercise is useful, as this will give me a reasonable expectation of how far the sqlldr can be tuned.

Now, I think the SQLLDR can be tuned to run as fast as 5 minutes (from the original 29 minutes). Why 5 minutes? Because I am including ETL tool overhead, and network overhead.

So now I created my own CSV and prepare a controlfile to test SQLLDR from a client host, this way I am including the network overhead. The load took less than 2 minutes. Fantastic.

Now back to performing the load using ETL tool… Why oh why did it still take 15 minutes, even after we implement parallel=true? I ended up doing some tracing, and found out that one of the column is populated using a sequence (seq_name.nextval)… Bingo. This is the bottleneck.

So with this knowledge, I go back to the application team to discuss ways of populating this column another way…

 To be continued …

 

 

 

Monday, December 1st, 2008 at 11:24

Installing APEX 3.1

 Database Preparation

  • Check that dba_lock exists. If not, run

sqlplus / as sysdba

@?/rdbms/admin/catblock

  • Create tablespace APEX

 

CREATE TABLESPACE apex

DATAFILE ‘/uxx/oradata/xxxx/apex01.dbf’ SIZE 500M REUSE

EXTENT MANAGEMENT LOCAL UNIFORM SIZE 128K SEGMENT SPACE MANAGEMENT AUTO;

 

  • check that shared_pool_size is at least 100M
  • check that system tbs has at least 85mb free
  • Check that Oracle Text is already installed

Select * from dba_registry;

 

  • If not, let’s install it then

sqlplus / as sysdba

@?/ctx/admin/catctx.sql ctxsys SYSAUX TEMP NOLOCK

 

where ctxsys is the install schema, SYSAUX is the default tablespace, Temp is temp tablespace for that user,

and NOLOCK instructs the script to not lock the account when the install is complete (lock after!)

 

– not install the language bit (UK for uk,or US for us)

connect ctxsys/ctxsys

@?/ctx/admin/defaults/drdefuk.sql file (for UK).

 

  • Check that Oracle Text is already installed

 

Select * from dba_registry;

 

CONTEXT

Oracle Text

10.2.0.3.0                     VALID

01-DEC-2008 09:46:01          SERVER

SYS                            CTXSYS

VALIDATE_CONTEXT

 


Install Oracle HTTP Server.

 

vi /u01/app/oracle/oraInst.loc_10g_httpd

inventory_loc=/u01/app/oracle/oraInventory_10g_httpd

inst_group=oinstall

 

mkdir –p =/u01/app/oracle/oraInventory_10g_httpd

 

export ORACLE_HOME=/u01/app/oracle/httpd/10.2.0

 

 

 

Find the companion CD

./runInstaller –invPtrLoc /u01/app/oracle/oraInst.loc_10g_httpd

 

Select Product to Install: Oracle Database 10g Companion Products 10.2.0.1.0 ( do not select HTML DB)

Target directory: /u01/app/oracle/httpd/10.2.0

 

Don’t forget to select Apache to install

 

Sort out images

mkdir –p :/u01/app/oracle/httpd/10.2.0/Apache/apex_images

scp –pr xx:/u01/app/oracle/httpd/10.2.0/Apache/apex_images/images .

ln -s /u01/app/oracle/httpd/10.2.0/Apache/apex_images/images

/u01/app/oracle/httpd//10.2.0/Apache/Apache/htdocs/i

 

Setup DAD

vi /u01/app/oracle/httpd/10.2.0/Apache/modplsql/conf/dads.conf

 

alias ‘i’ “/u01/app/oracle/httpd/10.2.0/Apache/apex_images/images”

AddType text/xml xbl

AddType text/x-component htc

 

<Location /pls/apex_ucig04>

Order deny,allow

PlsqlDocumentPath docs

AllowOverride None

PlsqlDocumentProcedure wwv_flow_file_mgr.process_downloadd

PlsqlDatabaseConnectString hostname:1550:DB_UAT.UK.HIBM.HSBC ServiceNameFormat

PlsqlNLSLanguage AMERICAN_AMERICA.AL32UTF8

PlsqlAuthenticationMode Basic

SetHandler pls_handler

PlsqlDocumentTablename wwv_flow_file_objects$

PlsqlDatabaseUsername APEX_PUBLIC_USER

PlsqlDefaultPage apex

PlsqlDatabasePassword APEX_PUBLIC_USER

PlsqlRequestValidationFunction wwv_flow_epg_include_modules.authorize

Allow from all

</Location>

                         

Managing Apache services

/u01/app/oracle/httpd/10.2.0/opmn/bin/opmnctl stopall

Start OPMN service

/u01/app/oracle/httpd/10.2.0/opmn/bin/opmnctl startall

Check Status

/u01/app/oracle/httpd/10.2.0/opmn/bin/opmnctl status

 

Processes in Instance: IAS-X-nsudd142.6299

——————-+——————–+———+———

ias-component      | process-type       |     pid | status

——————-+——————–+———+———

HTTP_Server        | HTTP_Server        |   19197 | Alive

LogLoader          | logloaderd         |     N/A | Down

dcm-daemon         | dcm-daemon         |     N/A | Down

 

To stop and restart HTTP_Server component

cd /u01/app/oracle/httpd/10.2.0/opmn/bin

./opmnctl stopproc ias-component=HTTP_Server

opmnctl: stopping opmn managed processes…

./opmnctl startproc ias-component=HTTP_Server

opmnctl: starting opmn managed processes…

To restart

./opmnctl restartproc ias-component=HTTP_Server

opmnctl: restarting opmn managed processes…

 

 

 


Install APEX

cd to apex media directory

 

check that LD_LIBRARY_PATH is set correctly

 

sqlplus / as sysdba

@apexins apex apex TEMP /i/

 

Check that installation is finished successfully

SELECT STATUS FROM DBA_REGISTRY WHERE COMP_ID = ‘APEX’;

This should show VALID.

If it has been unsuccessful, normally you are left with status of LOADING.

 

 

 

If installation failed, to start again

 

ALTER SESSION SET CURRENT_SCHEMA = FLOWS_020000;

exec flows_020000.wwv_flow_upgrade.switch_schemas (’FLOWS_030100′,’FLOWS_020000′);

 

DROP USER FLOWS_030100 CASCADE; (might require a shutdown force restrict if hangs)

 

Then restart with apexins again

 

Change APEX admin password

cd to apex media

sqlplus / as sysdba

@apxchpwd.sql  (and enter new password)  

 

URL

http://hostname:7777/pls/apex/apex_admin

 

for users

http://hostname:7777/pls/apex/

 

 

Checking port inside db:

select dbms_xdb.gethttpport from dual;

To set a new port:

exec dbms_xdb.sethttpport(7780)

To disable:

exec dbms_xdb.sethttpport(0)

 

 

Maintenance

How to make sure apache gets restarted automatically ….

 

 

 

Monday, December 1st, 2008 at 11:20