How to handle audit trail for time series data

We are dealing with curve data over a long range time series, where this could be market data or risk factors or scenario data, or anything of similar nature.

From time to time, data will be updated, and there is a requirement to keep an audit trail for this.

The original plan was to keep a usual shadow audit table to record all the changes that have been made to a particular point in the curve at a particular date in the time series, and simply keep the latest version in the main table.

The weakness of this solution is that it doesn’t allow easy reconstruction of the curve for any date in the past.

Let me illustrate as follows:

Observation dates – are dates within the time series. In this example we are dealing with 01-Apr-2009 – 10 Apr 2009 data

Run dates – are dates where data is fed into the system. This can be a new insert, or an update (which will trigger creation of new version)

Steps:

  1. On day 1 (1 May 2009), new data was loaded to the system (date 1-10 Apr, version 1 for everything)
  2. On day 2 (10 May 2009), new updates came in, affecting 5-10 Apr data, therefore the new data should now be stored as version 2
  3. On day 3 (20 May 2009), more updates came in, affecting 7-10 Apr data, therefore the new data should be stored as version 3.

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

01-May-09

05-Apr-09

1

0.100964171

1

100

01-May-09

06-Apr-09

1

1.31731591

1

100

01-May-09

07-Apr-09

1

-0.13868598

1

100

01-May-09

08-Apr-09

1

0.470941035

1

100

01-May-09

09-Apr-09

1

0.243330874

1

100

01-May-09

10-Apr-09

1

2.045916747

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

10-May-09

07-Apr-09

1

0.890681594

2

100

10-May-09

08-Apr-09

1

1.91338225

2

100

10-May-09

09-Apr-09

1

0.764316071

2

100

10-May-09

10-Apr-09

1

2.51100637

2

100

20-May-09

07-Apr-09

1

0.537491042

3

100

20-May-09

08-Apr-09

1

0.91346192

3

100

20-May-09

09-Apr-09

1

0.91736769

3

100

20-May-09

10-Apr-09

1

0.27792944

3

Method 1 – using audit table to store the changes

We will have

Main table

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

20-May-09

07-Apr-09

1

0.537491042

3

100

20-May-09

08-Apr-09

1

0.91346192

3

100

20-May-09

09-Apr-09

1

0.91736769

3

100

20-May-09

10-Apr-09

1

0.27792944

3

Audit table

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

UPDATE DATE

100

01-May-09

05-Apr-09

1

0.100964171

1

10-May-09

100

01-May-09

06-Apr-09

1

1.31731591

1

10-May-09

100

01-May-09

07-Apr-09

1

0.13868598

1

10-May-09

100

01-May-09

08-Apr-09

1

0.470941035

1

10-May-09

100

01-May-09

09-Apr-09

1

0.243330874

1

10-May-09

100

01-May-09

10-Apr-09

1

2.045916747

1

10-May-09

100

10-May-09

07-Apr-09

1

0.890681594

2

20-May-09

100

10-May-09

08-Apr-09

1

1.91338225

2

20-May-09

100

10-May-09

09-Apr-09

1

0.764316071

2

20-May-09

100

10-May-09

10-Apr-09

1

2.51100637

2

20-May-09

Now, a user will want to then run a report “show me my data as of 15 May 2009”.

Not easy, right?

Method 2 – version handling without audit table

In this method, I will store all versions in the same table.

To reconstruct data as of 15 May 2009, this is the data that we are after

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

10-May-09

07-Apr-09

1

0.890681594

2

100

10-May-09

08-Apr-09

1

1.91338225

2

100

10-May-09

09-Apr-09

1

0.764316071

2

100

10-May-09

10-Apr-09

1

2.51100637

2

To reconstruct data as of 25 May 2009, this is the data that we are after

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

20-May-09

07-Apr-09

1

0.537491042

3

100

20-May-09

08-Apr-09

1

0.91346192

3

100

20-May-09

09-Apr-09

1

0.91736769

3

100

20-May-09

10-Apr-09

1

0.27792944

3

Next, how do we reconstruct the risk factor data as of any point in time in the past?

This is easily done using oracle analytics (RANK).

select *

from (

select observation_date, bucket_id, value,

version,

rank() over (partition by observation_date, bucket_id order by version desc) as rank from mydata where run_date <= <date of required reconstruction>)

where rank =1

The same model will obviously support multiple updates within a day, simply make sure we store the timestamp as well.

Obviously this carries some overhead, so some housekeeping criteria will have to be defined and implemented to ensure we do not indefinitely keep old versions of data.

Wednesday, June 3rd, 2009 at 13:11

Defragmenting a very large partition table

I am working on a market risk data warehouse, where we needed to rationalize the space used by a very large table.

The characteristics of the table were as follows:

  • Total number of rows 5,200,000,000 (5.2 billion rows)
  • Total table size 700GB
  • Data was stored in an ASSM tablespace with uniform extents of 4MB
  • Data was split into 1000 partitions (by source system), and each partition was further split to around 60 subpartitions.
  • Data distribution across sub-partitions and across partitions are also not even, causing a lot wastage taken by sub-partitions which do not hold a lot of data.

Having reviewed the system, each partition is normally read as a whole, and for the largest partitions, parallel query/parallel DML were used using degree of parallelism of 8 only. Therefore, there was really no good reason to maintain 60 sub-partitions within a single partition. So it was decided that we will reduce it from 60 to 8.

Initially we tried performing the work in-situ, by using partition merge. We tested this against the largest partition with 47 million rows, and it took 45 minutes to complete. Considering we have 1000 partitions to do, even if we run this in parallel, it would be impossible to do this work over a weekend.

Next we tried another strategy:

  • Create a new table with the same structure
  • Create all the partitions/sub-partitions using the new template
  • Set the table to be nologging
  • Perform insert select with APPEND and PARALLEL hints for both read and write.
  • Run 12 streams

We were getting 4 billion rows throughput per hour, meaning that the work was done well within 1.5 hours.

Size of final table was 450GB, a saving of 250GB.

This means we will have all the time in the world to add any indexes if needed, plus to regenerate the statistics on this new table.

I knew Oracle was not slow, but this is impressive indeed.

Thursday, May 28th, 2009 at 15:49

Compromising between Java and Oracle to load LOB/CLOB/XML

Problem:

We have the challenge to load feed files, each with approximately 100k rows of trade data with XML which sizes vary between 30k to 300k, into an Oracle database, across the network, as quickly as possible.

Trade data is processed in a grid using a Java application, which in turn produces these XMLs to be inserted into the database. The Java application produces row by row insert statements against the database, and in effort to increase throughput, multiple streams of inserts will be run.  It was reported that performance was unsatisfactory, and that the assumption that the bottleneck is the database.

Constraints:

The use of external tables, sqlldr are not an option at this point, as it requires large changes to the java application which currently produces the required data.

Analysis:

Those who are familiar with Oracle will question the report that Oracle is struggling to load a mere 100k rows of data. On closer analysis, it is clear that the real bottleneck is network. 100k rows of data equates to about 7GB worth of data.


1) Tried using a new listener with 32k packet sizing, as opposed to the default 2k. I expected that this would reduce the amount of return trips that have to be made to transfer al the data. Good idea, but turned out that it hardly made any difference in throughput

2) We partitioned the target table to 20 partitions (hash), in the hope to load balance the concurrent loader to hit the different partitions. No improvement, and the main bottleneck is still network

3) OK, hardcore DBAs might frown on this one, but I think this is a very neat and interesting approach. We zipped the XML at source (at the java application side) to reduce the amount of data have to be shifted across the network. We later unzipped the XML with a simple trigger on the table, prior to writing it to the target column (yes, this works row by row still). This eliminates the network bottleneck, which is a very good thing, however performance is still not good enough

4) Altered the table and modify the lob column to use the CACHE option. This improved performance considerably, but now we were seeing a lot of “log buffer space” wait event. This was confusing me, because I had set the table to be NOLOGGING. After much digging, turned out that CACHE does not go hand in hand with NOLOGGING. If you cache, logging will be turned on automatically. The fact is, nocache nologging is much much slower than cache logging. So we will just have to deal with the “log buffer space” wait.

5) Increased the “log_buffer” parameter from 2MB to 8MB, as well as move the redo logs to the faster device. This reduced the log buffer space wait event, but introduced new wait “buffer busy waits”. The explanation for this is relatively simple, the concurrent inserts often try to write to the same data block. There are different options to deal with this. One is to use a tablespace with smaller block size to reduce the number of rows that can be fit into a single block. This will hopefully reduce the buffer busy waits, but not necessarily eliminate it. Or we can deal with this from the application side. We choose to do this instead.

6) We modified  the java code which control the job distribution across streams. A single stream is dynamically assigned to a unique partition. Therefore when the 40 streams are running, each one is guaranteed to be working against different blocks, therefore eliminating buffer busy waits completely.

And the results? 7GB worth of XML loaded within 5.5 minutes. Not bad.

Thursday, May 7th, 2009 at 15:40

Generalist or Specialist

Well, I have reached a point in my career (and my life) where I need to have a re-think. I have been working as a data architect covering various asset classes in the financial industries (read: a generalist), gaining respect and reconigition from senior managements and colleagues alike, something that I have found extremely enjoyable and rewarding.

The irony is, being a generalist, I am supposed to be able to maintain the the big picture, to be able to do pattern matching, to abstract similarities between different business areas, however I feel that in reality, it is not the case. While I can be recognized to be very good, I want to be excellent, I want to be extraordinary. Something is missing. Generalists do not drive innovation and long-term results, specialists do. As an individual looking for career longetivity and personal satisfaction, I am terrified of the prospect of being complacent with being a ‘talented generalist’, knowing a bit of a lot of things, and experts at nothing. The grass is always greener on the other side, they say. Perhaps it’s true, but I am giving it a go.

So! I have decided to hang up my generalist uniform for now, and take a plunge to move to the risk management area within one of the largest banks in the world. I have set myself a 2 year goal, contribute, bring fresh perspective and imagination to the area, while setting a personal target to be a specialist in Risk technologies. With everything that is going with the current economic climate, I can’t think of a more exciting place to be than the area of Financial Risk Management.

In order to become a specialist in Risk-IT, I would need the gain the general knowledge and understanding of risk management  So will that make me a specialist in one hand, and a generalist in others? Doesn’t that beat the purpose of my career change anyway? Am I a generalist becoming a specialist, or a specialist becoming a generalist? I don’t know, and I don’t care. One thing is for sure, I can’t wait to start.

Thursday, April 16th, 2009 at 21:11

Finance - GL related systems - fundamentals

I was asked to look at a finance system, which primarily is a reconciliation system between front office and back office sub ledger systems. The amount of records they have to deal with is about 2.5 million rows/day. Sizable, but not all that big. However, the system was so slow, it was unusable.

I can simply discount lack of hardware resource as an issue, as they are running on a relatively new and powerful server.

It turned out that the system was designed on Microsoft Access, and simply ported to Oracle 10g (version 10.2.0.4). Developers have no real understanding of basic oracle fundamentals. It astonished me that this happened in a large financial institution that is not short of oracle expertise.

To cut the story short, the solutions are simple:

  • Partition the large balance tables by date. This is almost a gospel in finance-IT. You will read your data by business date, therefore partition your data by business date. Simple.
  • Do not use global indexes for your partition tables. Really, you should be able to survive with local indexes only, and this will save you a lot of time that otherwise you would need to spend maintaining the global indexes.
  • Do not use DEFAULT partition. I would rather have my system fail when it tries to write to a non-existing partition, rather than have this hidden from me, and a few days later, I end up having to deal with performance issue because my DEFAULT partition gets so big
  • Do not SPLIT partition. ADD. It’s faster.
  • Create yourself a simple but robust partition maintenance package, which will perform partition add, drop, truncate. This package can then be granted to operator accounts.
  • Enhance the package so it also caters for statistics maintenance.
  • If you rely on a lot of staging tables, make sure that statistics for these tables are not generated when the tables are empty. Generate statistics when these tables are populated. If necessary, simply do this once, and then lock the stats.
  • If your rely on a lot of inserts and updates in your code, spend some time learning about MERGE command. It is a god send for finance-IT systems.

When designing your system, make it self maintaining. Your batch should include these steps

  • roll business date
  • add all required partition for the new business date (do not take the lazy way of pre-creating 100 partitions ahead of time)
  • after populating a partition, do your stats generation there and then

OK, rant over.

Saturday, January 24th, 2009 at 15:23

Market Data Store - Data Modelling

I am currently drafting a data model for market data storage, to be used in a scenario engine project within the risk management area. This new engine will be replacing an existing system, which is deemed to be quite out of date, and does not cater for all the risk factors required by the business.

I have started reviewing the existing system to learn about a thing or two about the data model used. One thing I quickly noticed is that it has separate group of tables for each risk factor. Prior to this, I had a plan to create a centralized generic table to store all type of risk factors.

So this rings a bell. Why was it done that way in the existing system? Is there a good reason why each risk factor has its own set of tables? Is it because they are so different from each other, that no common attributes can be found across the various risk factors? Or it is simply a bad existing design?

I have decided to keep my original idea to have a generic table for all type of risk factors for now. As this will give the system flexibility to add new risk factors easily in the future. However, I need to start doing some homework to understand the storage requirement better.

So what are the facts that I know so far

FACTS

  • 100 risk factors
  • between 30-50 bucket points within a single curve
  • system will need to cater for up to 1000 scenarios
  • observation length is between 2-7 years
  • most risk factors will only need a single bucket (tenor) identifier , others will need two (tenor and strike) - question: can the two types live in a single table?
So the workflow will be as follows:
  • load market data from various sources
  • clean market data, fill in gaps, either by copying previous day’s market data, or in the case of missing time horizon (buckets), interpolate
  • cleaned market data is then disturbed/shocked in various ways to produce scenarios.

To be continued … honest

Saturday, January 24th, 2009 at 14:56

Aspiring to become more than just an Oracle DBA 2.0

It saddens me in a way to read the widely talked article by 2 oracle certified masters, titled “Performing an Oracle DBA 1.0 to DBA 2.0 Upgrade”, a nicely written article encouraging DBAs out there to equip themselves better in order to be able to maintain the quality needed to work as a good Oracle DBA in a modern environment.

 While it contains many good pointers about DBAs needing to expand their skillsets to include system and network administration, I think it lacks the ultimate need to encourage Oracle DBAs out there  to step out of their comfort zone and do one thing that most DBAs do not do, i.e. trying to understand the business background which drives the need to build  the application which uses the Oracle databases that we look after. 

I have seen so many examples where DBAs would spend a lot of effort performing various tracing against a system to solve a performance problem. And spend further time analyzing the output  to come up with an understanding of a system behavior.

 The same answer could have been obtained by simply politely asking the users or the application teams about the nature of the application or the usage. I have time and time again found that these information is more valuable and accurate compared to my trendy detective work using highly technical tracing.

The other thing is the contentment of many DBAs to simply be tasked to ensure the running of an Oracle database. While I am not disputing that this is a mighty important task, sometimes it puzzles me how an Oracle DBA can have so little interest in understanding why a particular that they look after is deemed so important to the business, why it needs to be up 24/7, why an mere extra 20 min in batch processing time causing that particular user in NY to call us in frustration.

 Humanizing DBA role will also help a DBA gain respect their users/developers better and vice versa.

 And from my personal view, it’s much more interesting to expand my view this way, rather than filling up my brain with more new oracle tips and tricks, fancy tracing commands, setting events, setting hidden parameters, basically all the things that Oracle DBAs have been forced to learn by Oracle, because they can’t make their product work as it should in the first place?

Friday, January 9th, 2009 at 22:52

The importance of raw, unfiltered collaboration

Challenges within an organisation, within reasons, should be made known to everyone within the organisation, without it being filtered to senior management’s perception of what the involved individuals need to know.

Instead of having just a group of select individuals owning a specific problem, companies should try to tap the skills and contributions from a much wider audience within the organisation. Break the exclusivity and the silos, and you should see increased productivity and quality.

Say, an issue arises, which needs immediate attention. Finance asset class needs to consolidate the old disperse legacy general ledger systems that they have in order to reduce cost of maintaining these disperse systems, as well as to have a consolidated, single truth view of their finances. Or credit crunch prompting the need for a new age of risk management, putting pressure on risk systems to be able to calculate risk in a near-time manner.

When issues arise within a department in a large organisation, it is common that each department would call on the same individuals time and time again to address the different problems. These individuals could be very experienced, highly respected in their fields, with broad knowledge. And they might even call on for external help, perhaps a trusted vendor which they have dealt with in the past, or a consultant.

When they need to engage expertise from other areas, it will be something like this.

 “We need to hire a new person with expertise implementing Oracle GL. Engage the DBA team, as we will need a new Oracle database. Engage the hardware team, as we need an application server running Java, a reporting server with WebSphere. Ask for support team to provide example of feeds from each of the old GL system that we are going to replace. Let’s engage the network team, as we will need a 10GB dedicated link between application and database server”

This is what I mean by filtered collaboration. It is following an old fashioned hierarchical line of authority. Information is delivered on “need to know” basis. Somehow, someone higher up the chain decides what piece of information is suitable for each time he/she is collaborating with.

During my various engagements as an Oracle consultant, I have often found it frustrating that information is being drip-fed, filtered, sometimes with genuine intent to save me from confusion; everything has been tailored as if I have no ability to understand anything else that is not related to a database world. In occasions this has led me to make decisions that I would not have made had I been given the bigger picture earlier. 

When companies want to tap to the brightest minds within their organisations, they need to provide them with the whole story, not tailored sub questions, which will only hold them back from contributing to their full potential.

 

Saturday, December 20th, 2008 at 22:46

A report which dynamically ignores all zero value columns

A risk reporting table contains exposure values by counterparty for the last 7 days.

 

The structure of the table is as follows:

 

COUNTERPARTY        VARCHAR2(50)

T                    NUMBER

T-1                  NUMBER

T-2                  NUMBER

T-3                  NUMBER

T-4                  NUMBER

T-5                  NUMBER

T-6                  NUMBER

 

Values for Saturdays and Sundays will always be zero.

T to T-6 will be a moving window, therefore we can not hardcode which days are Saturdays or Sundays.

 

The requirement is to load the data into another table, where the final product will be a CSV, but users do not want to include the Saturday/Sunday all zero columns.

 

How do we do this?

 

A quick way would be to write a code like this

 

 

 

 

 

declare
  v_t number;
  v_t1 number;
  v_t2 number;
  v_t3 number;
  v_t4 number;
  v_t5 number;
  v_t6 number;

  v_sql varchar2(2000);

begin
select sum(t), sum(t-1), sum(t-2), sum(t-3), sum(t-4), sum(t-5), sum(t-6)
into v_t, v_t1,v_t2,v_t3,v_t4,v_t5,v_t6
from table1 ;

v_sql := ‘insert into newtable as select ‘
if v_t <> 0 then append to v_sql
if v_t1 <> 0 then append to v_sql
if v_t2 <> 0 then append to v_sql
etc

v_sql := v_sql ||’ from table1 ;’

execute immediate v_sql;
commit;

end;

Thursday, December 18th, 2008 at 18:14

Scratching the surface of Oracle 11g new features - Part 1

PIVOTING DATA

 

select cust_id, state_code, times_purchased

from customer order by cust_id;

 

   CUST_ID STATE_ TIMES_PURCHASED

———- —— —————

         1 CT                   1

         2 NY                  10

         3 NY                   2

         4 NY                  15

         5 CA                   1

         6 OH                   0

         7 OH                   0

 

select state_code, times_purchased, count(*) as cnt from customer

group by state_code, times_purchased;

 

STATE_ TIMES_PURCHASED        CNT

—— ————— ———-

NY                  10          1

NY                   2          1

NY                  15          1

CA                   1          1

CT                   1          1

OH                   0          2

 

select * from (select times_purchased,state_code from customer t)

pivot

(count(state_code) for state_code in (’NY’,'CA’,'CT’,'OH’))

order by times_purchased

/

 

TIMES_PURCHASED       ‘NY’       ‘CA’       ‘CT’       ‘OH’

————— ———- ———- ———- ———-

              0          0          0          0          2

              1          0          1          1          0

              2          1          0          0          0

             10          1          0          0          0

             15          1          0          0          0

 

INCREMENTAL GLOBAL STATISTICS

Global stats for a partition table can now be updated incrementally based on the changed partition alone.

dbms_stats.set_table_prefs(‘SCHEMA_NAME’,’TABNAME’,’INCREMENTAL’,’TRUE’)

 

And combine this with granularity => ‘AUTO’

 

   dbms_stats.gather_table_stats (
           ownname            => ‘SCHEMA_NAME’,
           tabname         => ‘TABNAME’,
           partname        => 'P_FEED_12345’,
           granularity     => 'AUTO')

 

RESULT CACHING

This thing is very exciting (well, at least to me, anyway). I think this is going to be so useful for long running queries, or queries that involve large amount of data. This is not to be confused with materialized views.

 

Just imagine for a few second that this is a very complicated , resource intensive long running query…

 

explain plan set statement_id=’hilda1′ for select sum(times_purchased) from customer;

 

set pagesize 25

set linesize 100

set long 1000000

col xplan format a80

 

select * from table(dbms_xplan.display(’PLAN_TABLE’,'hilda1′,’TYPICAL’));

 

PLAN_TABLE_OUTPUT

—————————————————————————————————-

Plan hash value: 487422484

 

——————————————————————————-

| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

——————————————————————————-

|   0 | SELECT STATEMENT   |          |     1 |    13 |     3   (0)| 00:00:01 |

|   1 |  SORT AGGREGATE    |          |     1 |    13 |            |          |

|   2 |   TABLE ACCESS FULL| CUSTOMER |     7 |    91 |     3   (0)| 00:00:01 |

——————————————————————————-

 

Note

—–

   - dynamic sampling used for this statement

 

13 rows selected.

 

Now we are going do it again, but this time with this /*+ result_cache */ hint

 

explain plan set statement_id=’hilda2′ for select /*+ result_cache */ sum(times_purchased) from customer;

 

select * from table(dbms_xplan.display(’PLAN_TABLE’,'hilda2′,’TYPICAL’));

 

PLAN_TABLE_OUTPUT

—————————————————————————————————-

Plan hash value: 487422484

 

————————————————————————————————–

| Id  | Operation           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |

————————————————————————————————–

|   0 | SELECT STATEMENT    |                            |     1 |    13 |     3   (0)| 00:00:01 |

|   1 |  RESULT CACHE       | 2rafyj4xfv3f725guctywn2n3t |       |       |            |          |

|   2 |   SORT AGGREGATE    |                            |     1 |    13 |            |          |

|   3 |    TABLE ACCESS FULL| CUSTOMER                   |     7 |    91 |     3   (0)| 00:00:01 |

————————————————————————————————–

 

Result Cache Information (identified by operation id):

——————————————————

 

   1 - column-count=1; dependencies=(HILDA.CUSTOMER); attributes=(single-row); name=”select /*+ resu

lt_cache */ sum(times_purchased) from customer”

 

 

Note

—–

   - dynamic sampling used for this statement

 

19 rows selected.

 

You can flush the cache by executing dbms_result_cache.flush

 

 

VIRTUAL COLUMNS

Normally things like this can only be achieved using a trigger or manual population.

 

create table employee (name varchar2(20), salary number, grade varchar2(50)

generated always as

 (case when salary <=10000 then ‘LOW’

       when salary <=20000 then ‘MEDIUM’

       when salary <=30000 then ‘HIGH’

       when salary <=40000 then ‘are you a CEO?’ end) virtual);

 

 

insert into employee (name,salary) values (’Jack’,10000);

insert into employee (name,salary) values (’Jill’,25000);

insert into employee (name,salary) values (’Tom’,5000);

insert into employee (name,salary) values (’Jane’,32000);

commit;

 

select * from employee;

 

NAME           SALARY GRADE

———- ———- ——————————————

Jack            10000 LOW

Jill            25000 HIGH

Tom              5000 LOW

Jane            32000 are you a CEO?

 

Although the name is “Virtual”, this virtual column will take on physical space in the database, just like the other columns.

 

 

INVISIBLE INDEXES

This feature will be useful when trying to determine an impact an index will have on a query plan.

 

Let’s just create a simple index on our existing table.

 

create unique index customer_ix01 on customer(cust_id,cust_name);

 

explain plan set statement_id=’hilda1′ for select * from customer where cust_id=4 and cust_name=’Customer4′;

 

select * from table(dbms_xplan.display(’PLAN_TABLE’,'hilda1′,’BASIC’));

—————————————————–

| Id  | Operation                   | Name          |

—————————————————–

|   0 | SELECT STATEMENT            |               |

|   1 |  TABLE ACCESS BY INDEX ROWID| CUSTOMER      |

|   2 |   INDEX UNIQUE SCAN         | CUSTOMER_IX01 |

—————————————————–

 

Now let’s make the index invisible

 

alter index customer_ix01 invisible;

 

 

explain plan set statement_id=’hilda2′ for select * from customer where cust_id=4 and cust_name=’Customer4′;

 

 

select * from table(dbms_xplan.display(’PLAN_TABLE’,'hilda2′,’BASIC’));

————————————–

| Id  | Operation         | Name     |

————————————–

|   0 | SELECT STATEMENT  |          |

|   1 |  TABLE ACCESS FULL| CUSTOMER |

————————————–

 

 

You now see that the index has now been made invisible, thus not used by the query.

 

To force a session to “see” invisible indexes you can change session parameter OPTIMIZER_USE_INVISIBLE_INDEXES

 

 

alter session set optimizer_use_invisible_indexes=true;

 

explain plan set statement_id=’hilda3′ for select * from customer where cust_id=4 and cust_name=’Customer4′;

 

 

select * from table(dbms_xplan.display(’PLAN_TABLE’,'hilda3′,’BASIC’));

 

—————————————————–

| Id  | Operation                   | Name          |

—————————————————–

|   0 | SELECT STATEMENT            |               |

|   1 |  TABLE ACCESS BY INDEX ROWID| CUSTOMER      |

|   2 |   INDEX UNIQUE SCAN         | CUSTOMER_IX01 |

—————————————————–

 

A new column has been added to the USER_INDEXES system view to show index visibility.

 

select visibility from user_indexes where index_name=’CUSTOMER_IX01′;

 

VISIBILITY

—————————

INVISIBLE

 

 

 

Monday, December 15th, 2008 at 15:19