How to handle audit trail for time series data

We are dealing with curve data over a long range time series, where this could be market data or risk factors or scenario data, or anything of similar nature.

 

From time to time, data will be updated, and there is a requirement to keep an audit trail for this.

 

The original plan was to keep a usual shadow audit table to record all the changes that have been made to a particular point in the curve at a particular date in the time series, and simply keep the latest version in the main table.

 

The weakness of this solution is that it doesn’t allow easy reconstruction of the curve for any date in the past.

 

Let me illustrate as follows:

 

Observation dates – are dates within the time series. In this example we are dealing with 01-Apr-2009 – 10 Apr 2009 data

 

Run dates – are dates where data is fed into the system. This can be a new insert, or an update (which will trigger creation of new version)

 

Steps:

  1. On day 1 (1 May 2009), new data was  loaded to the system (date 1-10 Apr, version 1 for everything)
  2. On day 2 (10 May 2009), new updates came  in, affecting 5-10 Apr data, therefore the new data should now be stored as version 2
  3. On day 3 (20 May 2009), more updates came  in, affecting 7-10 Apr data, therefore the new data should be stored as version 3.

 

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

01-May-09

05-Apr-09

1

0.100964171

1

100

01-May-09

06-Apr-09

1

1.31731591

1

100

01-May-09

07-Apr-09

1

-0.13868598

1

100

01-May-09

08-Apr-09

1

0.470941035

1

100

01-May-09

09-Apr-09

1

0.243330874

1

100

01-May-09

10-Apr-09

1

2.045916747

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

10-May-09

07-Apr-09

1

0.890681594

2

100

10-May-09

08-Apr-09

1

1.91338225

2

100

10-May-09

09-Apr-09

1

0.764316071

2

100

10-May-09

10-Apr-09

1

2.51100637

2

100

20-May-09

07-Apr-09

1

0.537491042

3

100

20-May-09

08-Apr-09

1

0.91346192

3

100

20-May-09

09-Apr-09

1

0.91736769

3

100

20-May-09

10-Apr-09

1

0.27792944

3

 

Method 1 – using audit table to store the changes

 

We will have

 

Main table

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

20-May-09

07-Apr-09

1

0.537491042

3

100

20-May-09

08-Apr-09

1

0.91346192

3

100

20-May-09

09-Apr-09

1

0.91736769

3

100

20-May-09

10-Apr-09

1

0.27792944

3

 

Audit table

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

UPDATE  DATE

100

01-May-09

05-Apr-09

1

0.100964171

1

10-May-09

100

01-May-09

06-Apr-09

1

1.31731591

1

10-May-09

100

01-May-09

07-Apr-09

1

0.13868598

1

10-May-09

100

01-May-09

08-Apr-09

1

0.470941035

1

10-May-09

100

01-May-09

09-Apr-09

1

0.243330874

1

10-May-09

100

01-May-09

10-Apr-09

1

2.045916747

1

10-May-09

100

10-May-09

07-Apr-09

1

0.890681594

2

20-May-09

100

10-May-09

08-Apr-09

1

1.91338225

2

20-May-09

100

10-May-09

09-Apr-09

1

0.764316071

2

20-May-09

100

10-May-09

10-Apr-09

1

2.51100637

2

20-May-09

 

Now, a user will want to then run a report “show me my data as of 15 May 2009”.

Not easy, right?

 

Method 2 – version handling without audit table

 

In this method, I will store all versions in the same table.

 

To reconstruct data as of 15 May 2009, this is the data that we are after

 

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

10-May-09

07-Apr-09

1

0.890681594

2

100

10-May-09

08-Apr-09

1

1.91338225

2

100

10-May-09

09-Apr-09

1

0.764316071

2

100

10-May-09

10-Apr-09

1

2.51100637

2

 

To reconstruct data as of 25 May 2009, this is the data that we are after

 

RISK FACTOR INSTANCE ID

RUN DATE

OBSERVATION DATE

BUCKET ID

VALUE

VERSION

100

01-May-09

01-Apr-09

1

1.082798469

1

100

01-May-09

02-Apr-09

1

0.6882592

1

100

01-May-09

03-Apr-09

1

0.33327391

1

100

01-May-09

04-Apr-09

1

1.229654916

1

100

10-May-09

05-Apr-09

1

0.167072293

2

100

10-May-09

06-Apr-09

1

0.832312129

2

100

20-May-09

07-Apr-09

1

0.537491042

3

100

20-May-09

08-Apr-09

1

0.91346192

3

100

20-May-09

09-Apr-09

1

0.91736769

3

100

20-May-09

10-Apr-09

1

0.27792944

3

 

Next, how do we reconstruct the risk factor data as of any point in time in the past?

This is easily done using oracle analytics (RANK).

 

select *

from (

select observation_date, bucket_id, value,

version, 

rank() over (partition by observation_date, bucket_id order by version desc) as rank from mydata where run_date <= <date of required reconstruction>)

where rank =1

 

The same model will obviously support multiple updates within a day, simply make sure we store the timestamp as well.

 

Obviously this carries some overhead, so some housekeeping criteria will have to be defined and implemented to ensure we do not indefinitely keep old versions of data.

 

 

Wednesday, June 3rd, 2009 at 13:11
No comments yet.

Leave a comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>