How to handle audit trail for time series data
We are dealing with curve data over a long range time series, where this could be market data or risk factors or scenario data, or anything of similar nature.
From time to time, data will be updated, and there is a requirement to keep an audit trail for this.
The original plan was to keep a usual shadow audit table to record all the changes that have been made to a particular point in the curve at a particular date in the time series, and simply keep the latest version in the main table.
The weakness of this solution is that it doesn’t allow easy reconstruction of the curve for any date in the past.
Let me illustrate as follows:
Observation dates – are dates within the time series. In this example we are dealing with 01-Apr-2009 – 10 Apr 2009 data
Run dates – are dates where data is fed into the system. This can be a new insert, or an update (which will trigger creation of new version)
Steps:
- On day 1 (1 May 2009), new data was loaded to the system (date 1-10 Apr, version 1 for everything)
- On day 2 (10 May 2009), new updates came in, affecting 5-10 Apr data, therefore the new data should now be stored as version 2
- On day 3 (20 May 2009), more updates came in, affecting 7-10 Apr data, therefore the new data should be stored as version 3.
|
RISK FACTOR INSTANCE ID |
RUN DATE |
OBSERVATION DATE |
BUCKET ID |
VALUE |
VERSION |
|
100 |
01-May-09 |
01-Apr-09 |
1 |
1.082798469 |
1 |
|
100 |
01-May-09 |
02-Apr-09 |
1 |
0.6882592 |
1 |
|
100 |
01-May-09 |
03-Apr-09 |
1 |
0.33327391 |
1 |
|
100 |
01-May-09 |
04-Apr-09 |
1 |
1.229654916 |
1 |
|
100 |
01-May-09 |
05-Apr-09 |
1 |
0.100964171 |
1 |
|
100 |
01-May-09 |
06-Apr-09 |
1 |
1.31731591 |
1 |
|
100 |
01-May-09 |
07-Apr-09 |
1 |
-0.13868598 |
1 |
|
100 |
01-May-09 |
08-Apr-09 |
1 |
0.470941035 |
1 |
|
100 |
01-May-09 |
09-Apr-09 |
1 |
0.243330874 |
1 |
|
100 |
01-May-09 |
10-Apr-09 |
1 |
2.045916747 |
1 |
|
100 |
10-May-09 |
05-Apr-09 |
1 |
0.167072293 |
2 |
|
100 |
10-May-09 |
06-Apr-09 |
1 |
0.832312129 |
2 |
|
100 |
10-May-09 |
07-Apr-09 |
1 |
0.890681594 |
2 |
|
100 |
10-May-09 |
08-Apr-09 |
1 |
1.91338225 |
2 |
|
100 |
10-May-09 |
09-Apr-09 |
1 |
0.764316071 |
2 |
|
100 |
10-May-09 |
10-Apr-09 |
1 |
2.51100637 |
2 |
|
100 |
20-May-09 |
07-Apr-09 |
1 |
0.537491042 |
3 |
|
100 |
20-May-09 |
08-Apr-09 |
1 |
0.91346192 |
3 |
|
100 |
20-May-09 |
09-Apr-09 |
1 |
0.91736769 |
3 |
|
100 |
20-May-09 |
10-Apr-09 |
1 |
0.27792944 |
3 |
Method 1 – using audit table to store the changes
We will have
Main table
|
RISK FACTOR INSTANCE ID |
RUN DATE |
OBSERVATION DATE |
BUCKET ID |
VALUE |
VERSION |
|
100 |
01-May-09 |
01-Apr-09 |
1 |
1.082798469 |
1 |
|
100 |
01-May-09 |
02-Apr-09 |
1 |
0.6882592 |
1 |
|
100 |
01-May-09 |
03-Apr-09 |
1 |
0.33327391 |
1 |
|
100 |
01-May-09 |
04-Apr-09 |
1 |
1.229654916 |
1 |
|
100 |
10-May-09 |
05-Apr-09 |
1 |
0.167072293 |
2 |
|
100 |
10-May-09 |
06-Apr-09 |
1 |
0.832312129 |
2 |
|
100 |
20-May-09 |
07-Apr-09 |
1 |
0.537491042 |
3 |
|
100 |
20-May-09 |
08-Apr-09 |
1 |
0.91346192 |
3 |
|
100 |
20-May-09 |
09-Apr-09 |
1 |
0.91736769 |
3 |
|
100 |
20-May-09 |
10-Apr-09 |
1 |
0.27792944 |
3 |
Audit table
|
RISK FACTOR INSTANCE ID |
RUN DATE |
OBSERVATION DATE |
BUCKET ID |
VALUE |
VERSION |
UPDATE DATE |
|
100 |
01-May-09 |
05-Apr-09 |
1 |
0.100964171 |
1 |
10-May-09 |
|
100 |
01-May-09 |
06-Apr-09 |
1 |
1.31731591 |
1 |
10-May-09 |
|
100 |
01-May-09 |
07-Apr-09 |
1 |
0.13868598 |
1 |
10-May-09 |
|
100 |
01-May-09 |
08-Apr-09 |
1 |
0.470941035 |
1 |
10-May-09 |
|
100 |
01-May-09 |
09-Apr-09 |
1 |
0.243330874 |
1 |
10-May-09 |
|
100 |
01-May-09 |
10-Apr-09 |
1 |
2.045916747 |
1 |
10-May-09 |
|
100 |
10-May-09 |
07-Apr-09 |
1 |
0.890681594 |
2 |
20-May-09 |
|
100 |
10-May-09 |
08-Apr-09 |
1 |
1.91338225 |
2 |
20-May-09 |
|
100 |
10-May-09 |
09-Apr-09 |
1 |
0.764316071 |
2 |
20-May-09 |
|
100 |
10-May-09 |
10-Apr-09 |
1 |
2.51100637 |
2 |
20-May-09 |
Now, a user will want to then run a report “show me my data as of 15 May 2009”.
Not easy, right?
Method 2 – version handling without audit table
In this method, I will store all versions in the same table.
To reconstruct data as of 15 May 2009, this is the data that we are after
|
RISK FACTOR INSTANCE ID |
RUN DATE |
OBSERVATION DATE |
BUCKET ID |
VALUE |
VERSION |
|
100 |
01-May-09 |
01-Apr-09 |
1 |
1.082798469 |
1 |
|
100 |
01-May-09 |
02-Apr-09 |
1 |
0.6882592 |
1 |
|
100 |
01-May-09 |
03-Apr-09 |
1 |
0.33327391 |
1 |
|
100 |
01-May-09 |
04-Apr-09 |
1 |
1.229654916 |
1 |
|
100 |
10-May-09 |
05-Apr-09 |
1 |
0.167072293 |
2 |
|
100 |
10-May-09 |
06-Apr-09 |
1 |
0.832312129 |
2 |
|
100 |
10-May-09 |
07-Apr-09 |
1 |
0.890681594 |
2 |
|
100 |
10-May-09 |
08-Apr-09 |
1 |
1.91338225 |
2 |
|
100 |
10-May-09 |
09-Apr-09 |
1 |
0.764316071 |
2 |
|
100 |
10-May-09 |
10-Apr-09 |
1 |
2.51100637 |
2 |
To reconstruct data as of 25 May 2009, this is the data that we are after
|
RISK FACTOR INSTANCE ID |
RUN DATE |
OBSERVATION DATE |
BUCKET ID |
VALUE |
VERSION |
|
100 |
01-May-09 |
01-Apr-09 |
1 |
1.082798469 |
1 |
|
100 |
01-May-09 |
02-Apr-09 |
1 |
0.6882592 |
1 |
|
100 |
01-May-09 |
03-Apr-09 |
1 |
0.33327391 |
1 |
|
100 |
01-May-09 |
04-Apr-09 |
1 |
1.229654916 |
1 |
|
100 |
10-May-09 |
05-Apr-09 |
1 |
0.167072293 |
2 |
|
100 |
10-May-09 |
06-Apr-09 |
1 |
0.832312129 |
2 |
|
100 |
20-May-09 |
07-Apr-09 |
1 |
0.537491042 |
3 |
|
100 |
20-May-09 |
08-Apr-09 |
1 |
0.91346192 |
3 |
|
100 |
20-May-09 |
09-Apr-09 |
1 |
0.91736769 |
3 |
|
100 |
20-May-09 |
10-Apr-09 |
1 |
0.27792944 |
3 |
Next, how do we reconstruct the risk factor data as of any point in time in the past?
This is easily done using oracle analytics (RANK).
select *
from (
select observation_date, bucket_id, value,
version,
rank() over (partition by observation_date, bucket_id order by version desc) as rank from mydata where run_date <= <date of required reconstruction>)
where rank =1
The same model will obviously support multiple updates within a day, simply make sure we store the timestamp as well.
Obviously this carries some overhead, so some housekeeping criteria will have to be defined and implemented to ensure we do not indefinitely keep old versions of data.