Specific Aggregation Market Risk System - data volume challenge - part 1
I have been dealt a new challenge, i.e. to improve performance on a risk system which data volume will increase by 10 fold(!), while the allocated processing window will stay the same. The additional complication is that delivery timeline is within 8 weeks, and with minimum allocated development resource.
With the current volume, the system would just make the SLA. Any additional load will completely push the processing time past the agreed 8.30AM SLA.
Facts:
- data feed is delivered between 5AM and 6.30AM
- currently, data is supplied on position level, soon to be changed to trade level, resulting in 10x increase in volume
- risk reporting must be available to users by 8.30AM
Processing stages:
- data loading
- enrichment/validation/mapping
- aggregation to position level
- report generation
We have created a mock feed with 1.5 million trades and have run a stress test against the current system.
- data loading took 30 min
- enrichment/validation/mapping took ~15hours (it didn’t finish, as the system crashed toward the end of the processing, but it gave us enough of an idea of performance ..)
- agggregation to position level took 1.5 hours
- report generation - abandoned..
So, looks like we have our work cut out. How do you fit in a 17 hour process into a 2-3 hour slot in 8 weeks.
Review:
- server resource should not be an issue. We are running on a 32CPU solaris cluster with 64GB of RAM
- data model does not really follow any particular school .. staging tables are particularly wide (up to 150 column wide)
- large tables are partitioned by feed (range partitioning)
- Enrichment/validation/mapping stage consist of 76 separate steps
- most heavy processing are done inside database via stored procedures
With the tight schedule imposed on this, data remodelling would be out of the question. There is no way that could be achieved within 8 weeks. Here is my current thoughts:
- implement subpartitioning to allow parallel DML to be done for most processing and reporting. This seems to be the biggest/quickest win we can get, since we are not constraint by hardware resource.
- I am a little bit “disappointed” that most of the processing is already done in the database. A lot of other systems that I work on would have a lot of batch processing done outside the database, and bringing processing inside the database would normally create biggest wins… This means we need to dig deep into the stored procedures to see what can be done there.
To be continued …