As I was working on some new numbers for the FusionIO card I have been testing a friend that works there asked if I would join him on a webinar about BI and FusionIO. Since I have a Fusion card in my testing BI server I thought this would be a great opportunity to talk about some numbers I’ve gotten off of it.
Since this is a testing server right now for BI I’m able to test various ETL configurations against the server. For these tests I was running a 320GB FusionIO card(MLC Details here on there site). I decided I would pull over a Dailyreports table for my staging data. The table has 66 million rows in it (no it’s not one day worth). The table has 9 facts and 5 dimensions including a time dimension. I created a procedure to make my fact table and populate the dimensions based off the distinct list of items in the original table. Then I created a separate process to populate the fact table by looking up the dimension keys and populating the measures off the business key. Basically the same thing you would do for any typical ETL process. I ran this with the DB on our DMX 1000 what we consider (Tier 1 storage). The DMX 1000 is configured as Raid 10. I’m not using it exclusively so there are other things running on it. I’ll be adding the SQLIO profile for the DMX to my previous Excel sheet that I posted previously That file is located on google docs here is a link. I consider this a typical san situation that many BI professionals will find be in. They usually don’t have the slowest disk but they also don’t get the fastest disk possible. I ran the Process on the DMX then moved the dB to the Fusion and ran it again here are the results of the numbers.
|DW Procedure||Sum of Time(Sec)||Sum of Time(Sec)||Percent Improvement|
|Create Business Key||303||291||-3.96%|
|Populate Acct Dimension||12644||4244||-66.43%|
|Populate Bank Dimension||794||226||-71.54%|
|Populate Group Name||2044||669||-67.27%|
|Populate time Dimension||1166||760||-34.82%|
The lesson that we can conclude from these results is even if you don’t have SSD even if you just have fast disks you can increase your ETL processes by placing your staging data and lookup data on the fastest disks possible. My suggestion is that when you’re spec’ing out your DW servers don’t forget your disk subsystem. DW’s usually require lots of space. Most of that space doesn’t have to be on very fast disks. But you should try and keep your ETL process on fast disk when possible.
In my experience with many different BI situations DW is the space where the BI professionals usually have some say in what hardware they want to use and they have control over it. Staging databases typically do not need to have huge disaster recovery plans with them as they are frequently re-created by the source data coming in. This makes it an ideal situation for Fusion type drives/SSD and fast disks.
I did some backup tests as well with the card and will get those posted as soon as some tests finish running. 🙂