While I was working on some new numbers for the FusionIO card I have been testing, a friend who works there asked if I would join him on a webinar about BI and FusionIO. Since I have a Fusion card in my BI test server, I thought this would be a great opportunity to talk about some of the numbers I've gotten off of it.
Since this is currently a test server for BI, I'm able to run various ETL configurations against it. For these tests I was running a 320GB FusionIO card (MLC; details are on their site). I decided I would pull over a Dailyreports table for my staging data. The table has 66 million rows in it (no, that's not one day's worth). The table has 9 facts and 5 dimensions, including a time dimension. I created a procedure to build my fact table and populate the dimensions based off the distinct list of items in the original table. Then I created a separate process to populate the fact table by looking up the dimension keys and populating the measures off the business key. Basically the same thing you would do for any typical ETL process.

I ran this with the DB on our DMX 1000, which we consider Tier 1 storage. The DMX 1000 is configured as RAID 10. I'm not using it exclusively, so there are other things running on it. I'll be adding the SQLIO profile for the DMX to the Excel sheet I posted previously; that file is located on Google Docs, here is a link. I consider this a typical SAN situation that many BI professionals will find themselves in: they usually don't have the slowest disk, but they also don't get the fastest disk possible. I ran the process on the DMX, then moved the DB to the Fusion card and ran it again. Here are the results.
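To make the pattern concrete, here is a hedged T-SQL sketch of the process described above. All table and column names (Dailyreports columns, DimAcct, FactDaily, and so on) are illustrative placeholders, not the actual schema from my test.

```sql
-- Sketch only: dimension/fact names and columns are hypothetical.

-- 1. Populate a dimension from the distinct values in the staging table.
INSERT INTO dbo.DimAcct (AcctBusinessKey)
SELECT DISTINCT s.AcctNumber
FROM dbo.Dailyreports AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DimAcct AS d
                  WHERE d.AcctBusinessKey = s.AcctNumber);

-- 2. Populate the fact table by looking up each dimension's surrogate key
--    on the business key and carrying the measures across.
INSERT INTO dbo.FactDaily (AcctKey, BankKey, TimeKey, Amount)
SELECT da.AcctKey, db.BankKey, dt.TimeKey, s.Amount
FROM dbo.Dailyreports AS s
JOIN dbo.DimAcct AS da ON da.AcctBusinessKey = s.AcctNumber
JOIN dbo.DimBank AS db ON db.BankBusinessKey = s.BankCode
JOIN dbo.DimTime AS dt ON dt.CalendarDate   = s.ReportDate;
```

The point of the pattern is that step 2 is almost entirely reads against the staging and dimension tables, which is why faster disk under the staging database pays off so directly.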
| Process | DMX 1000: Sum of Time (Sec) | FusionIO: Sum of Time (Sec) |
| --- | --- | --- |
| Create Business Key | | |
| Populate Acct Dimension | | |
| Populate Bank Dimension | | |
| Populate Group Name | | |
| Populate Time Dimension | | |
The lesson we can draw from these results is that even if you don't have SSDs, just having fast disks can speed up your ETL processes: place your staging data and lookup data on the fastest disks possible. My suggestion is that when you're spec'ing out your DW servers, don't forget the disk subsystem. DWs usually require lots of space, and most of that space doesn't have to be on very fast disks, but you should try to keep your ETL process on fast disk when possible.
In my experience across many different BI situations, the DW is the space where BI professionals usually have some say in what hardware they want to use, and they have control over it. Staging databases typically do not need elaborate disaster recovery plans, as they are frequently re-created from the incoming source data. This makes them an ideal fit for Fusion-type drives/SSDs and fast disks.
I did some backup tests with the card as well and will get those posted as soon as the tests finish running. 🙂
I could list lots of excuses as to why it’s been so long since I blogged but I’d rather get to the blog post so let’s just say I’m lazy and working on getting better. 🙂
I've been doing work with SSIS, running it at a command prompt so I can use it to load-test a database. My hope is to later move this into a simple application that calls the procedures I want, but until I start learning C#, I'll stick with SSIS for now.
The problem I ran into was passing parameters. I have a loop inside my package, and I want it to run a given number of times for my testing needs. Setting up the loop itself isn't a problem; you can find that here. But how do you pass a variable to the package so you can tell it how many loops you want? For that, here is another blog that talks about it. OK, so now that I've linked off most of this post to others, what the hell am I trying to add to all this?
Well, I read those blogs, and it worked until I started adding multiple /Set options and ran into some simple syntax issues. I also noticed there were few examples out there showing the full syntax. Call me crazy, but this always screws me up when I'm trying to get syntax right with SSIS. So here are the scripts I currently use to execute my package.
DTExec.exe /Rep N
Now, as you might guess, my package has 3 variables called iCounterEnd, RunDataLoad and InitialLoadEnd. For all you SQL people out there who are not used to coding environments: they are case sensitive (yes, this has been hard for me as well). Also notice that RunDataLoad is a -1. The variable is a Boolean, which I would have assumed would take "true" or "false", but I was wrong. It is -1 (true) or 0 (false) for the setting.
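For reference, here is a sketch of what the full DTExec call with multiple /Set options can look like. The package path and the values are placeholders, not my actual test setup; only the variable names match the package described above.

```shell
REM Placeholder package path and values -- substitute your own.
DTExec.exe /F "C:\SSIS\LoadTest.dtsx" /Rep N ^
  /Set \Package.Variables[User::iCounterEnd].Properties[Value];100 ^
  /Set \Package.Variables[User::RunDataLoad].Properties[Value];-1 ^
  /Set \Package.Variables[User::InitialLoadEnd].Properties[Value];10
```

Note that each /Set takes the full property path and the value separated by a semicolon, and the caret (^) is just the cmd.exe line-continuation character so the command can be split across lines.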
Now, a great way to get examples of what the syntax should and should not look like is to use Package Configurations. If you enable package configurations and write an XML file out to some directory, you can see the variable path/name that you need and the value it's currently set to. Here's a screenshot of what the package config file will look like.
The Path attribute is what you need above to make sure you have the correct path to the variable.
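In case the screenshot is hard to read, here is a trimmed sketch of what such a .dtsConfig file can contain, assuming an Int32 variable named iCounterEnd (the names and value are placeholders):

```xml
<?xml version="1.0"?>
<DTSConfiguration>
  <!-- Placeholder variable name and value; your file will show your own. -->
  <Configuration ConfiguredType="Property"
                 Path="\Package.Variables[User::iCounterEnd].Properties[Value]"
                 ValueType="Int32">
    <ConfiguredValue>100</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```

The Path value in the Configuration element is exactly the string you pass to DTExec's /Set option.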
Hopefully this will help others to take a little less time to get a package to run from the command line.
I think I'm writing this as much for my own sake as for others. Since I continually forget this setting, here is how you pass parameters into stored procedures you're calling from Execute SQL Tasks in SSIS. In this particular case I'm passing dates, which is common.
When you have an Execute SQL Task and need to set the parameters you're passing in as SSIS variables, here are the steps I've been using.
Create the variable in SSIS (SSIS > Variables > New icon). Give it a name and a type; it should most likely be scoped to the whole package, though that depends on the complexity of your package.
Add your Execute SQL Task. When entering your stored procedure, use question marks for the parameters, separated by commas. Then hit the Parameters button, and for each parameter name use the variable name from the stored procedure, in the same order as they appear in the stored procedure. Choose the SSIS variable it corresponds to on the other side. So you might see this.
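As a sketch, assume a hypothetical procedure dbo.usp_LoadDailyFacts that takes @StartDate and @EndDate (neither the procedure nor the variable names come from my actual package). The SQLStatement in the Execute SQL Task would then be:

```sql
-- The two ?s map, in order, to two SSIS variables
-- (e.g. User::StartDate and User::EndDate) on the Parameter Mapping page,
-- with @StartDate and @EndDate as the parameter names.
EXEC dbo.usp_LoadDailyFacts ?, ?
```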
From here, hit OK and you should have your parameters set up correctly.
Hope that helps!