I did a webinar on Tuesday with Fusion IO about using Fusion cards in a Data warehousing server. I also posted this blog post on Tuesday related to the numbers. I had a few question on specific stats on the DMX I was testing on so wanted to give an update today on that.
The DMX had 120 SAS 15K spin drives. They were set in a Raid 10 configuration giving 60 Effective disks. They were in use by other systems while running my IO tests and running the DW tests that I quoted on the blog post. As I mentioned in the post for me this is a typical scenario. I don’t have a dedicated SAN for my DW so these numbers work for me. Whether these are the numbers you will see in your testing depends on your environment.
I also promised some additional numbers about some other Fusion Raid sets I was testing and I’m getting a post ready for next week on those numbers. For now I would like to point out an excellent post by the SQLCat team on what it saw with Fusion IO and datawarehousing. Well worth a read here is the link.
SQL Cat Team Distinct Count and Xbox Live
Thom Larock aka SQL Rockstar (blog | twitter) Just posted yesterday about his first pass summit back in 2004. This was my first summit as well and when I had the privilege to meet Thom and Allen Kinsel (blog | twitter). You can read the blog post here for the story.
This gave me the idea to expand on one of the common tips I give to anyone that’s going to the summit. You should go online and create your schedule for the summit and choose what sessions you are planning to attend. I suggest you do this to get familiar with the schedule and to give yourself some must see sessions. I also suggest you leave time slots open then when you’re at the summit in the morning or at lunch you should ask what others plan to attend and if they have heard from the speakers before. I’ve made many changes to my session plans based on the advice of others.
I remember a specific situation at the very summit that Thom wrote about. It was a presentation about Security. Not sure the exact topic, both Allen and Thom were very interested in it. I actually chose to stay my course with another session. About 15 minutes into the session Allen and Thom showed up in the session I was in as the security topic did not go as they had hoped. Another session that Thom and Allen had an influence on was the volunteer session Thom talks about in his blog. Having already worked in my local PASS chapter before the summit I knew some of the things I needed to volunteer so I wasn’t planning on attending Wade’s session for the pass volunteers. It was Allen and Thom talking about it that made the difference for me to attend. It was a good decision.
Plan some couch time. This may sound silly but there are times at the pass summit that is best to hang out and socialize with others out in the hall. Networking with others is going to give you contacts and relationships that will last for many years and help you later on in your career. So if you notice some people hanging around the couches chatting introduce yourself and ask if you can join in.
You will find some of the best suggestions you end up attending may not have been your first choice. Feel free to find me at the summit I would be happy to help you choose sessions and find what works best for you.
This evening a group of friends will get together to “Stuff” bags. This same exercise has been done all over the world for SQL Saturdays, Code Camps, Bar Camps and events of all kinds. Even though these people will miss time with family and possibly kids events/games they do this without complaint and without pay. So why do these Crazy people do these things? They all share a desire to see their fellow professional succeed. By putting on events we give people the opportunity to learn and socialize with other members of the community. Our goal is to make you better professionals not only for your current employer but all your future employers.
If you would like to help us reach this goal, come and join us for some good education and social networking at this weekend’s Code Camp. I guarantee you will learn something and I guarantee you will meet people that would love to help you with your career. The more people we get at the event the more chance you will have to network with other professionals facing similar problems that you are facing.
Utah Code Camp (www.utahcodecamp.com )
Date: Sep 25, 2010
Time: 9:00am – 5:00pm
Location: Neumont University
10701 South Jordan River Parkway
South Jordan, UT, 84095
What other time are you going to have a chance to learn from some of the best speakers around and a chance to win an IPAD, IPOD, Xbox360 and thousands of dollars in software all at one event!
A big thank you to my fellow organizers of the Utah Code Camp and all of you that give time to your community and your fellow professional to make them better at what they do.
As I was working on some new numbers for the FusionIO card I have been testing a friend that works there asked if I would join him on a webinar about BI and FusionIO. Since I have a Fusion card in my testing BI server I thought this would be a great opportunity to talk about some numbers I’ve gotten off of it.
Since this is a testing server right now for BI I’m able to test various ETL configurations against the server. For these tests I was running a 320GB FusionIO card(MLC Details here on there site). I decided I would pull over a Dailyreports table for my staging data. The table has 66 million rows in it (no it’s not one day worth). The table has 9 facts and 5 dimensions including a time dimension. I created a procedure to make my fact table and populate the dimensions based off the distinct list of items in the original table. Then I created a separate process to populate the fact table by looking up the dimension keys and populating the measures off the business key. Basically the same thing you would do for any typical ETL process. I ran this with the DB on our DMX 1000 what we consider (Tier 1 storage). The DMX 1000 is configured as Raid 10. I’m not using it exclusively so there are other things running on it. I’ll be adding the SQLIO profile for the DMX to my previous Excel sheet that I posted previously That file is located on google docs here is a link. I consider this a typical san situation that many BI professionals will find be in. They usually don’t have the slowest disk but they also don’t get the fastest disk possible. I ran the Process on the DMX then moved the dB to the Fusion and ran it again here are the results of the numbers.
||Sum of Time(Sec)
||Sum of Time(Sec)
|Create Business Key
|Populate Acct Dimension
|Populate Bank Dimension
|Populate Group Name
|Populate time Dimension
The lesson that we can conclude from these results is even if you don’t have SSD even if you just have fast disks you can increase your ETL processes by placing your staging data and lookup data on the fastest disks possible. My suggestion is that when you’re spec’ing out your DW servers don’t forget your disk subsystem. DW’s usually require lots of space. Most of that space doesn’t have to be on very fast disks. But you should try and keep your ETL process on fast disk when possible.
In my experience with many different BI situations DW is the space where the BI professionals usually have some say in what hardware they want to use and they have control over it. Staging databases typically do not need to have huge disaster recovery plans with them as they are frequently re-created by the source data coming in. This makes it an ideal situation for Fusion type drives/SSD and fast disks.
I did some backup tests as well with the card and will get those posted as soon as some tests finish running. 🙂
So I had planned a post to talk about who I was going to vote for and outlining some of the reasons. Unfortunately a failure to plan is a plan for failure! A customer who shall remain nameless decided to lose a LUN on their SAN on Monday. No I don’t know exactly how you accomplish this but I most likely will have a blog post about what you should do if it does happen in the future(Long nights are in your future if this happens) . So this took away time from me getting a blog post up. Instead I give you this advice if you have not voted already (I’m sure many have)
1. Research! The forums located here (Login Required). Have been great to talk to the candidates and get lots of various questions answered.
2. Talk to the candidates. If you have a specific question you want to ask them get a hold of them. They all are reachable by email.
3. Read through the #passvotes hashtag. While much of this has been discussions on the process it’s also the candidates talking in there as well.
The point here is to make an informed decision when casting your vote. Personally I have yet to vote and honestly I have not decided the three I will vote for as I have not had the time to read through all the forum posts and questions. Although I know many of the candidates well I know they put a lot of time and effort into answering questions and I want to respect them by giving them the time and effort they gave me. I have a feeling it will take me the weekend to get through all the candidates.
The other point I want to make about the elections is once again about process. Tons of discussion has taken place on the process. Whether good bad or indifferent if you have written a blog post, added a comment on a blog, or discussed in the forums please make sure to let PASS HQ know. Just because it’s out on your blog or in some comment does not guarantee that they will see it. If you really want to make the process better for next year then you need to send in your comments/suggestions to PASS HQ directly so Johannes can get the information he needs to make the process better. If you need to know where to send that information Johannes outlines it very well in this post (Login Required).
Finally a HUGE thank you to all of the community that has spoken up about the election/process/candidates, you are what make the SQL community great.