Category Archives: Datawarehouse

DataWarehouse / Business Intelligence Reporting tool needs

I mentioned in a quick note earlier this week that I had been absent from blogging for a while.  In a previous blog post I mentioned that I accepted a new role at the beginning of the year (post).  Besides the new management duties, I've also taken on the job of getting a DW/BI project up and running.  I've been very focused on that project, trying to meet some key deadlines for April; between the project and a very successful Utah Code Camp, I've been kept very busy (more posts on that in the future).

Since my focus is on BI/DW right now, I thought I would start with some of the key things we are looking for in a BI tool.  Hopefully this will help others get started on their own lists.  This is not meant to cover every possible need, and it's not meant for every environment; these were the key elements for our business.  Most likely you will have different needs.

We are looking for a front-end reporting suite that gives our end users the ability to answer questions about data on their own, without help from the DW team.  The tool(s) we choose will primarily report off of a SQL Server 2008 database, and they need to work across multiple browsers (Safari, Firefox, IE, and Chrome).  Here are the high-level requirements we are looking for.

1. Ability to do ad-hoc analysis of data.

2. Ability to report off cube/OLAP data.

3. Ability to report off OLTP detail data (create detail reports).

4. Scheduled reports: ability to refresh report data on a schedule.

5. Report delivery: ability to push reports out via email (we later dropped this requirement, but I leave it here since many still want this ability).

6. Works in a browser:

   a. Safari

   b. Firefox

   c. IE

   d. Chrome

7. Website for centralized report management.

8. Mobile client/mobile capability available.

I would love to hear if you have additional items you think are required/necessary in a tool.

A History of Business Intelligence

A short post for a Friday!

I'm finishing up my first two weeks at my new job and new position.  I don't think I've been in SSMS for more than three hours in the last two weeks.  This is a big change for me; I've spent the better part of 12 years in the trenches with SQL Server, either administering, architecting, or querying it.  I've spent much of my time reviewing the Business Intelligence tools that are out on the market.  I'm working on some blog posts about what to look for and my impressions of these tools, but until those are complete, enjoy this short post and this very well done YouTube video about Business Intelligence.  In my opinion it isn't slanted any one way, and it doesn't feel like it's trying to sell a specific product.  Take 10 minutes when you have time and check it out.

http://www.youtube.com/watch?v=_1y5jBESLPE&feature=related

Fusion IO DW stats update

I did a webinar on Tuesday with Fusion IO about using Fusion cards in a data warehousing server. I also posted this blog post on Tuesday related to the numbers. I had a few questions about specific stats on the DMX I was testing on, so I wanted to give an update on that today.

The DMX had 120 15K SAS spinning drives. They were set up in a RAID 10 configuration, giving 60 effective disks. They were in use by other systems while I was running my IO tests and the DW tests that I quoted in the blog post. As I mentioned in that post, this is a typical scenario for me. I don't have a dedicated SAN for my DW, so these numbers work for me. Whether these are the numbers you will see in your testing depends on your environment.

I also promised some additional numbers for some other Fusion RAID sets I was testing, and I'm getting a post ready for next week with those numbers. For now I would like to point out an excellent post by the SQLCAT team on what it saw with Fusion IO and data warehousing. It's well worth a read; here is the link.

SQL Cat Team Distinct Count and Xbox Live

New Testing numbers with FusionIO

As I was working on some new numbers for the FusionIO card I have been testing, a friend who works there asked if I would join him on a webinar about BI and FusionIO. Since I have a Fusion card in my BI test server, I thought this would be a great opportunity to talk about some of the numbers I've gotten off of it.

Since this is a BI test server right now, I'm able to test various ETL configurations against it. For these tests I was running a 320GB FusionIO card (MLC; details here on their site). I decided to pull over a Dailyreports table for my staging data. The table has 66 million rows in it (no, that's not one day's worth). The table has 9 facts and 5 dimensions, including a time dimension.

I created a procedure to build my fact table and populate the dimensions based on the distinct list of items in the original table. Then I created a separate process to populate the fact table by looking up the dimension keys and populating the measures off the business key; basically the same thing you would do in any typical ETL process (a rough sketch of this kind of load follows the results table below).

I ran this with the database on our DMX 1000, which we consider Tier 1 storage. The DMX 1000 is configured as RAID 10, and I'm not using it exclusively, so there are other things running on it. I'll be adding the SQLIO profile for the DMX to the Excel sheet I posted previously; that file is located on Google Docs (here is a link). I consider this the typical SAN situation that many BI professionals find themselves in: they usually don't have the slowest disk, but they also don't get the fastest disk possible. I ran the process on the DMX, then moved the database to the Fusion card and ran it again. Here are the results.

| DW Procedure | SAN DMX 1000 Time (sec) | Fusion Drive Time (sec) | Percent Improvement |
|---|---|---|---|
| Create Business Key | 303 | 291 | -3.96% |
| Populate Acct Dimension | 12644 | 4244 | -66.43% |
| Populate Bank Dimension | 794 | 226 | -71.54% |
| Populate Group Name | 2044 | 669 | -67.27% |
| Populate Measures | 1318 | 878 | -33.38% |
| Populate Time Dimension | 1166 | 760 | -34.82% |
| Populate TradeServer | 962 | 645 | -32.95% |
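For context on what these steps were doing, here is the rough T-SQL sketch I referenced above of the kind of dimension and fact load these timings measure. The staging.DailyReports table, dbo.DimAcct, dbo.DimDate, dbo.FactDailyReport, and all of the column names are hypothetical stand-ins for illustration, not the actual schema I tested:

```sql
-- Populate a dimension from the distinct values in the staging table,
-- skipping members that already exist.
INSERT INTO dbo.DimAcct (AcctBusinessKey, AcctName)
SELECT DISTINCT s.AcctCode, s.AcctName
FROM staging.DailyReports AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DimAcct AS d
                  WHERE d.AcctBusinessKey = s.AcctCode);

-- Populate the fact table by looking up the surrogate keys on the
-- business keys and carrying the measures across.
INSERT INTO dbo.FactDailyReport (AcctKey, DateKey, TradeCount, TradeAmount)
SELECT da.AcctKey,
       dd.DateKey,
       s.TradeCount,
       s.TradeAmount
FROM staging.DailyReports AS s
JOIN dbo.DimAcct AS da ON da.AcctBusinessKey = s.AcctCode
JOIN dbo.DimDate AS dd ON dd.FullDate = s.ReportDate;
```

The distinct scans and key lookups in a load like this hammer both the staging table and the dimension tables, which is why faster disk shows up so clearly in the numbers above.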

The lesson from these results: even if you don't have SSDs, just having fast disks can speed up your ETL processes if you place your staging data and lookup data on the fastest disks possible. My suggestion is that when you're spec'ing out your DW servers, don't forget the disk subsystem. DWs usually require lots of space, and most of that space doesn't have to be on very fast disks, but you should try to keep your ETL process on fast disk when possible.

In my experience across many different BI situations, the DW is the space where BI professionals usually have some say in what hardware they use and have control over it. Staging databases typically don't need elaborate disaster recovery plans, since they are frequently re-created from the incoming source data. That makes them an ideal fit for Fusion-type drives/SSDs and fast disks.
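As a rough sketch of that idea (the StagingDW database, its logical file names, and the F:\FusionData path are all hypothetical), this is one way you might move an existing staging database onto the fast volume and keep its recovery model simple:

```sql
-- Point the staging database's files at the fast (Fusion/SSD) volume.
-- The database has to be taken offline and the physical files copied to the
-- new location before bringing it back online.
ALTER DATABASE StagingDW SET OFFLINE WITH ROLLBACK IMMEDIATE;

ALTER DATABASE StagingDW
    MODIFY FILE (NAME = StagingDW_Data, FILENAME = 'F:\FusionData\StagingDW.mdf');
ALTER DATABASE StagingDW
    MODIFY FILE (NAME = StagingDW_Log,  FILENAME = 'F:\FusionData\StagingDW_log.ldf');

-- ...copy the .mdf/.ldf files to F:\FusionData\ at the OS level here...

ALTER DATABASE StagingDW SET ONLINE;

-- Staging data is rebuilt from the source feeds, so simple recovery is usually enough.
ALTER DATABASE StagingDW SET RECOVERY SIMPLE;
```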

I did some backup tests as well with the card and will get those posted as soon as some tests finish running.  🙂

“The Cube”: Summary of Peter Myers' User Group Presentation

Peter Myers is in SLC this week to do some training. He was kind enough to approach the local SQL PASS user group and offer to do a presentation. I helped out by organizing the location at New Horizons. This would be a good time to look up your local SQL PASS chapter if you don't know where it is. Here's a list!

Peter's presentation was “Developing a UDM with Analysis Services” (UDM = Unified Dimensional Model). UDM is the name Microsoft has given to cubes in Analysis Services, so don't let the name fool you: UDM = cubes. Peter did a great overall presentation on cubes, showing us how to create them and enrich them with more features. Here are some of the key points.

Reasons for Moving to OLAP Structures instead of OLTP

Performance.

Multiple disparate systems of data can be combined into one.

Interpretation / one version of the truth of the data can be controlled in one location.

DW is a continuing, never-ending process. Answering questions will always just yield more questions.

We all know that the more questions you answer, the more questions the business has for you. So when someone tells you the DW project will be done in 2-3 years, it really won't; it will keep evolving. Much of the design and the work to load the DW may be complete, but more questions will keep coming.

Data Source Views

Data source views allow for the abstraction of your data layer. It's good to build the foundation properly and set up your DW database in the fact/dimension table schema you have designed. When that's not possible, or you're given read-only access to the data and can't change the underlying schema, you can make those changes in data source views instead. Here are some key things you can do (a quick example follows the list).

Create columns based on other columns.

Create tables based on queries, and filter tables with a query.

Create friendly names to make things easier to use in the cube.
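As a rough example of the "tables based on queries" idea above, this is the sort of SELECT you might use as a named query in a data source view when you only have read access to the source. The Sales.Product table and its columns are hypothetical:

```sql
-- A named query can stand in for a table you can't create in the source:
-- derive a friendlier column from existing ones and filter out rows the
-- cube doesn't need.
SELECT
    p.ProductKey,
    p.ProductName,
    p.Category + ' - ' + p.SubCategory AS CategoryFullName,  -- column based on other columns
    p.ListPrice
FROM Sales.Product AS p
WHERE p.IsDiscontinued = 0;                                   -- filter the table with a query
```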

Dimensions

Peter suggested creating your dimensions yourself before letting the cube wizard create them for you. Typically I create my dimension tables in the underlying data structure and then let the wizard create the dimensions for me. I agree with Peter, though, that creating the dimensions first, modifying them to fit your needs, and then just telling the wizard where to get the data puts you more in control. I like this approach much better and plan to move to it in the future.

It's a good idea to change the key columns and name columns in your dimensions to the data you actually want. The wizard does not always get this right, so check that you're getting the information you're looking for.

Limit your dimensions and don’t show the columns the end user doesn’t need.

Create hierarchies to help the end user navigate better.

SSAS 2008 has best-practice checks built in, so if you see a blue squiggly line, it's suggesting that you change something to meet best practices.
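To make the key column / name column / hierarchy points a little more concrete, here is a minimal sketch of the kind of dimension table I build in the underlying database before pointing the wizard at it; DimDate and its columns are hypothetical:

```sql
-- A simple date dimension: the surrogate key becomes the dimension's key column,
-- the descriptive columns become name columns, and Year -> Quarter -> Month -> Date
-- becomes a user hierarchy in the SSAS dimension.
CREATE TABLE dbo.DimDate
(
    DateKey      int         NOT NULL PRIMARY KEY,  -- surrogate key, e.g. 20100315
    FullDate     date        NOT NULL,
    MonthNumber  tinyint     NOT NULL,              -- key column for the Month attribute
    MonthName    varchar(20) NOT NULL,              -- name column shown to end users
    QuarterName  char(2)     NOT NULL,              -- e.g. 'Q1'
    CalendarYear smallint    NOT NULL
);
```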

Cubes

Peter really walked us through the building blocks first, so when it came to actually building the cube it took very little time and worked very well. I've been a firm believer in building your foundation first for your cubes, so huge props to Peter for doing an excellent job of explaining it this way for everyone.

Excel

I've used pivot tables in Excel for many years, so most of this was review for me, but as always you should keep learning new things, and I once again learned something new here: the =CubeMember() formula in Excel. I was not aware of it before; it allows you to place cube data just about anywhere in a workbook. This will be great for some specific reports we have that are manually updated right now. I can put a sheet in the workbook that holds the cube data, hide that sheet, and reference it throughout the report they want. Great added functionality.

Overall, Peter not only knew his stuff but was an excellent presenter. I now have a hard decision to make about which pre-con I want to attend. 🙂

I'm planning some follow-up blog posts on a few other key items he mentioned: attribute hierarchies, dimension properties, and more on using Excel and CubeMember().

Thanks again Peter!