Category Archives: Disk Performance

New Testing numbers with FusionIO

As I was working on some new numbers for the FusionIO card I have been testing, a friend who works there asked if I would join him on a webinar about BI and FusionIO. Since I have a Fusion card in my BI testing server, I thought this would be a great opportunity to talk about some of the numbers I've gotten off of it.

Since this server is currently a BI testing box, I'm able to run various ETL configurations against it. For these tests I was running a 320GB FusionIO card (MLC; details are on their site). I decided to pull over a Dailyreports table for my staging data. The table has 66 million rows in it (no, that's not one day's worth). It has 9 facts and 5 dimensions, including a time dimension. I created a procedure to build my fact table and populate the dimensions based off the distinct list of items in the original table. Then I created a separate process to populate the fact table by looking up the dimension keys and populating the measures off the business key. Basically the same thing you would do in any typical ETL process.

I first ran this with the DB on our DMX 1000, which we consider Tier 1 storage. The DMX 1000 is configured as RAID 10; I'm not using it exclusively, so there are other things running on it. I'll be adding the SQLIO profile for the DMX to the Excel sheet I posted previously (that file is located on Google Docs; here is a link). I consider this a typical SAN situation that many BI professionals will find themselves in: they usually don't have the slowest disk, but they also don't get the fastest disk possible. I ran the process on the DMX, then moved the DB to the Fusion card and ran it again. Here are the results:

DW Procedure               DMX 1000 SAN        Fusion Drive        Percent
                           Time (sec)          Time (sec)          Improvement
-----------------------    ------------        ------------        -----------
Create Business Key        303                 291                 -3.96%
Populate Acct Dimension    12644               4244                -66.43%
Populate Bank Dimension    794                 226                 -71.54%
Populate Group Name        2044                669                 -67.27%
Populate Measures          1318                878                 -33.38%
Populate Time Dimension    1166                760                 -34.82%
Populate TradeServer       962                 645                 -32.95%
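
For anyone unfamiliar with the pattern, here is a minimal sketch of what each of the Populate steps above does. The table and column names (StagingDailyReports, DimAcct, FactDailyReports, AcctName) are hypothetical stand-ins, not the actual schema from my tests:

-- Hypothetical names for illustration; the real procedures run against the
-- 66-million-row Dailyreports staging table described above.

-- Populate a dimension from the distinct values found in staging
INSERT INTO dbo.DimAcct (AcctName)
SELECT DISTINCT s.AcctName
FROM dbo.StagingDailyReports AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimAcct AS d WHERE d.AcctName = s.AcctName);

-- Populate the fact table by looking up the dimension surrogate key
-- on the business key and carrying the measures over from staging
INSERT INTO dbo.FactDailyReports (AcctKey, ReportDate, TradeCount)
SELECT d.AcctKey, s.ReportDate, s.TradeCount
FROM dbo.StagingDailyReports AS s
JOIN dbo.DimAcct AS d ON d.AcctName = s.AcctName;

Every lookup in that second insert hits the staging and dimension tables, which is why putting that data on fast disk pays off.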

The lesson we can draw from these results: even if you don't have SSDs, even if you just have fast disks, you can speed up your ETL processes by placing your staging data and lookup data on the fastest disks possible. My suggestion when you're spec'ing out your DW servers: don't forget the disk subsystem. DWs usually require lots of space, and most of that space doesn't have to be on very fast disks, but you should try to keep your ETL process on fast disk when possible.

In my experience across many different BI situations, the DW is the space where BI professionals usually have some say in what hardware they use, and they have control over it. Staging databases typically do not need elaborate disaster recovery plans, since they are frequently re-created from the incoming source data. This makes them an ideal fit for Fusion-type drives/SSDs and fast disks.
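
One practical consequence: if the staging database can always be rebuilt from the source feeds, you can usually run it in the SIMPLE recovery model to keep log overhead down. A minimal sketch, with a made-up database name:

-- StagingDW is a hypothetical name; because staging data can be re-created
-- from source, point-in-time log recovery usually isn't needed here
ALTER DATABASE StagingDW SET RECOVERY SIMPLE;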

I did some backup tests as well with the card and will get those posted as soon as some tests finish running.  🙂


Reads and Writes per DB, Using DMV’s

Years ago, when I was working with a SQL 2000 database, I needed to see how many reads and writes were happening on each DB so I could properly partition a new SAN we had purchased. The SAN guys wanted Reads/sec, Writes/sec, and total IOs/sec. Perfmon couldn't give this per DB, so I had to use fn_virtualfilestats. I wrote a procedure that would tell me, per DB, what was going on, and then store it in a table for later comparison.
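
For those who never had the pleasure, that SQL 2000 function was called per database; from memory, usage looked something like this (the database name is a placeholder, and the column names are as I recall them):

-- SQL 2000: ::fn_virtualfilestats(dbid, fileid); -1 for fileid returns all files
SELECT DbId, FileId, NumberReads, NumberWrites, BytesRead, BytesWritten
FROM ::fn_virtualfilestats(DB_ID('MyDatabase'), -1)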

I've found a need for this again, since I'm running some load tests and want to know what my data files are doing. This is easier now thanks to sys.dm_io_virtual_file_stats. It's even better now that so many people in the community provide great content! Glenn Berry (Blog | Twitter) wrote an excellent DMV-a-day blog post series, and in one of the posts he gives a great query on virtual file stats. That gives you an excellent general point-in-time look at your server, but it was not exactly what I needed: I wanted to know what my reads and writes were for a specific period. That works better for my load testing, where I need to know exactly what is going on during a specific window.
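
If all you need is that cumulative point-in-time view (everything since the last restart), a quick aggregate over the DMV is enough; something like this:

-- Cumulative reads and writes per database since the last SQL Server restart
SELECT DB_NAME(vfs.database_id) AS DatabaseName,
    SUM(vfs.num_of_reads) AS TotalReads,
    SUM(vfs.num_of_writes) AS TotalWrites,
    SUM(vfs.num_of_bytes_read) AS TotalBytesRead,
    SUM(vfs.num_of_bytes_written) AS TotalBytesWritten
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
GROUP BY DB_NAME(vfs.database_id)
ORDER BY TotalReads DESC;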

You might want to use the interval approach during heavy business hours as well. The DMV keeps its counts since the last server restart, so if you look at just the DMV you're going to see everything that has occurred. Maintenance items (CHECKDB, index maintenance, backups) all show up in there, so one large DB whose backup kicks off at night might rack up lots of reads, making its percentage much higher even though no one really uses it during the day; the DMV would then show that DB as very busy when in reality your other DBs are the busy ones during the day. So if I'm planning for a new SAN or moving files around, I run this query at specific times so I can compare data points and find out what my databases are actually doing during critical times.

This script takes a baseline record from the DMV, waits the amount of time you specify, and then takes a comparison record. It compares the two and returns the percentages. Lots more could be done with the comparison, and certain things are left in for later changes, like Size; right now I don't do anything with Size, but I plan to show the growth between the data points. I chose not to create the script as a stored proc just so it's easy for you to put wherever you want. It only deals with two data points to keep it simple for now; I have considered changing this in the future. The only setting you need to change is the @DelaySeconds parameter: just set it to the number of seconds you want it to wait. 60 seconds is usually a good default to get a quick snapshot of the server. To discover what your databases are doing during busy times, I would run it for about 300 seconds (5 minutes) and do that 2-3 times in an hour to see what the data looks like.

Would love to hear any comments on whether this works for you!  Thanks.

/*********************************************************************************************
File Stats Per DB
Date Created: 07-22-2010
Written by Pat Wright(@SqlAsylum)
Email: SqlAsylum@gmail.com
Blog: http://Www.Sqlasylum.com
This Script is Free to download for Personal, Educational, and Internal Corporate Purposes,
provided that this main header is kept along with the script.  Sale of this script,
in whole or in part, is prohibited without the author's consent.
*********************************************************************************************/

CREATE TABLE #FileStatsPerDb
(
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
[databaseName] [NVARCHAR](128) NULL,
[FileType] [NVARCHAR](60) NULL,
[physical_name] [NVARCHAR](260) NOT NULL,
[DriveLetter] VARCHAR(5) NULL,
[READS] [BIGINT] NOT NULL,
[BytesRead] [BIGINT] NOT NULL,
[Writes] [BIGINT] NOT NULL,
[BytesWritten] [BIGINT] NOT NULL,
[SIZE] [BIGINT] NOT NULL,
[InsertDate] [DATETIME] NOT NULL DEFAULT GETDATE()
)
ON [PRIMARY]

DECLARE @Counter TINYINT
DECLARE @DelaySeconds INT
DECLARE @TestTime DATETIME

--Set Parameters
/*
@Counter just initializes the loop at 1.
@DelaySeconds tells SQL Server how long to wait before it takes the second data point.
How long you want depends on your needs.  If I have a load test running for
5 minutes and I want to know what the read and write percentages were during those 5 minutes,
I set it to 300. If I just want a quick look at the system I'll usually set it to 60 seconds,
to give me a one-minute view.  This depends on whether it's a busy time and what's going on during that time.
*/
SET @Counter = 1
SET @DelaySeconds = 60
SET @TestTime = DATEADD(SS,@delayseconds,GETDATE())

WHILE @Counter <= 2
BEGIN
    INSERT INTO #FileStatsPerDb (DatabaseName,FileType,Physical_Name,DriveLetter,Reads,BytesRead,Writes,BytesWritten,Size)
    SELECT
        DB_NAME(mf.database_id) AS DatabaseName
        ,mf.type_desc AS FileType
        ,mf.physical_name AS Physical_Name
        ,LEFT(mf.physical_name,1) AS DriveLetter
        ,fs.num_of_reads AS Reads
        ,fs.num_of_bytes_read AS BytesRead
        ,fs.num_of_writes AS Writes
        ,fs.num_of_bytes_written AS BytesWritten
        ,fs.size_on_disk_bytes AS Size
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
    JOIN sys.master_files AS mf ON mf.database_id = fs.database_id
        AND mf.file_id = fs.file_id
    -- After the baseline insert, wait until @TestTime before taking the comparison point
    IF @Counter = 1
    BEGIN
        WAITFOR TIME @TestTime
    END
    SET @Counter = @Counter + 1
END

;
WITH FileStatCTE (DatabaseName,FileType,DriveLetter,TotalReads,TotalWrites,TotalSize,TotalKiloBytesRead,TotalKiloBytesWritten)
AS
(
    SELECT BL.DatabaseName, BL.FileType, BL.DriveLetter,
        -- Delta between the comparison point and the baseline for each counter
        NULLIF(SUM(CP.Reads - BL.Reads),0) AS TotalReads,
        NULLIF(SUM(CP.Writes - BL.Writes),0) AS TotalWrites,
        NULLIF((SUM(CP.Size - BL.Size))/1024,0) AS TotalSize,
        NULLIF((SUM(CP.BytesRead - BL.BytesRead))/1024,0) AS TotalKiloBytesRead,
        NULLIF((SUM(CP.BytesWritten - BL.BytesWritten))/1024,0) AS TotalKiloBytesWritten
    FROM
        ( SELECT InsertDate,DatabaseName,FileType,DriveLetter,Reads,BytesRead,Writes,BytesWritten,Size
          FROM #FileStatsPerDb
          WHERE InsertDate IN (SELECT MIN(InsertDate) FROM #FileStatsPerDb) ) AS BL -- Baseline
    JOIN
        ( SELECT InsertDate,DatabaseName,FileType,DriveLetter,Reads,BytesRead,Writes,BytesWritten,Size
          FROM #FileStatsPerDb
          WHERE InsertDate IN (SELECT MAX(InsertDate) FROM #FileStatsPerDb) ) AS CP -- Comparison
        ON BL.DatabaseName = CP.DatabaseName
        AND BL.FileType = CP.FileType
        AND BL.DriveLetter = CP.DriveLetter
    GROUP BY BL.DatabaseName, BL.FileType, BL.DriveLetter
)

/*
Return the Read and write percent for Each DB and file.  Order by ReadPercent
*/
SELECT databasename,filetype,driveletter,
100. * TotalReads / SUM(TotalReads) OVER() AS ReadPercent,
100. * TotalWrites / SUM(TotalWrites) OVER() AS WritePercent
FROM FileStatCTE
ORDER BY ReadPercent DESC,WritePercent DESC

In my haste I forgot to add a sample of what you would get from this.  The names of the DBs have been changed.

[Sample output screenshot: read and write percent per database, ordered by ReadPercent]