So I’m finally getting back to blogging and figured I would get in on a post for #TSQL2sDay. I didn’t want to make something to terribly long so I thought I would talk about a common reporting scenario I run into all the time. What is meant by a vertical table and some ways to get data out of the table?
My definition of a Vertical table is one that has been created to hold lots of disparate data dynamically. I’ve seen some of the largest of these in a call center environment so I’ll use that as my example.
Let’s say you’re a Customer service agent. You take hundreds of calls a day and each call has specific data attached to the call for reporting purposes. Things like Talktime (Amount of time you were on the phone), Hold Time (The time you put the caller on hold) and TransferExtension (You transferred the call to another agent). All of these things in the system are known as attached data, basically data that is riding along with your call for later system/reporting needs. Now customers are typically allowed to add attached data to the call system. Perhaps the customer would like to not only track when you were on hold and when you were talking but they also want to know something about the person that called the system. Perhaps there CallerID information also known as ANI, this is another form of attached data that the customer may dynamically assign to the call that the system has to handle. This is why many systems employ the use of vertical tables.
Vertical tables allow for you to dynamically create new pieces of data about something. There is no need to add another column to a table it can automatically start making the data appear in the table without any changes. This is the great power of Vertical tables. Hopefully you will also see this is a great weakness. So here’s what the vertical table above would look like.
Create table Attached_Call_Data
(Id int identity(1,1) not null,
Callid int not null,
Keyname varchar(255) not null,
Value Varchar(2000) null)
The Callid – would link all information back to a specific call (many times you might have a sequence as well)
The Keyname – would tell you what piece of data you have such as (‘Talktime’,’Holdtime’,’ANI’)
The Value – this would be the value of the call. So perhaps Talktime = 90 Seconds the value here would be 90
Now once again hopefully you’re seeing the problems here. The value column is just a varchar so it can hold any data. This makes it very difficult to keep consistent with what’s in the column. It also means lots of conversions in your future. Where clauses have to convert the rows as well so get ready for lots of index scans.
Ok one last example and we’ll talk pro’s and cons. Here’s what the data could look like for example.
3,2882,ANI,John Smith 555-555-5555
This is a very simple example but a way to prove the point with vertical tables your going to get lots of varied data. Ok so how do we report off of them and what’s the pro’s and con’s?
1. Dynamic adding of columns to data sets without changes to code or DB design.
2. Makes for small tables as far as size go. You only have 3-4 columns in a vertical table so it’s not going to get nearly as large as a table with 100-200+ columns
3. Can be fast to search and index because of the narrow size.
1. Can’t control the data type allowing any variation of data to come in.
2. Reporting off the table is very cumbersome.
3. Too much processing/conversion to get the data out.
4. Many millions of rows for a small data set. If the customer takes 100,000 calls in one day that could be a potential of 100,000 * 100 pieces of attached data = 10,000,000 rows of data.
5. No lookup tables for the values so trying to get a list of what a specific keyname has is not possible unless someone has written another table that already has this information. Which I’ve yet to see a system designed this way.
6. Forces Scans frequently since so many conversions have to be done on the Value column.
Besides the obvious Data integrity and design issues with vertical tables the really big issue is reporting off of them. Most OLTP reporting is best done in a de-normalized flat table with lots of columns so you can get to data easily and with less joins. To get the data in that form for Vertical tables you don’t have a lot of good options. Here’s a few I have used.
(select value from Attached_Call_Data where callid = A.callid and keyname= ‘Talktime’) as Talktime
From Attached_call_data as a
This obviously has a lot of reads to get the data it needs.
Pivot is a method I have yet to use since it’s hard to get around the dynamic nature of the tables but it’s one I’m looking into for a future project. A friend used Pivot with great success and speed to get the data out and I’m eager to use it as well.
You could use a Loop/cursor to get through it. I can’t see this as a good way to do it unless you have some way to get the data out of SQL quickly and then loop it in memory. I have yet to ever try and use a loop to get data out.
I’m sure there are methods in C#/CLR that you could use to get through this as well.
I’m sure there are other method’s out there and I would love to hear about them from anyone that has to deal with Vertical tables. No matter which method you’re looking at though it’s going to be more processing than your standard OLTP/Normalized data structure. I’m a firm believer in every tool has a place though and I know and understand that the vertical table does. If you need a dynamic structure to hold your data and you can control validation on the app side and you can create a separate process that runs out of band for your reporting needs then this can be a very useful table. If you can’t meet these criteria you should stay away from this table, Real time reporting or real time reports written off vertical tables never work well without lots of hardware backing them.
I think this went a little longer than I hoped but hopefully you got some good information about Vertical tables if you have never had to report off these count yourself lucky.