| Executive Summary: Microsoft provides an end-to-end business intelligence (BI) solution based on its SQL Server data platform. Read about how Premier Bankcard deployed Microsoft's SQL Server 2005 BI platform, including how long it took to get the data warehouse up and running and the challenges faced during implementation. |
In the past few years, Microsoft has released several business intelligence (BI) products, such as SQL Server 2005, the Microsoft Office 2007 suite, Microsoft Office SharePoint Server (MOSS) 2007, and Microsoft Office PerformancePoint Server 2007. However, it wasn’t until recently that companies began to feel business pressure to think about how their bottom line is affected by the quality and timeliness of the information that their employees use to make decisions. Microsoft and its competitors offer BI products to fill this need, and Microsoft touts its ability to provide an end-to-end BI solution based on its SQL Server data platform. With SQL Server 2008’s enhanced BI features, many companies are considering implementing Microsoft’s BI platform to improve access to meaningful data. To gain insight into how organizations decide whether to deploy Microsoft’s BI solutions and to see how these solutions can be implemented, I spoke with Ron Van Zanten, the directing officer of BI at Premier Bankcard, to get an inside look at how he implemented Microsoft’s SQL Server 2005 BI platform in his organization, including how long it took to get the data warehouse up and running and the challenges that he faced during the implementation.
Bearly: What made your company decide to implement Microsoft’s BI solution?
Van Zanten: We were like a lot of companies are when they’re starting out. We had reporting, and individual divisions had systems on which they would keep their data. We did have a data warehouse at the time. It was good except for two flaws—the information wasn’t timely or accurate. We needed to break down the silos of information and get one version of the truth, so that everybody was working off the same set of books. That’s why we chose a centralized data repository.
Bearly: Did your company look at alternative BI platforms before deciding to use Microsoft’s?
Van Zanten: Yes, we looked at Oracle and Teradata. Teradata had only a limited number of installations, it’s extremely expensive, and they wanted to have their consultant come in and set up the data warehouse, set up the schema to one of the schemas they have for financial institutions, and then build reports off of it. The downside of that is that we wouldn’t have the ownership for the implementation. Oracle didn’t have a BI stack and told us to use Analysis Services for the cubes. So it looked like we were going to use the top part of the Microsoft stack anyway. Since Analysis Services was in all three proposals, we decided to use Microsoft’s BI platform. I didn’t have any reservations with SQL Server. I knew we could build a schema and leverage SQL Server’s strengths and make it perform like we needed it to. As long as we started from the ground up, built it and designed it with performance in mind, and put a presentation layer on the top of it, we were certain SQL Server would be able to perform.
Bearly: Which Microsoft BI products are you using?
Van Zanten: SQL Server 2005 SP2. We’re running the latest version of Reporting Services [2005 SP2], and Analysis Services has the latest service pack [2005 SP2] on it, too. We’re running PerformancePoint 1.0, which is the first build. We can’t wait for Performance-Point 2.0. We use SharePoint 2007 and we did use ProClarity 6.1, but we’re migrating over to PerformancePoint for that presentation layer.
Bearly: What features made you use those products?
Van Zanten: The partitioned tables were something that we really wanted, and we really wanted our database platform to be NUMA [Non-Uniform Memory Access] aware. NUMA is a hardware feature where we have four cells on our Itanium box, and those cells have four processors and 16GB of memory on each one. SQL Server 2005 is aware of that, so we don’t allow people to take more than one cell of resources when they’re running their queries. We have one fact table that’s 2TB; it’s got almost three billion records in it. You could write a query to our data warehouse and its hardware, and SQL Server keeps a query like that from overrunning everything else. The query will go through, and we’ll eventually find it and be able to stop it before it wastes everybody’s time. But in the meantime, the SQL Server instance is still available and still has a decent response time.
We monitor our data warehouse by response time. We run a query against our Account dimension; we query the dimension table, returning one record back. We track the time in milliseconds that it takes to log in, return the data, and log out. We do this every 10 minutes. I use that metric above all others because that really tells me how SQL Server feels about things. If it has CPU pressure, that query will be slower; if there’s a lot of memory pressure it’s slower; or if there’s disk pressure it’s slower. If the response time starts to degrade, then I can see where the bottleneck is because it’s usually one of those three things that the SQL Server system is unhappy about. Our standard is that a query needs to happen in 250 milliseconds on the SQL Server system.
Bearly: You mentioned that you have a centralized data repository. Can you tell me more about your data warehouse?
Van Zanten: We created a centralized data warehouse, where everything is stored and loaded through a standardized ETL [extraction, transformation, and loading] process. The ETL process is where the data is validated and cleaned. Our data warehouse is a relational star—it’s actually a snowflake schema—where the different fact tables and different core areas of our business can be related to each other correctly. We look at letters that are sent to the customer, phone calls the customer makes to us, and the problems that they have, in the same context as other things that are happening in our company, so that we can get a 360-degree view of the customer. But that really only works well when you have a centralized data store. It does cause new problems because you’re inviting new challenges. One of those challenges is size, by having everything in one area—but we have a 12TB SQL Server instance that runs this data store.
Bearly: Can you tell me more about that SQL Server instance?
Van Zanten: It’s 12TB, and it contains over 40 billion records. If you think about it in the big picture, it actually makes our data stores smaller because we aren’t keeping separate copies of data; everything is located in one spot. If you want to get monetary transactional history, such as payments, purchases, or fees, you can go to one place that’s been audited, balanced, and built on a repeatable process that’s gone through change control. You have a place where you can trust the information. Reports can then be built on the information, and our business users can build data marts and subject areas to drill down further and do further analyses, but it’s always based on the original version of the truth. It gives everybody a point of reference for reports, and analysts can drill back and verify such information as “Where’s the baseline? Do I balance? Is my report correct?”
Bearly: How difficult was it to implement
Microsoft’s BI platform?
Van Zanten: The most important thing was understanding the platform, in order to leverage it. We had to have the right information but it also had to be timely. ETL performance was definitely a consideration. Our ETL process runs through SSIS on a separate server, and we bulk INSERT into the actual database instance all of the data.
Having our data warehouse and SSIS on the same SQL Server instance probably wouldn’t have been the best scenario, because both of those products love memory, and we didn’t want to have that kind of contention. So having a dedicated ETL box was important for us, although it’s something that companies sometimes don’t do because they think that if they go with the same vendor, they can put it all on one box. But if we had purchased a separate ETL tool from a third-party vendor, we would have had it installed on a separate server anyway. So we took best practices for the industry and applied them to our stack.
We have a separate SSIS server whose dedicated job is to manage the data movement, the data quality, and the checks, which is a lot of work because we actually take in data and put it in a dirty table, and then we start to normalize it to get it ready to put into our data warehouse. We put the keys in there for our dimension tables on the fact tables, and we standardize data types. We’re looking for things that are out of the ordinary. We put 20 to 25 million records in the data warehouse per day, but we inhale into our ETL server well over 30 million. Then we get the data distilled down, cleaned, and standardized. The data warehouse then accepts this new information and the ETL box will kick off Reporting Services schedules.
Continue to part 2
Prev. page  
[1]
2
next page