Data warehousing has become an important technology even for small to mid-size companies for data analysis. One of the issues we encounter with data warehouses is performance, since we combine a large amount of data from multiple data sources. There are several steps we can take to maximize data warehouse performance:
- Separate server for BI360 data warehouse: Your ERP server has a lot of activities going on constantly, so it is recommended that you have a separate server for BI360, this way you have a dedicated server which can produce a better
- Limit the amount of data in the data warehouse for the last say 3-4 years. Just because you have 10 years’ worth of transaction data in your ERP system does not mean that you have to bring in all of them into your data warehouse. You rarely need to do any reporting using such old data. Part of the benefit of having data warehouse is to have only the clean data you need, it’s called data warehouse, not data dumpster.
- Maintain database statistics: See below
- Indexes: See Below
Very often we hear from our customers that reports that used to run very quickly now take a lot longer. And they claim that there has been no change in their system. How can this happen?
One of the reasons is that SQL server is not processing all the commands in the most efficient way. When you run a report, your reporting tool sends a set of SQL commands to SQL server and process them in the order they are received. Just like there are many different solutions to the same math problem, there are different ways for SQL Server to process the same commands, and some are more efficient than others. How does SQL server determine how to process them? They make decisions based on database statistics, which is not updated automatically by default. Attached document shows step by step process of how to create and schedule a database maintenance plan. This should be a standard practice for all the databases, not just data warehouses.
Data Warehouse performance and maintenance – Download Document
If you have millions of rows of records in one fact table, it might take a long time to generate a report. One of the things you can do is to create indexes. Below are two reasons why indexes help speed up report generation time.
- One reason is that generally fact tables hold many columns that are not useful in reporting. In case of BI360 data warehouse, columns such as createdBy, CreatedTime can be useful to see by who and when certain data were added, but it’s rarely required in any of financial reports. But they still take up some space in the hard disk. When you create an index, basically you are creating a copy of the table, only with the data fields you need for your reports. So the index takes up less space than the original table, it’s quicker for SQL server to go through all the records.
- Another reason is that if a table doesn’t have any index, then SQL server has no choice but to performance what’s called “Table scan”, which is to go through every single row of data, one by one. This is a time consuming process for a large table. When you create an index, you are sorting data based on columns you choose. If you create an index sorted by say timeperiod and Account, then SQL does not have to go through the entire table. This is just like when you look for specific data in an Excel file. With unsorted data in an Excel, you have to go through every single row, but if you sort the same data, then you don’t need to do that.
Below is a sample SQL script that will create an index on one of the fact tables, f_Trans_GL. You can add or delete dimensions and also add comments or UDF fields if necessary.