Andrew Copeland - Consultant
One the my recent challenges has been to design an SSIS application to read the massive amount of information provided by PowerPlay Enterprise Server audit files.
These file are created when you enable auditing in PPES Admin and provide detailed analysis of user and cube activities. It can prove to be invaluable when deciding if a cube is viable or underutilised.
IBM Cognos provides a command line utility which takes the log information and transforms it into a DataMart. However, during testing we found that it was taking up to one day to process 100mb of data and therefore not viable for us to use.
I undertook a short project to deliver an SSIS application that imported the entire range of log files and parsed the information into dimension which will provide my client with insight into the usage and load placed on their BI environment. We had particular interest in three things
Number of requests by day for future load calculations.
Number of requests by cube to look for obsolete entities.
User analysis by last access date to check for user licence utilisation.
The SSIS Application
We were able to quickly construct an SSIS application. The application consists of two sections, the importing of the raw log files into the staging environment and then the processing of the data into the dimensions and fact tables.
The importing into the staging environment used tried and trusted SSIS techniques. These involved a For Loop Container, a Data Flow Task and a File System Task to move the completed logs into an archive or processed directory.
The Control Flow looks like this:
Once the bulk data has been loaded into the staging table the table looks like this:
The challenge now was to turn the information in the vast number of log line entries into meaningful intelligence. The lines that I parsed through to the dimensions is the Sub Types of ‘REQINFO’ and ‘USR’. The request information gave me cube and dimensional analysis with the user line giving me the user analysis.
During the creation of the dimension and fact table SSIS routines I needed to parse one of the fields called Line_Details as it was comma delimited. I used some neat SSIS functionality in the Derived Column Transformation to parse the arguments. The function in question was called FINDSTRING and you use it as follows:
Therefore if you want to find the first occurrence of a comma in the field Cube_Details then you would use the command FINDSTRING(Cube_Details,",",1)
As I was looking for the argument BETWEEN the first comma and the second comma my derived expression was like this:
SUBSTRING(Cube_Details,FINDSTRING(Cube_Details,",",1) + 1,FINDSTRING(Cube_Details,",",2) – FINDSTRING(Cube_Details,",",1) – 1)
Finally, a bit of data cleansing was required to remove any double quotes and forward slash characters from my data by using the SSIS derived expression REPLACE as below:
Double quotes was a little harder to calculate but the command below works:
Therefore, now the SSIS packages are written and published for scheduled execution we will now move onto the writing some BI reports to consume the information. The final structure for Version 1 of the application is below.
By writing this application it has given me a greater understanding of the information held within the PPES audit logs. It has also enabled my client to have a detailed view of what is being processed on a daily basis. Their BI application is quite mature and is under continual development so being able to see who is using the tools and who is not is invaluable information.
By writing the application using SSIS I have been able to parse over 120 log files (12GB) in around 1.5 hours. With the existing out of the box PPES Audit collector this would literally have taken years and in all probability would never have finished.
The next project is to do a similar SSIS exercise parsing the cube build log files to look at duration, segmentation and the resources required from SQL Server to read the data. I will share the results once known.