Overcome Your Fear of Data
When we talk about data analytics and data mining, many people immediately get overwhelmed. They conjure images of IT experts hunched over a computer using some super complex software while typing code to analyze millions of rows of data. But that’s mostly from television. In the real world, most data analytics can be pretty simple and extremely beneficial. The greatest benefit is that testing the full population provides the best coverage. You can absolutely get great software to expand your capability, but most of us start out in Excel. Just think about the words we are using. “Data analytics” is just a systematic review of information, which is a fancy way to say you performed a test.
Before you let your fear of data take over, remember that you are already performing some level of data analytics. The goal is to get better, to advance your ability, and to slowly take on more advanced testing in a very systematic way. Most publications on this topic point to a simple five-step process.
Step 1 - Define Your Objective
Before you even start looking at the data, figure out what you think the data could tell you. Imagine what the results could be. For example, if you are going to look at a listing of transactions from a payables system, you might come up with objectives like: look for transactions that are not supported by purchase orders, or look at the days of the week on which the transactions posted. The results could show that not all transactions were supported, or that some transactions posted on the weekend, when no one should be working.
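To make the two example objectives concrete, here is a minimal Python sketch of each one written as a test. The field names (`id`, `po_number`, `posted`) and the three sample rows are hypothetical, made up for illustration; a real payables extract will have its own layout.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical sample rows; a real listing would come from the payables system.
transactions = [
    {"id": "T1", "po_number": "PO-100", "posted": date(2024, 3, 4)},
    {"id": "T2", "po_number": "",       "posted": date(2024, 3, 5)},
    {"id": "T3", "po_number": "PO-101", "posted": date(2024, 3, 9)},
]

# Objective 1: transactions not supported by a purchase order.
unsupported = [t["id"] for t in transactions if not t["po_number"].strip()]

# Objective 2: count of transactions posted on each day of the week.
# date.weekday() returns 0 for Monday through 6 for Sunday.
by_weekday = Counter(DAY_NAMES[t["posted"].weekday()] for t in transactions)

print("No purchase order:", unsupported)
print("Postings by weekday:", dict(by_weekday))
```

Writing the objective down this way, even roughly, forces you to decide what counts as an exception before you see the data.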
Step 2 - Find the Data
Depending on data availability, you may need to combine information from multiple sources. Many times the type of testing you can perform will be dictated by the information that is available to you. For example, let’s say you are planning to look for transactions that were posted on the weekend when no one is supposed to be working. At a minimum, the data you need for this test must include a date/time stamp on each transaction. You will probably want more than just the date/time stamp. You will also want to know who processed the transaction, what the dollar amount was, and whether a supervisor approved it. Your next question will be about actually getting the data. If you have the required access and sufficient training on the systems involved, you may be able to get the information on your own.
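As a rough illustration of the weekend example, the sketch below combines two hypothetical extracts (a posting log and an approval log) on a shared transaction ID, then flags weekend postings. The field names, the timestamp format, and the sample values are all assumptions; your systems will differ.

```python
from datetime import datetime

# Hypothetical extracts from two systems, keyed by a shared transaction ID.
postings = [
    {"id": "T1", "posted_at": "2024-03-08 14:05:00", "user": "asmith", "amount": 250.00},
    {"id": "T2", "posted_at": "2024-03-10 09:30:00", "user": "bjones", "amount": 9800.00},
]
approvals = {"T1": "supervisor1"}  # T2 has no approval on record


def is_weekend(timestamp: str) -> bool:
    """Return True when the timestamp falls on a Saturday or Sunday."""
    posted = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return posted.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend


# Combine the two sources; approved_by is None when no approval exists.
combined = [{**p, "approved_by": approvals.get(p["id"])} for p in postings]

# Keep only the postings that fall on a weekend.
weekend_postings = [row for row in combined if is_weekend(row["posted_at"])]

for row in weekend_postings:
    print(row["id"], row["user"], row["amount"], row["approved_by"])
```

Without the date/time stamp the test cannot run at all, and without the user, amount, and approval fields a flagged transaction is hard to follow up on.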
Step 3 - Prepare Your Data
Data preparation includes many different aspects, so we will focus on two of the broadest, most encompassing points: cleaning the data, and normalizing the data. Cleaning data addresses the quality of the information, while normalization eliminates redundancies.
Cleaning the data is especially important when the information is coming from multiple sources. Sometimes you will have a column of text in a spreadsheet, but some of the cells also have numbers, or spaces in front of the letters, or symbols in the data. Cleaning the data will remove all of the unrecognizable information from the cells.
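A minimal Python sketch of this kind of cleaning is shown below. It assumes the column is supposed to hold plain-text names, so digits and symbols are treated as noise; that rule, the function name `clean_text`, and the sample cells are illustrative assumptions. As written, it would also strip accented letters, so a real rule would need to be adapted to your data.

```python
import re


def clean_text(cell: str) -> str:
    """Trim the cell and keep only letters, spaces, hyphens, and apostrophes."""
    stripped = cell.strip()
    # Assumption: anything other than these characters is noise in a name column.
    letters_only = re.sub(r"[^A-Za-z' -]", "", stripped)
    # Collapse any run of whitespace left behind into a single space.
    return re.sub(r"\s+", " ", letters_only).strip()


# Hypothetical cells showing leading spaces, a symbol, and stray digits.
raw_column = ["  Smith", "Jones#", "Lee42", "O'Brien "]
cleaned = [clean_text(cell) for cell in raw_column]

print(cleaned)
```

In Excel, functions such as TRIM and CLEAN cover part of the same ground.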
Normalizing looks for different versions of the same data entry. For example, my last name could show up four different ways: DeRoche, De Roche, De_Roche, Deroche. These are all the same name, just input into systems in different ways. Normalizing converts all of the variations into one format. If you don’t clean and normalize the data, the output will either produce an error, or the results will be unusable or unreliable.
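The name example can be sketched in a few lines of Python. The approach here, removing spaces and underscores and lowercasing to build a comparison key, is one simple assumption about what "one format" means; the function name `normalize_name` is hypothetical.

```python
import re


def normalize_name(name: str) -> str:
    """Build a comparison key: drop spaces and underscores, then lowercase."""
    return re.sub(r"[\s_]+", "", name).lower()


variants = ["DeRoche", "De Roche", "De_Roche", "Deroche"]

# All four variants should collapse to the same key.
keys = {normalize_name(v) for v in variants}

print(keys)
```

The key is only for matching records; you would usually keep the original spelling alongside it for reporting.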
Step 4 - Analyze Data
At this point, you have defined the objective, pulled the data, and spent some time cleaning and normalizing the information. Now it’s time to run the test.
Once you run the tests, you may or may not understand the results. Your best resource for understanding the results will likely be the people you are auditing. If it is appropriate, you should take the results back to the data owners for help understanding the output. Keep in mind that this may not always be appropriate, especially in a fraud examination.
Step 5 - Report Results
One of the most important factors in helping management focus and understand the results is the amount of information we present. We must be careful not to overwhelm management with endless information. Always present summary information and provide any details as an appendix to the summary.
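One way to keep the summary and the detail separate is sketched below in Python. The list of exceptions is hypothetical sample data standing in for whatever your tests produced; the layout of the output is only a suggestion.

```python
from collections import Counter

# Hypothetical exceptions produced by the tests in Step 4.
exceptions = [
    {"id": "T2", "issue": "no purchase order"},
    {"id": "T3", "issue": "weekend posting"},
    {"id": "T7", "issue": "weekend posting"},
]

# Summary: one line per type of issue, with a count.
summary = Counter(e["issue"] for e in exceptions)

print("Summary")
for issue, count in summary.most_common():
    print(f"  {issue}: {count}")

# Appendix: the individual transactions behind each count.
print("Appendix: detail")
for e in exceptions:
    print(f"  {e['id']}: {e['issue']}")
```

Management reads the few summary lines first, and the appendix is there for anyone who wants to trace a count back to specific transactions.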