Project: Statistics from daily downloaded data

One thing I do every month is compile statistics on how many questions have come in to our chat reference service from various libraries. On the one hand, these statistics are automatically generated, so there’s no fussing or worry about whether they were recorded accurately. On the other hand, downloading a month’s worth of statistics one day at a time makes it easy to skip a day, or download one day twice.

An additional complication is that new libraries sometimes show up, and they need to be added to the list. I’m doing this project now because it’s a great example of the IF() techniques I highlighted in the last two posts. It also pulls together a lot of things I’ve talked about in the past, so I’ve supplied links if you want to know more about the details or how-to of any step.

The following instructions assume that you’ve already downloaded the data, and compiled it into a single worksheet. I’ve posted an example you can use to follow along, if any part is confusing. In the example I’ve shown each stage as a separate worksheet, but normally you’d just do it all in the same sheet.

Step 1: Extract the dates
Chances are good that your downloaded data will include the date, but it’ll be in text format. If that happens, there are two stages you need to go through to extract the date.

Step 1.1: Extract the year, month, and day.
If your dates are of fixed size, e.g. 06-09-2013 where there’s a leading 0 if the month or day has only one digit, then you can use formulas similar to the ones shown here. If the size varies, e.g. 6-9-2013 and 12-31-2012, then you need to use these techniques.

In the example I have dates in column B, in the format YYYY-MM-DDTHH:MM. The “T” just separates the YYYY-MM-DD, which is the date (four digit year, 2-digit month, 2-digit day), from HH:MM, which is the time (2-digit hour, 2-digit minutes). So I added three columns, and used the following formulas:

Column Formula
Year =LEFT(B2,4)/1
Month =MID(B2,6,2)/1
Day =MID(B2,9,2)/1

The main part of these formulas is a little different from the ones I demonstrated in a previous post, because the format of the source data is different, but in addition to that I’ve also divided them all by one. That’s so that they’ll be converted to numbers, which the next step needs.

Step 1.2: Compile the date.
You can do this using the =date() function.

Step 2: Check for and fix missed dates.
It’s easy to skip a day when downloading daily statistics, or when inputting them from paper. Fortunately there’s an easy technique for finding missed dates in chronological data.

In the example, you’ll notice that there are 9 skipped dates. This doesn’t necessarily mean that there are 9 problems. If you go down to the first one you’ll see that it’s marking July 5, 2012. That’s okay, because the service was closed July 4th for the holiday. In cases like this I might delete the formula in that cell, so I’ll be able to easily tell when I’ve checked all the errors. If you do that, though, and you plan on re-using these formulas later, be sure to copy the formula back over the cleared cells before you go on to the next step!

You might also have noticed that I did something a little nifty with the column header. I used the =concatenate() function to create an
identifying column header that also counts the number of times “check” appears in that column.

Step 3: Check for and fix duplicated entries.
There’s also an easy technique for locating duplicate entries.

In the example, you’ll notice that there are 78 duplicate entries. That doesn’t mean that there are 78 duplicated days, though – just that there were 78 chats during however many days got duplicated.

Step 4: Compile the data into a presentation format
You can do this however you want… you could do a table, or a graph, or set it up so that it can be accessed online. I’ll discuss some of the possibilities and how to do them next week.