Finding missed data

Most of the time when doing time-based statistics, you’re supposed to have statistics for every unit of time you’re reporting on. That is to say, every hour for hourly statistics, every day for daily ones, every month for monthly, and so on. There’s a little trick I use to make sure no chunk of time has been accidentally skipped, and like many of my tricks it relies on the IF function.

=IF(B2-B1=1,””,”check”)

This says that if the difference between the current time unit and the one in the previous row isn’t equal to one, it should be flagged for checking. This works because dates are stored as numbers, so they can be added and subtracted, and one day is equal to 1.

There are a lot of potential variants of this, for different scenarios. For example:

=IF(B2-B1>1,”check”,””)
This one is a little different from the first one, because it’s not checking if it’s equal to one, it’s checking if it’s greater than one. Also, it’s only telling you to check if the condition is true, not if it’s false. There’s no requirement as to whether you should design a condition that you want to be true or one that’s false. This formula could just as easily be =IF(B2-B1<=1,"","check"), but it’s slightly simpler the first way. Regardless, the reason it’s checking if it’s greater than one is that it allows it to be zero, as well as one. This lets you check lists that have more than one entry for each date, since if B2 is the same B1 then B2-B1=0.

=IF(B6=B1+7,”check”,””)
This is for tracking statistics over a five-day work week. One week ago plus seven is equal to today. But because there are only five days recorded for each week, you want to check to see if five rows ago is equal to seven days ago.

=IF(OR(HOUR(B2)=8,HOUR(B2)=HOUR(B1)+1),”check”,””)
This is for checking the hourly statistics in a place where work starts at 8:00 AM. The OR() function allows us to check two possibilities at once, and will return true if either one is true. The first possibility, HOUR(B2)=8, is that the hour of the day is equal to 8, which is 8:00 AM. The second possibility, HOUR(B2)=HOUR(B1)+1, just checks to see if the hour in this row is one more than the hour in the previous row. Therefore this function would return true for any hour from 8 AM to midnight, if it was one hour later than the one in the previous row.

=IF(OR(HOUR(B2)=8,AND(HOUR(B2)=HOUR(B1)+1,HOUR(B2)<18)),"check","")
If it was important to limit it to 8 AM to 5 PM while still checking that no hours were missed you could modify the previous formula by adding an AND() function. The AND() specifies that in addition to having to be one hour more than the previous row, it also has to be less than 18, which is 6 PM.

To use any of these functions, the procedure starts in a way similar to the one for removing duplicates. In this explanation, assume the data you want to check is in column B, and the formula is in column A.

  1. Add headers. If your columns don’t already have headers, just insert a row at the top, and add column headers. For the purposes of this explanation I’ll assume that they’re “Skips” and “Data” respectively.
  2. Sort both columns by the data column. Be sure to specify that you have headers when sorting, or else your headers will vanish into the data somewhere.
  3. Enter the IF formula. Enter whichever formula from the examples above suits your needs in cell A2. Note that you can’t use column names for this since there’s no way to refer to the row above the current one that way. You could, in theory, enter the formula =IF(Data-b1>1,”check”,””), but that makes it harder to read so I don’t recommend it.
  4. Copy the IF formula down the rest of the column. I like to do this by selecting a cell in Data column, hitting ctrl-down arrow to zip to the last entry, moving back over to the formula column, and then selecting all the blank cells above that. If you also select the cell with the original header, Open Office will give you a warning asking if you want to overwrite the data. If you also select the header, which is in row 1, you’ll get a missing reference error. This is because you’re using relative references, and so the formula in A1 would be =IF(b1=b0,”check”,””), except b0 isn’t a valid reference.
  5. Locate a “check” result. You can do this by searching values for cells that contain the word “check”. You can confirm that you’ve found them all using countif.
  6. Evaluate and address the “check” result. Most of the time a “check result means you’re missing some data, and need to track it down and enter it (and re-sort the data). Sometimes you’ll get false positives – “check” results that are fine as-is. For example, the day might have been a holiday, when the business was closed. In these cases you might want to just delete the cell with the formula, so that it doesn’t interfere with future searches.
  7. Repeat steps 5 and 6, until there are no more “check” results. You can confirm that you’ve found them all using countif.
  8. And that’s it. You can now be sure that you’re not missing any data.

Uncategorized