Removing duplicates from a list

Sometimes you have a long list of repeated items, and you just want one of each. Or possibly you have a long list of items that are supposed to be unique, and you want to make sure they are. Here’s a technique I use often to do exactly that, and like many others it depends on a simple IF statement:


What this does is check to see whether the value in cell B2 is the same as the value in the cell right above it, B1. If it is, then it says “delete”, and if it doesn’t it stays blank. You can put anything you want in as long as it makes sense to you. For example, it could be =IF(b2=b1,”duplicate”,”first”), so the first time a value occurs in the list it says “first”, and all other times it says “duplicate”. Continue reading Removing duplicates from a list

Combining data sets

Recently at work we ran into a situation where my coworker had extracted the names of all of the current members from a spreadsheet, and asigned access codes to them. Afterwards, we realized that we also needed the email addresses, so we had to make sure each person had both the right code and the right email address. Continue reading Combining data sets

The IF function (about)

If you’re going to the grocery store anyway, then could you pick up milk?
If I pay $1 extra every month, then my loan will be paid off a month early.
If you finish your vegetables, then you can have dessert, otherwise you’re going straight to bed.

Everyone uses “if” statements, every day. Sometimes you explicitly say “then”, and other times you skip it, but it’s still understood that the second part only happens if the first part is true. Likewise, sometimes you explicitly say what will happen if it’s not true, and sometimes you skip it because it’s obvious or irrelevant.

The IF() function is a fantastic little function, and the most basic building block for getting your spreadsheet to do some of the evaluation for you. The syntax is always the same: IF(condition,then,else) in Excel or Google Docs, or IF(condition;then;else) in OpenOffice. So the statements above could be rephrased as:

Cell name: buy_milk
Formula: =IF(Destination=”store”,TRUE,FALSE)
Translation: If the value in Destination is “store”, then it is true that you should “buy milk”. Otherwise, “buy milk” is false, so you shouldn’t do it.

Cell name: payoff_date
Formula: =IF(payment=minimum_payment+1,original_date-30,original_date)
– or –
=IF(payment=minimum_payment+1, date(year(original_date),month(original_date)-1,day(original_date)), original_date)
Translation: First version – If the payment is the minimum payment plus one, then the actual payoff date is the original payoff date minus 30 days, otherwise it’s the original payoff date. The second version is the same but a little more accurate since it subtracts one month instead of 30 days.

Cell name: Instructions
Formula: =IF(vegetables=”finished”,”you can have dessert”,”go straight to bed”)
Translation: If the value in vegetables is “finished”, then the instructions are “you can have dessert”. Otherwise, the instructions are “go straight to bed”.

Obviously these aren’t things you’d really set up in a spreadsheet. However, I do use IF a whole, lot, mostly for things like:

Checking whether two cells match each other: I use this regularly in proofreading, and it saved me a lot of work when I was doing a project verifying links for e-resources. I also used it recently when combining two sets of data that shared a unique identifier. And I use it a whole lot when trying to condense a list of things down to unique items.

Checking whether a cell contains a certain value: This is very useful when trying to verify that all citations in a paper are referenced, and all references are cited. I also use it a lot when trying to extract data from a text file with section headers.

Checking whether a cell is greater or less than a certain value: I don’t use this a whole lot, but I did use it recently when generating an image of the Mandelbrot set.

I’ll be sharing a lot of projects illustrating these things in the weeks to come, so if the examples here didn’t make sense then actual applications might. 🙂