If you looked at the original data for the grocery price list example from a few of weeks back, you might have noticed that the original data was in a completely different format than the data in my example.
In the original, the data was one long list. Each entry had a row, which means that store names, item names, and quantities are all duplicated. It’s only the combinations that vary. That means a lot of duplication, which not only means more typing and higher risk of accidental duplicates, it also makes it easy to have slight differences that make it impossible to automatically compare the data. Without some advanced programming, the difference between aluminum foil and alumnium foil is as large as the difference between apples and yogurt.
The best approach is to put things in a table, so that’s what I did. Now, I could have done this manually by hunting for matches, which wouldn’t actually be too hard if I started by sorting the data. It would just be very tedious.
Fortunately there’s another way to do it, using if(). Here’s how:
1) Generate two lists of unique items. In this case you’re interested in the prince of different items at different stores. Or, to phrase it differently, you want one price for every combination of stores and products. So one of the lists you need is stores, and the other one of products. To get each list just copy the relevant column with the stores in it and paste it somewhere away from the rest of the data, then remove the duplicates from the list.
2) Check for variations. Start by sorting your list of unique items, and then visually scanning for near-duplicates. For example, when I made a list of stores from the original data, I found that the list had both Ralphs and Ralph’s. Even though I don’t live and that area and don’t know all the stores, I was willing to bet that this was actually just two spellings of the same store name.
3) Remove variations. This is NOT just removing them from your list of unique items. If you do that, you won’t be able to pull them from your original data later. Instead, what you need to do is do a global find & replace. For example, in converting this data I replaced all of the Ralphs with Ralph’s. Once you do this you’ll want to remove duplicates once more from both lists.
4) Make one of the lists the column headers for a set of columns to the right of the original data. Just take the list and transpose it, to switch the rows to columns. Once you’ve created the column headers you can safely discard the list you made them from.
5) Sort the original data by the column with the data from the other list. Note here that you’re sorting the original data, not the list of unique items you made. You can actually safely discard that list – it’s already served its purpose by allowing you to find and remove variations in the original data.
6) Copy the prices to the appropriate columns, then copy them up so that all the prices for a given item are on the same line. This one’s complicated, so I’m going to break it down.
The first part, copying the prices to the appropriate columns, you could do with a formula similar to: =if(store_original=store_header,price,””) That is to say, if the store name in the original data for this row is the same as the store name in the column header, return the price. Otherwise, return nothing.
The second part could be done with an incremental formula using another if(). If the product names were in column A, and this formula was in C2, then the formula might be something like =if($A2=$a3,c3,””). That’s saying that if the product in this row is the same as the product below it, then it should return the value from the cell below, but if it’s not then then it should return nothing. I had to use cell references here instead of column names because a reference to an entire column will either return the value from that row, or all the values. Generally, I couldn’t make a name that referred to the cell one column to the left and one row down from the current cell.
Combining these would return something like: =if(store_original=store_header,price,if($A2=$a3,c3,””)). That’s saying that if the store name in the original data for this row is the same as the store name in the column header, return the price. Otherwise, if the product in this row is the same as the product in the row below, return the value from the cell one row below this one. If neither of those is true, i.e. it’s the last occurrence of this product in the list and the store isn’t the one listed in the header, then return blank.
This is why you had to sort it – if you hadn’t, then identical items wouldn’t have been grouped together.
7) Turn the formulas into text values. Just copy all the formulas, and paste as text.
8) Identify and remove the unnecessary rows. Just use the same technique as earlier to remove duplicate rows. Be sure that the ones you duplicate are all the ones below the first one, not all the ones above the last one, or you’ll lose all of the values you just saved!
9) Remove unnecessary columns. In this case, you no longer need the two original columns for store and price – you’ve transferred all that data to the other columns. So you can just delete them.
And that’s it!