I was recently looking at apartments online, trying to figure out which ones were worth stopping by in person to take a closer look at. As with any major data project I stored the information in a spreadsheet, but once I had all the data compiled there was too much to easily look at. In addition to rent I was also considering which utilities were included, how long it would take to get to to a variety of locations including work and the nearest grocery store, whether it has washer & dryer in the unit, and so on. So to easily look at all of this I wanted a visualization similar to a heatmap, but there was one problem: the data wasn’t all the same.
If I tried to use basic conditional formatting to make a heatmap the way I did for the hours, it’d come out looking… not very helpful. This is because, when I was analyzing how busy various hours were, all of the numbers reflected the number of chats that took place during that hour. They all used the same unit, and a 5 was always busier than a 4.
However, when looking at apartments I had rent, which was in hundreds of dollars, and distances, which was in minutes. Obviously a 5 minute drive is better than a 30 minute one, and a $600 rent is better than a $1000 one, but if I just did a simple conditional formatting over the whole thing using gradients, as illustrated to the right, then all the distances would be at one extreme, and all the rents would be at the other, with relatively little differentiation.
To complicate matters further, I also recorded “yes” answers as 1, and “no” as zero for all of the possible included utilities and whether or not it had A/C, and recorded in-unit washer/dryers as 2, on-site ones as 1, and no laundry facilities as 0. So in all of these cases higher numbers are better, but in the case of rent and distance, lower numbers were better.
The obvious answer is just to set up conditional formatting for each column separately, but that’s not always as easy as it sounds. The problem is that some programs want to try to think for you. If you have a chunk of data that’s conditionally formatted, and you format an adjacent chunk in the same way, a lot of times the program will assume that you mean for them to be combined.
You get around this in much the way you get around auto-correct “fixing” the spelling or capitalization of a word you’ve typed correctly but that it doesn’t have in its dictionary. You make the change in a way that won’t invoke the undesired behavior. In other words:
- Group your data. This step is optional, but it can save you effort in the long run. If you group everything with comparable data into adjacent columns, then you can set the formatting for the entire group at once.
- Insert a column between each group. By inserting a blank column, you’re making the columns non-adjacent, so the spreadsheet won’t try to merge them.
- Format the columns. Keep your colors consistent as to good and bad, not high and low.
- Delete the empty columns. Now that the formatting is set, the spreadsheet won’t check for adjacent cells again, unless you change the formatting.
In addition to the conditional formatting, you might notice a couple of other formatting techniques I used to make it easier to read the chart. First, I made the background of cells I didn’t have data for yellow, so they would appear neutral. Second, I made the more important cells wider, so they’d take up more visual room. Monthly rent is much more important than whether or not any given utility is included, so it has a much wider column. Since this sort of data visualization is about being able to give the viewer a quick overall idea, not details, little tricks like these can make all the difference.