Using concatenate() and if() to improve charts and graphs

If you look at the example I discussed last week, specifically the Present_Pie and Present_Line worksheets, you might notice that when making pie charts and the single-value line graph, I had three columns for the source data.

This is significant, because in most cases you’ll only have two columns: one for the axis/category labels, and one for the data itself. After all, that’s all the input the chart or graph will accept.

The reason I have three is that I’m using the third column, which I inserted between the other two columns, to create much more useful label by using concatenate(), and a more readable one by using if().

The basic label’s always static, and contains one piece of information. In the examples I was working with it was either the periods of time (i.e. one per month), or the name of the library. It could also be student names and assignments, if you were recording student scores. It could be baseball players and game dates, if you were tracking how many home runs they scored. It really could be anything, but they come from the row and column headers of your source data, and they’re known before you even start filling the table with data.

What I did was use concatenate to combine the label and the data into a single column. For example, in the total usage pie chart I concatenated together the name of the library and its percentage of the total usage. This means that in the legend, the list of libraries to the right of the pie chart, it says things like “Library 1: 6%” instead of just “Library 1”. Although this adds no new information to the pie chart, it gives the viewer two different ways to see the same information.

If the scale had been important, then I could have used the base value (1773 for Library 1) instead of the percentage in either the in-chart data labels or in the legend.

One note about arrangement and formatting – since I was using fake library names they were all conveniently similar in length, so the percentages after them were more or less in line with each other. However, if I were to be using names of vastly different length like, oh, Peace College and St. Andrews Presbyterian College, any numbers after the college names would be very difficult to scan through. There are two basic solutions to this problem, using the functions I’ve already introduced.

Solution 1: Abbreviate the names to similar lengths. You could, for example, cut “St. Andrews Presbyterian College” down to “St. Andrews”. You could, in fact, use =left() to just take the leftmost 15 or so characters from each label. If you did so you’d want to arrange them in alphabetical order and find any duplicates. If there were some duplicates you’d want to figure out abbreviations that would still convey the proper identification, and manually change them. For example, the University of North Carolina system has a lot of schools named “University of North Carolina at <<insert city name here>>” Anyone who’s reading a chart about North Carolina university libraries should be able to figure out what “UNC Greensboro” and “UNC Wilmington” are.

Solution 2: Put the numbers first. This is simpler by far, but I prefer to put the names first when it’s going in a legend, because English speakers read from left to right, and the library’s identity is something you want to know before you get into the statistic(s) about it.

In the second pie chart, I did away with the legend altogether. The library was identified in the title of the chart, and I used data labels to show the number of chats and the date. In this one I chose to put the date second because you already have a vague idea of the date based on how far around the chart it is. The first slice is January, the second one is February, and so on. It’s not perfect, but it’s enough to go on. Also, because the data is around a circle, rather than in a nice neat column, it’s very easy for our eye to pick which part of the label to focus on. Lastly, it means that the year, which is a number, is distinctly separated from the number of chats.

You may notice, however, that the year only occurs on the January entry – all the others just have the number of chats and the month. That’s because you don’t really need it on the others, and it would just end up cluttering things and making it hard to read. I did this little trick by using an if() statement: if the month was January then include the year, which translates to =if(month(Date)=1,concatenate(” “,year(Date)),””). I have the extra concatenate() in there to add a space between the month and the date – otherwise I’d end up with Jan2012.

In the third pie chart I went back to using a legend, so that I could show the exact percentages for each month in one location, and the actual values in a different one. The chart itself gives us a good idea of the percentages — the first four months have the majority of the traffic, then it abruptly drops in the fifth month, then slowly increases again. It also tells us what scale we’re looking at…does that first chunk represent about 30 chats, or 300, or 3000? The legend lists the exact percentages and the months in a clean, easy to read fashion, making it easier to see which slice corresponds to which month and also giving an alternative representation for alternative learning styles.

On the line graph for total usage I again used the if() trick to get the year to show up only in January. In this case I put the year before the month, so that it would be the part sticking out from the rest of the line. Have you ever seen a dictionary that had indentations in the side of the pages marking the start of each letter? It makes it much easier to go straight to that section, since you only have to parse a single letter rather than the whole words that appear at the top corners of the pages. This is similar, making the year stick out in isolation so it’s easy to scan for that, and then look deeper to find the right month.