# Working with big data sets: streamlining formulas

If you’re working with data sets that are big enough to make your computer slow down or risk locking up when processing them, one of the things you can do to help ameliorate the problem is streamlining your formulas. Continue reading Working with big data sets: streamlining formulas

# How big is big?

Big data is a popular concept, and problem, in the information world.  It’s things like data about every item bought by every person in every store of a large chain, every day of the year for multiple years.  That’s the sort of data that a major company needs to be able to track and mine, and special programs are needed to process and visualize this sort of thing.  In this example a chromogram was used to visualize the Wikipedia activity for just one city over the course of four months.

But for the quantity of data most people and individual businesses need to track, spreadsheets are enough.  Which is not to say that there aren’t limitations, or special tricks, because there definitely are.

How much data a spreadsheet can handle depends on the program, the computer it’s running on, and what you’re trying to do with it.  For example, Excel 2007 running on my laptop can handle a 50,000 line spreadsheet with multiple countif() statements, which are fairly memory intensive.  But a 700 line spreadsheet with multiple countifs() statements and a few vlookup() statements slowed down to a crawl every time it calculated, partially because because countifs() is much more memory intensive than countif().  That same spreadsheet worked just fine on my desktop.

Google Docs slows  down much earlier.  As far as I can tell it’s primarily geared towards small scale data storage and basic calculations, not intense statistical analysis of large data sets.  As a general rule if I have to do a lot of calculations, especially if I’ll be experimenting and building new formulas based on the results of the previous ones, I find that it’s worth it to download the data, do the calculations in Excel or Open Office, and then upload the results.

If you have to work with formulas and data sets that are so large they’re slowing down your computer, there are a number of tricks you can use to speed things up, ranging from changing the default settings on your program, modifying the amount of data analyzed, and changing your approach to the calculations.  I’ll discuss these approaches in more detail starting next week.

# The same but different: conditional formatting in adjacent cells

I was recently looking at apartments online, trying to figure out which ones were worth stopping by in person to take a closer look at. As with any major data project I stored the information in a spreadsheet, but once I had all the data compiled there was too much to easily look at. In addition to rent I was also considering which utilities were included, how long it would take to get to to a variety of locations including work and the nearest grocery store, whether it has washer & dryer in the unit, and so on. So to easily look at all of this I wanted a visualization similar to a heatmap, but there was one problem: the data wasn’t all the same. Continue reading The same but different: conditional formatting in adjacent cells

# Identifying opening and closing tags

Last week I said that I told the spreadsheet to do one thing if the cell contained an opening tag, and a different thing if it contained a closing tag, but I didn’t explain how I did that. The answer is that I used a function which I haven’t previously discussed: =isnumber(). Continue reading Identifying opening and closing tags

# Debugging: Finding missing closing tags

The other day I was humming along, writing HTML, but when I went to look at the result it was obvious that I’d missed a closing tag somewhere. That’s not too uncommon, and fortunately there’s a number of ways of dealing with it. My first resort if I’m working with tables or div tags is usually to make visible borders, but if that doesn’t help me pin down the problem then the next thing I try is using a spreadsheet. Continue reading Debugging: Finding missing closing tags

# Making heat maps with spreadsheets

Spreadsheets have a lot of neat ways to graph things, but there’s one common type of graph that you won’t find on the built-in list: heat maps.

Most people are used to the idea of heat maps, if not the name. One example is the colored maps you see on weather reports, where the redder an area is, the hotter it is. Although these same visualizations are used to depict things other than heat, such as humidity, precipitation, crime rates, and so on, the type of visualization is called a heat map.

Even though it’s not a built-in function, it’s pretty easy to get modern versions of Excel to make one for you, and possible to other spreadsheet programs do so as well, although somewhat more tediously and therefore less precisely. All you have to do is take advantage of conditional formatting. Continue reading Making heat maps with spreadsheets

# Tracking greeting cards, part 2

Last week I mentioned that there were some refinements that could be applied to the greeting card inventory to take advantage of the fact that it’s in a spreadsheet, not on a piece of paper. Continue reading Tracking greeting cards, part 2

# Tracking greeting cards

Another useful little holiday tool I’ve developed is a card tracker. If you’re of the school of manners that says you shouldn’t send the same greeting card to the same person two years in a row, or to two people who are likely to visit each others’ houses, then you’re faced with the question of what to do with any leftovers. You can throw them away, you can try to give them away, you can try to find some other crafty re-use for them, or you can try to track who you sent each one to with some sort of inventory. Continue reading Tracking greeting cards