Identifying opening and closing tags – Swiss Army Spreadsheets

Last week I said that I told the spreadsheet to do one thing if the cell contained an opening tag, and a different thing if it contained a closing tag, but I didn’t explain how I did that. The answer is that I used a function which I haven’t previously discussed: =isnumber().

There are a lot of =is____() functions, and some other day I might write up a list and discussion of them, but for now I want to focus on =isnumber(). What this function does is tell you whether or not the part in the () is a number. So what does this have to do with locating tags? Easy: I used =if(isnumber(find(“<div”,Code)),,)

To break that down, it’s testing whether the result of trying to find “<div” in the value on this row of the column named Code is, or is not, a number. Since =find() returns the position of the start of the string if it finds it (the first character is 1, the 2nd character is 2, and so on), and if the string isn’t there it returns an error. Now, in this situation it doesn’t matter where the “<div” is, it just matters that it’s there. So if =find() returns a number, then the line has a “<div”/opening tag. If it doesn’t have “<div” then =find(“<div”,Code) will return an error, and =isnumber(find(“<div”,Code)) will return false.

In the spreadsheet I was counting both “<div” and “</div”, and so I used two columns — one for each. The actual formulas (with “previous” standing in for the cell above the cell with the formula) were:
=if(isnumber(find(“<div”,Code)),previous+1,previous)
=if(isnumber(find(“</div”,Code)),previous+1,previous)

I then subtracted total number of “</div” from the total number of “<div”. Since what I really cared about was the result of that subtraction, I could also have done it all in one combined formula, like this:
=if(isnumber(find(“<div”,Code)),previous+1,if(isnumber(find(“</div”,Code)),previous-1,previous)

In theory, since each line contained either “<div” or “</div” and not both, I could have used an even simpler formula:
=if(isnumber(find(“<div”,Code)),previous+1,previous-1)
However, the problem with this is that no matter how carefully you prepped the data, there’s always a possibility that it’s not exactly the way you think it will be. For example, there might be “<div” and “</div” both on the same line, or there might be a blank line, or other line with neither. The chances are pretty slim given how I prepped this data, but it rarely hurts to write formulas that can deal with unexpected issues.

Which, of course, brings up the question of how exactly I prepped this data. While I tend to write code so that div tags would be at the start of their respective lines, if you look at the screenshot you’ll see that I only have div tags… no paragraph tags, no line breaks, no tables, etc. I also don’t have any leading indentation, which as every programmer knows is fairly important if you want to be able to read the code after you’ve written it. Basically, I used Word to turn the whole thing into one long paragraph by getting rid of all the carriage returns (i.e. using find & replace to replace all the carriage returns with a space), and then I added carriage returns in front of each “<div” and each “</div” using a similar method. You can use this same trick in Open Office, but as far as I can tell Google Docs doesn’t currently have a way to do it. Which, of course, doesn’t mean it’s impossible — you can use spreadsheets to do it. It’s just fairly complex, so I’ll discuss that another week.

Lastly, there’s the inevitable question… was this the only way to count the number of div tags? Of course not. Since “<div” or “</div” appeared at the start of every line I could have used a formula such as:
=if(left(Code,5)=”<div”,previous+1,if(left(Code,5)=”</div”,previous-1,previous))

I just went with =if(isnumber(find()),,) because it’s such a general tool that I use it almost without thinking, even if it’s more robust than I actually need.