Text function: finding sub-text

Sometimes you only care about a certain part of a string of text. For example, when reading a letter from a potential employer the first thing you care about is whether it says to schedule an interview, or to submit further information, or that they have selected someone else.

Obviously you’re not going to put a letter like that into Excel just to search for key phrases, mostly because it’s not worth the effort to do so for just one thing. However, there are lots of times when something is already in a spreadsheet, or can very easily be converted to a spreadsheet, when using this function can save you a whole lot of time.

For example, imagine a list of names in the “last, first” format. If you wanted to switch them you could use find() to get the location of the comma. Everything before the comma is the last name. Everything after it (except the leading space) is the first name.

For practical purposes there are two basic forms of the find() function:
=find(subtext,text), which I read as “find subtext in text.”
=find(subtext,text,start), which I read as “find subtext in text, starting at character start.”

Essentially, the first form is the same as =find(subtext, text, 1), which is to say “find subtext in text, starting at the first character.”

These forms are the same in every spreadsheet I’ve tried, including ranging from Excel 97 to Excel 2010, Google Docs, and Open Office (except in Open Office you have ; instead of ,).

Let me give you some quick examples:

One thing I frequently use find() for is to help with citation and reference matching. That is to say, if I’ve got a manuscript and I need to make sure every citation is referenced and every reference is cited, I first need to go through and pull out all the references. Assuming parenthetical citations, I can do this by finding lines with “(” or “)” in them. Some of the parenthesis won’t actually be citations, of course, but there will be few enough that I can check them by hand, and I’ll be absolutely certain I’ve spotted every single citation.

The reason I’m mentioning this function at this time is that it’s what lets you handle text that’s in a standard format, but which varies in length. For example, take the ISBN example I described a couple of weeks ago. In the solution I provided, you had to assume that all ISBNs were presented without dashes, so an ISBN-10 was exactly 10 characters long, and an ISBN-13 was exactly 13. Using a combination of find() and len(), however, you don’t need that assumption anymore.

Here are the new formulas, with explanation, for a cell named “ISBNs” containing “1451638450, 978-14516-3845-5”

ISBN-10 = left(ISBNs,find(“,”,ISBNs)-1)
Breakdown: find(“,”,ISBNs) = 11
11-1=10 (subtract 1 so the comma isn’t included)
ISBN-13 = right(ISBNs,len(ISBNs)-(find(“,”,ISBNs)+1))
Breakdown: find(“,”,ISBNs) = 11
11+1=12 (add one because you want the character after the space after the comma)

Now the only assumption you still need to make is that the ISBN-10 comes before the ISBN-13. There’s even ways to get around that, automatically, but I’ll discuss it later. If you want to do it manually you could try =len(ISBN13)-len(ISBN10), and sort the columns by that result. If it’s a negative number, then the ISBN-10 is longer than the ISBN-13, and you might want to switch their locations. That’s easily done using copy & paste and an empty column for temporary storage.