Last week I said that this week I was going to talk about how to parse variable-format numbers into numbers. Basically, the problem with variable format numbers is that there are important things you don’t know. This is actually true of a lot of text manipulation once you get outside of the tidy examples I’ve been using up until now.

The simplest type of variable-format numbers is the variable-length one, where you don’t know how long it is, but all you need to do is get rid of a known number of characters at the beginning or the end. This can be addressed by using the function **len()**.

I don’t know for sure why it’s len() instead of length(), but I suspect it has to do with a combination of storage optimization, formula readability, and typing laziness. Regardless, this function is the same in every spreadsheet program I’ve ever worked with.

Here’s some examples of how it works:

len(“5+3”)=3 because there’s three characters.

len(5+3)=1 because 5+3=8, which is one character.

One thing to note is that len() will return the length of what’s actually stored, not how it appears in the cell or formula bar. For example:

A1 | len(A1) | Reason |
---|---|---|

9.0000 | 1 | 9.0000 is stored as 9 |

1% | 4 | 1% is stored as 0.01 |

$3.00 | 1 | $3.00 is stored as 3 |

January 6, 2013 | 5 | Dates are stored as numbers: 1/6/13 is 41280 |

6:45 AM | 7 | Times are also stored as numbers: 6:45 = 6.75/24 = 0.28125 |

1 1/3 | 16 | 1 1/3 is stored as a decimal number: 1.333333333333333. It actually extends infinitely, but the spreadsheet cuts it off after a certain number of decimal places. |

1e30 | 5 | Scientific notation includes a + or – sign, so it’s stored as 1E+30 |

1E15 | 16 | This has few enough digits that it’s actually stored as 1000000000000000 |

Using the techniques I showed you last week, if you had both one and two digit numbers you wanted to extract, you’d need to have two formulas. For example, here’s the results you’d get if you tried to convert $$100 and $$1,000 with the same formula:

Formula | A1=”$$100″ | A1=”$$1,000″ |
---|---|---|

=right(a1,3)/1 | 100 | 0 |

=right(a1,4)/1 | 100 | 0 |

=right(a1,5)/1 | ERROR | 1000 |

However, with the len() function you can make this formula: **=right(a1,len(a1)-2)**

That’s the same as saying “However long a1 is, I want everthing to the right of the first two characters.”

Another potential use is separating numbers. Say for example you’ve got a column for ISBNs, and you want to divide it into ISBN-10 and ISBN-13. That’s easy if all entries have both types of ISBNs – you can just use formulas similar to the ones I used to extract the birth and death dates last week. But if you have some with both, some with just 10-digit ISBNs, and some with just 13-digit ISBNs, then there’s no one formula that will handle all of those cases. However, using a column with the formula =len(ISBNs) you could re-sort the ISBNs according to how long they are. That would let you use one formula to extract both numbers from any entry that’s 25 or more characters long (10+13+”, “), and leave the rest of them as-is. You could also separate the cells that have and ISBN-10 from the ones with an ISBN-13 by assuming that anything between 13 and 24 characters long is an ISBN-13, and anything with 12 or less is an ISBN-10.

length | ISBN-10 formula | ISBN-13 formula |
---|---|---|

25 | =left(ISBNs,10) | =right(ISBNs,13) |

13 | =ISBNs | |

10 | =ISBNs |

This assumes that the ISBNs don’t have any dashes in them. If some have dashes and some don’t then you need to find out where the space or comma is, to know where to divide the two numbers. That’s where the function I’m going to talk about next week comes in.