Microsoft Excel functions banner image.

In the example below I have used three Microsoft Excel functions to identify and then correct a common mistake that occurs when first name and surname are captured in a single column. I also include a very real-world solution to a problem often encountered when working with data: “dirty data”.

In this post:

  1. Scenario
  2. LEN
  3. TRIM
  4. CLEAN
  5. FIND
  6. LEFT
  7. RIGHT
  8. Edge cases

A string is a combination of one or more consecutive characters.

Scenario

The Full Name column holds two pieces of data: a name and a surname. Whilst you could argue that the separate data holds no significance for the solution to this problem, we will separate it nonetheless. The names also contain leading and trailing spaces as well as double spaces. I have used the LEN function to highlight the difference in length (number of characters) of the “dirty” and “clean” strings.

Excel TRIM, CLEAN & FIND exampleExcel TRIM, CLEAN & FIND example

LEN

The string “Anna Karenina” appears in cell B1. A cursory inspection suggests that there are 13 characters (4 letters, space, 8 letters) in the string. Using the LEN function in cell C2 we establish that there are actually 15 characters in the string. This is as a result of there being 3 spaces between the name and surname instead of only 1.

In the above example, LEN has simply been used to highlight values that include characters other than letters or digits — in this instance spaces. Compare the values returned by the LEN function in column C to the value returned by the same function in column G. “Cleaning” data in this manner is a very real-world task.

TRIM

The TRIM function (removes leading and trailing spaces as well as extra spaces in a string). The TRIM function takes one argument: a string or a reference to a cell containing a string.

CLEAN

The CLEAN function (removes special characters) clean up our data for us. The CLEAN function takes one argument: a string or a reference to a cell containing a string.

FIND

The FIND function finds a substring within a string and returns the position of the first instance of that substring in the main string. In the above example, we search for the space in the Full Name string with the assumption that everything before that position is the first name and everything after that position, is the surname (this is a dangerous assumption —  more about this later!).

Consider the record with the ID 3: cell D4 reports that the position of the space is 1. After cleaning the data, we can see that the space that we are really interested in — the space between the name and surname — is in position 11 (cell G3)

Excel TRIM, CLEAN & FIND example
Excel TRIM, CLEAN & FIND example

It should now be clear to you why we took so much trouble to remove extra spaces from the strings (values) in the Full Name column; a leading space (a space at the beginning) results in the FIND function returning 1. You will understand during the next 2 steps how that would have created problems solving our problem.

LEFT

The LEFT function returns a specified number of characters from a string, starting from the left-hand side. For record 2, the FIND function reports that the first space in the string “Buhle Motsepe” occurs at position 6. We now know that all the characters to the left (before) position 6 will be the first name, “Buhle” and can therefore use 6-1 which is 5 as the second argument for the LEFT function, resulting in the function: =LEFT(E3, 5).

We naturally want this function to be dynamic so we use a cell reference in place of the actual number: =LEFT(E3,G3-1).

We could of course combine the functions and perform the operation in one step: =LEFT(E3, FIND(" ",E3))

The RIGHT function returns a specified number of characters from a string, starting from the right-hand side.

Because we will be counting from the right, we calculate:

  1. the length of the entire string with the LEN function: =LEN(E3) which returns 13
  2. subtract the position of the space (which we already have in G3): 13-6 = 7

We then use RIGHT to return the surname: =RIGHT(E3, 7)

Again, using the same logic as we used to make the LEFT function dynamic, we end up with: =RIGHT(E3, F3-G3)

You would have to be pretty brave to complete this is one step: =RIGHT(E3, LEN(E3)-FIND(" ",E3))

Edge cases

This solution is fairly robust, however, should a name be entered where the first name consists of two names separated by a space, or a name, second name and surname are entered, then this solution will not provide results that are 100% correct.

This is just one more reason why you should always ensure that data is entered with separate columns or fields for each of the smallest meaningful values.

By MisterFoxOnline

CAT Educator

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.