In the example below I have used three Microsoft Excel functions to identify and then correct a common mistake that occurs when first name and surname are captured in a single column. I also include a very real-world solution to a problem often encountered when working with data: “dirty data”.
The full name column holds two pieces of data: a name and a surname. Whilst you could argue that the separate data holds no significance for the solution to this problem, we will separate it nonetheless. The names also contain leading and trailing spaces as well as double spaces. I have used the LEN function to highlight the difference in length (number of characters) of the “dirty” and “clean” strings.
The string “Anna Karenina” appears in cell B1. A cursory inspection suggests that there are 13 characters (4 letters, space, 8 letters) in the string. Using the LEN() functions in cell C2 we establish that there are actually 15 characters in the string. This is as a result of there being 3 spaces between the name and surname instead of only 1.
In the above example, LEN has simply been used to highlight values that include characters other than letters or digits — in this instance spaces. Compare the values returned by the LEN function in column C to the value returned by the same function in column G. “Cleaning” data in this manner is a very real-world task.
The TRIM function (removes leading and trailing spaces as well as extra spaces in a string). The TRIM function takes one argument: a string or a reference to a cell containing a string.
The CLEAN function (removes special characters) clean up our data for us. The CLEAN function takes one argument: a string or a reference to a cell containing a string.