Once you have captured your questionnaire data in Microsoft Excel, you must analyse the data. Your analysis must demonstrate your ability to use a range of functions varying in complexity. One of the biggest problems learners face at this stage is finding out that they didn’t collect appropriate data with their questionnaires!

From the Learner Guide:

LEVEL | EXAMPLES OF FORMULAS/FUNCTIONS |
---|---|

1 | Simple functions that only use/include a single cell range, e.g. SUM, MAX, MIN, COUNT, AVERAGE, MODE, MEDIAN, LEN, VALUE OR a formula using any of the arithmetic operators ( +, – ,*, / ) |

2 | Functions that include a cell range and one other parameter/condition, e.g. ROUND, LARGE, SMALL, LEFT, RIGHT, CONCATENATE, COUNTIF, SUMIF OR Calculations using a combination of arithmetic operators and brackets OR Calculations using a combination of any two simple functions from level 1 |

3 | Functions that include a cell range plus two parameters/conditions, e.g. POWER, MID, FIND, COUNTIFS, ROUNDUP, RANDBETWEEN or a simple IF-function or any of the DATE and TIME functions OR Any combination of more than two functions of arithmetic operators, brackets and other functions |

4 | A LOOKUP OR nested IF-function OR functions not in the CAT CAPS curriculum |

### Level 1

This analysis is pretty straight forward. You can use a variety of these functions on the age data, for instance, calculating the minimum, maximum, average and mode of the range of ages of the respondents. ROUND (a level 2 function) is in there for good measure.

Don’t forget to do obvious calculations, for example: using COUNT to determine the number of respondent records.

You will use the results of these calculation in further analysis later on.

### Level 2

The easy win here is the COUNTIF function, for example in the Gender column you can calculate the total number of respondents per gender.

Note the correct use of the absolute reference for the cell range argument and the use of the labels for the criteria (many learners incorrectly select the *criteria *argument from within the *range *argument).

Again, you will use the results of these calculations in further analysis later on.

### Level 3

Level 3 is probably the most important level of analysis as it contains the COUNTIFS function. Remember, you are looking to process your data so that you have information that helps you “solve” your Focus Question.

Start by combining a question from the Biographical questions with a question from the Topic section of questions using the COUNTIFS function: calculate the number of females who gave a particular answer to a questions to the number of males that gave the same answer to the same question.

For example: if you have the gender of the respondent and you asked them whether they ate ice-cream or not, you will be able to calculate how many females eat ice-cream versus how many males eat ice-cream.

Be careful: unless you have exactly equal numbers of respondents by gender, you will still have to *normalise *your data. In other words, if you look at the above numbers you might make the mistake of concluding “1 female eats ice-cream and 2 males eat ice-cream, so twice as many males as females eat ice-cream”.

However, your sample of respondents is not equal in terms of gender, so you must *normalise *your data. To accomplish this, once you know how many females eat ice-cream, calculate the percentage of females who answered “Yes” as a percentage of the number of female respondents etc. You will now be able to establish which gender has the highest preference for eating ice-cream!

As you can see below, the initial assumption was, in fact, incorrect: the data show no difference in ice-cream eating habits between gender in our sample.

### Level 4

Level 4 is not easy.

Quite honestly I have not been creative enough to imagine a way of using a LOOKUP in the PAT.

If you are feeling incredibly brave you can try a nested IF statement.

If not, consider using one of the following functions not in the Curriculum:

- AND
- CLEAN
- CONVERT
- FIND
- TRIM

In the example below I have used three of these functions to correct a common mistake that occurs when first name and surname are captured in a single column. I also include a very real-world solution to a problem often encountered when working with data: “dirty data”.

The full name column holds two pieces of data: a name and a surname. Whilst you could argue that the separate data holds no significance for the solution to this problem, we will separate it nonetheless. The names also contain leading and trailing spaces as well as double spaces. The TRIM function (removes leading and trailing spaces as well as extra spaces in a string) and the CLEAN function (removes special characters) clean up our data for us – compare the values returned by the LEN function in column C to the value returned by the same function in column G. “Cleaning” data in this manner is a very real-world task.

An example of the CONVERT function can be found in the post Phase 2 “additional data”