Once you have captured your questionnaire data in Microsoft Excel (or downloaded your data from a digital Google Form questionnaire), you must analyze the data. Your analysis must demonstrate your ability to use a range of functions varying in complexity. One of the biggest problems learners face at this stage is finding out that they didn’t collect appropriate data with their questionnaires!
Once you have captured all your raw data, duplicate the raw Data sheet. Rename this copy “Analysis”. On this sheet, you will perform all the required calculations and formatting.
Top tip: Did you collect a name and a surname? If you did and you entered both name and surname in one Excel column, we need to fix that on our Data worksheet before we copy it for analysis. Check the Level 4 section below to see how this can be done, scoring you extra cred at the same time by knocking off some of the more difficult analysis!
Now for our analysis: from the Learner Guide:
|LEVEL||EXAMPLES OF FORMULAS/FUNCTIONS|
|1||Simple functions that only use/include a single cell range,|
e.g. SUM, MAX, MIN, COUNT, AVERAGE, MODE, MEDIAN, LEN, VALUE
a formula using any of the arithmetic operators ( +, – ,*, / )
|2||Functions that include a cell range and one other parameter/condition,|
e.g. ROUND, LARGE, SMALL, LEFT, RIGHT, CONCATENATE, COUNTIF, SUMIF
Calculations using a combination of arithmetic operators and brackets
Calculations using a combination of any two simple functions from level 1
|3||Functions that include a cell range plus two parameters/conditions, e.g. POWER, MID, FIND, COUNTIFS, ROUNDUP, RANDBETWEEN or a simple IF-function or any of the DATE and TIME functions|
Any combination of more than two functions of arithmetic operators, brackets and other functions
functions not in the CAT CAPS curriculum
This analysis is pretty straight forward. You can use a variety of these functions on the age data, for instance, calculating the minimum, maximum, average and mode of the range of ages of the respondents. ROUND (a level 2 function) is in there for good measure.
Note the use of the labels Average, Youngest, Oldest & Mode in D9:D12. These make your work more readable and easier to understand.
Don’t forget to do obvious calculations, for example: using COUNT to determine the number of respondent records. You will use the results of these calculations in further analysis later on.
The easy win here is the COUNTIF function, for example in the Gender column you can calculate the total number of respondents per gender.
Note the correct use of the absolute reference for the cell range argument ($D$2:$D$7) and the use of the labels in C9 & C10 for the criteria (many learners incorrectly select the criteria argument from within the range argument).
Again, you will use the results of these calculations in further analysis later on.
Level 3 is probably the most important level of analysis as it contains the COUNTIFS function. Remember, you are looking to process your data so that you have information that helps you “solve” your Focus Question.
Start by combining a question from the Biographical questions with a question from the Topic section of questions using the COUNTIFS function: calculate the number of females who gave a particular answer to a questions to the number of males that gave the same answer to the same question.
For example: if you have the gender of the respondent and you asked them whether they ate ice-cream or not, you will be able to calculate how many females eat ice-cream versus how many males eat ice-cream.
Be careful: unless you have exactly equal numbers of respondents by gender, you will still have to normalise your data. In other words, if you look at the above numbers you might make the mistake of concluding “1 female eats ice-cream and 2 males eat ice-cream, so twice as many males as females eat ice-cream”.
However, your sample of respondents is not equal in terms of gender, so you must normalise your data. To accomplish this, once you know how many females eat ice-cream, calculate the percentage of females who answered “Yes” as a percentage of the number of female respondents etc. You will now be able to establish which gender has the highest preference for eating ice-cream!
As you can see below, the initial assumption was, in fact, incorrect: the data show no difference in ice-cream eating habits between gender in our sample.
Note again the use of labels which are used as the criteria in the functions as well as serving to describe your information
Level 4 is not easy. The rubric’s suggestion of “A Lookup or nested IF-function OR functions not in the CAT CAPS curriculum” is deceiving: quite honestly I have not been creative enough to imagine a way of using a VLOOKUP in the PAT and I am seldom feeling brave enough for a nested IF statement!
However, here is an example of a nested IF:
Consider using one of the following functions not in the Curriculum:
In the example below I have used three of these functions to correct a common mistake that occurs when first name and surname are captured in a single column. I also include a very real-world solution to a problem often encountered when working with data: “dirty data”.
The full name column holds two pieces of data: a name and a surname. Whilst you could argue that the separate data holds no significance for the solution to this problem, we will separate it nonetheless. The names also contain leading and trailing spaces as well as double spaces. The TRIM function (removes leading and trailing spaces as well as extra spaces in a string) and the CLEAN function (removes special characters) clean up our data for us – compare the values returned by the LEN function in column C to the value returned by the same function in column G. “Cleaning” data in this manner is a very real-world task.
An example of the CONVERT function can be found in the post Phase 2 “additional data”.
Next step: Phase 2 Excel Charts