Teaching database theory

At this stage, this post is a jumbled collection of thoughts that I am putting down in writing for the first time. Please feel free to comment or make suggestions/corrections!

Students will continue to battle with Microsoft Access as long as it is taught the way Excel is taught. This post is aimed at ICT & CAT (and IT) teachers. Most of the article covers work outside of the DBE CAPS CAT Curriculum, however, relationships are part of the IEB CAT & the Cambridge IGCSE & AS curriculae. I have avoided using the technical language (normalisation, etc) as far as possible. We must also be sure to treat the students as database designers, or rather database architects, and not just data capturers or database users.

1. Working with data

DBE CAPS CAT

The CAPS CAT curriculum concentrates on capturing accurate data (using table field data type as well as additional properties such as field size, validation rules, input masks, default values etc.). This often leads to too much attention being focused on data and too little on design. Choosing the correct data types for your table fields is no substitute for designing a database!

Cambridge International ICT

The Cambridge International ICT & IT curricula take a more thorough approach and delve deeper into database design with the inclusion of topics such as normalisation and validation & verification.

We must bear some basic guidelines in mind when storing data that will guide us to design decisions that should be made before the first table is even created:

George Fuechsel, an IBM programmer and instructor, is generally credited with coining the term “garbage in, garbage out” in the early 1960s. He is said to have used the phrase to state that a computer model or program just processes what it is given: If it is given bad information, it will produce bad information. ^[1]

Garbage In Garbage Out (“GIGO”)
Store data once
If data values must be repeated, repeat the smallest amount of data possible
Don’t store data that can be deduced (calculated) from existing data
Never trust your user:
- users must never see or edit tables directly, but only through forms and reports
Never trust your data:
- always clean your data, regardless of how it was input or imported
Back your data up frequently especially before running an action query (DELETE & UPDATE queries). You can simply make a copy of your database or a copy of the database object you are working on within the database.

Of course, there are many real-life scenarios where these rules are deliberately broken for valid reasons, so you should not be surprised when you see this happen. Access also has the incredibly cool Calculated Data Field type which goes against the above “rules” — again, for very good reason.

The reasons for this process are many, but the main goal is always the same: a lean (small in terms of bytes), fast database

2. Understanding a database application

It is important to understand that a database, or a database application, is created to solve a business problem. Your business problem might be keeping track of customer information.

The first step is to design a database, mapping real-world entities (“things”) that are parts of the problem to objects in the database. This could happen on paper or a whiteboard, or even specialised software such as MySQL Workbench. A Customer in the real world becomes a record in a customer table in the database. The attributes of these real-world Customers, such as customer name and customer surname become fields in the customer table. The tables store the data and form the fundamental building blocks for almost everything else in the application.

You will notice I am a huge fan of verbose naming conventions!

A fully functional database application is more than just tables, queries, forms, and reports. It is a cohesive application for managing data. In such an application, the database designer should have several goals, one of which is to protect the data in the database against the database user. As such, the user of a database application should not be able to interact directly with the tables in any way — let alone modify or even view the structure and design of the underlying tables. The user should only have access to forms and reports and this access should be strictly controlled using the appropriate settings. Read the post An Access Form for data entry only for an example of how this can be achieved.

With this in mind, we can see that creating a combo box in a table using a Value List is problematic as to change the values in the list the user would have to have access to the design view of the table, something you simply cannot allow. That is why a more versatile method is employed, that of a lookup table.

Indeed, a database seldom consists of one table — in fact, most consist of many related tables, hence the name relational database. The process of splitting datasets up into multiple tables and then creating increasingly complex links or relationships between them at first seems counter-productive.

3. Naming conventions

This is definitely one of those instances where I adopt a “better safe than sorry” policy! You can use some characters and not others; spaces will work in some instances and not others, so:

Use only letters in your names
Do not use punctuation marks
Use camelcase for names using more than one word:
- memberID, memberName
Prefix the name of your objects appropriately:
- tblMembers, qryUpdatePrice, frmNewMember, rptMemberTelephoneNumbers

4. Atomic data

Ensuring that data is atomic is one of the requirements for normalising data to the first normal form.

Data should be atomic: each field should contain the smallest amount of meaningful data. By meaningful I mean meaningful in the context of the problem being solved — you could further split the letters in the names and surnames into separate fields however that would offer no additional meaning when solving this specific problem. This concept is taught in Excel and the reasons are obvious: it makes it possible to use the data more efficiently for example sorting by the first name and/or surname, Mail Merges etc.

🔑 customerID	customerName	customerTelephone	customerCity
123	Eric Jones	011-861-2025	Johannesburg
456	Jack Smith	(021) 403-1659 Ext. 53	Cape Town
789	Sam Gordon	021-808-9633	Cape Town

tblCustomer the wrong way

🔑 customerID	customerName	customerSurname	customerTelephone	customerCity
123	Eric	Jones	011-861-2025	Johannesburg
456	Jack	Smith	(021) 403-1659 Ext. 53	Cape Town
789	Sam	Gordon	021-808-9633	Cape Town

tblCustomer the right way

5. Divide & conquer!

A quick inspection of the tabulated data shows the repetition of values in the City field. This is a sure-fire sign that a second table could be used to store the names of the cities with a foreign key and the foreign key value could then be stored in the City field representing the name of the city as such.

The above table should be split into 2 tables, namely tblCustomers and tblCities, as follows:

🔑 customerID	customerName	customerSurname	customerTelephone	cityCode
123	Eric	Jones	555-861-2025	2
456	Jack	Smith	(555) 403-1659 Ext. 53	1
789	Sam	Gordon	555-808-9633	1

🔑 cityCode	cityName
1	Cape Town
2	Johannesburg
3	Durban

The cityCode in tblCustomers can now be looked up in tblCities (similar to a VLOOKUP in Excel). Rather than storing the string “Johannesburg” (12 characters), you will now store that value once in tblCities and where it must be stored many times in tblCustomer, you only store the cityCode (1 character). The names of the fields do not need to be the same, but they must have the same data types.

The database now consists of 2 entities, Customers and Cities, represented by 2 tables. Many (∞) Customers can be found in one (1) City, therefore we say that these tables are in a one-to-many relationship. The Primary Key of tblCities, cityCode, is the foreign key in tblCustomers.

To see how to create the above relationship in an Access database, read the posts Creating & using a “lookup table” in Access & Creating a relationship between 2 tables in Access.

Splitting your tables and then linking them back together seems counter-intuitive at first due to the complexity of the relationships, but the advantages will become evident as your database grows…

6. Normalising data

There is a set of very fancy-sounding “rules” that are applied in the “normalisation” of data. They are dry and academic, and I do not recommend using them when trying to explain database theory to anyone! They are however not as complicated as they sound. In fact, by following the above steps in Table Manners & Divide & Conquer that is exactly what you have been doing: normalising tables to the 3NF (Third Normal Form”).

See more in the post Normalising data.

References:

What is garbage in, garbage out (GIGO) ? | Definition from TechTarget (2023). Available at: https://www.techtarget.com/searchsoftwarequality/definition/garbage-in-garbage-out (Accessed: 7 August 2023).

Teaching database theory

ByMisterFoxOnline

In this post:

1. Working with data

DBE CAPS CAT

Cambridge International ICT

2. Understanding a database application

3. Naming conventions

4. Atomic data

5. Divide & conquer!

6. Normalising data

References:

Like this:

By MisterFoxOnline

Related Post

VBA to truncate or delete Access database tables

Adding fields to an existing Access Report

Changing field properties during import into Access

Leave a ReplyCancel reply

You missed

Data Validation in Excel

Using LaTeX with Word

Using MATCH() and INDEX() in Excel

Using LaTeX with WordPress

ByMisterFoxOnline

In this post:

1. Working with data

DBE CAPS CAT

Cambridge International ICT

2. Understanding a database application

3. Naming conventions

4. Atomic data

5. Divide & conquer!

6. Normalising data

References:

Share this:

Like this:

By MisterFoxOnline

Related Post

Leave a ReplyCancel reply

You missed