At this stage, this post is a jumbled collection of thoughts that I am putting down in writing for the first time. Please feel free to comment or make suggestions/corrections!
Learners will continue to battle with Microsoft Access as long as it is taught the way Excel is taught. This post is aimed at CAT educators. Most of the article covers work outside of the DBE CAT Curriculum, but I believe relationships are now part of the IEB CAT curriculum. I have avoided using the technical language (normalisation, etc) as far as possible.
We must also be sure to treat the Learners as database designers, or rather database architects, and not just database users.
Working with data
The curriculum concentrates on capturing accurate data (using table field properties such as field type, field size, masks, default values etc).
We must also bear in mind some basic guidelines when storing data which will guide us to design decisions which should be made before the first table is even created:
- Store data once
- If data values must be repeated, repeat the smallest amount of data possible
- Don’t store data that can be can be calculated from existing data
Of course, there are many real-life scenarios where these rules are deliberately broken for valid reasons, so you should not be surprised when you see this happen. Access also has the incredibly cool Calculated Data Field type which goes against the above “rules” — again, for very good reason.
The reasons for this process are many, but the main goal is always the same: a lean, fast database
Understanding a database application
It is important to understand that a database, or a database application, is generally created to solve a business problem. Your business problem might be keeping track of customer information.
The first step is to design a database, mapping real-world entities (“things”) that are parts of the problem to objects in the database. This should happen on paper or a whiteboard. A Customer in the real world becomes a customer table in the database. The attributes of these real-world Customers, such as customer name and customer surname become fields in the customer table. The tables store the data and are the fundamental foundation for almost everything else in the application.
A fully functional database application is more than just tables, queries, forms and reports. It is a cohesive application for managing data. In such an application, the database designer should have a number of goals, one of which is to protect the data in the database against the database user. As such, the user of a database application should not be able to interact directly with the tables in any way – let alone modify or even view the structure and design of the underlying tables. The user should only have access to forms and reports and this access should be strictly controlled using the appropriate settings. Read the post An Access Form for data entry only for an example of how this can be achieved.
With this in mind, we can see that creating a combo box in a table using a Value List is problematic as to change that list the user would have to have access to the design view of the table. That is why a more versatile method is employed, that of a lookup table.
Indeed, a database seldom consists of one table — in fact, most consist of many related tables, hence the name relational database. The process of splitting datasets up into multiple tables and then creating increasingly complex links or relationships between them at first seems counter-productive.
Each field should contain the smallest amount of meaningful data. By meaningful I mean meaningful in the context of the problem being solved — you could further split the letters in the names and surnames into separate fields however that would offer no additional meaning when solving this specific problem. This concept is taught in Excel and the reasons are obvious: it makes it possible to use the data more efficiently for example sorting by the first name and/or surname, Mail Merges etc.
|456||Jack Smith||(021) 403-1659 Ext. 53||Cape Town|
|789||Sam Gordon||021-808-9633||Cape Town|
|456||Jack||Smith||(021) 403-1659 Ext. 53||Cape Town|
Divide and conquer!
A quick inspection of the tabulated data shows the repetition of values in the City field. This is a sure-fire sign that a second table could be used to store the names of the cities with a foreign key and the foreign key value could then be stored in the City field representing the name of the city as such.
The above table should be split into 2 tables, namely tblCustomers and tblCities, as follows:
|456||Jack||Smith||(555) 403-1659 Ext. 53||1|
The cityCode in tblCustomers can now be looked up in tblCities (similar to a VLOOKUP in Excel). Rather than storing the string “Johannesburg” (12 characters), you will now store that value once in tblCities and where it must be stored many times in tblCustomer, you only store the cityCode (1 character).
The database now consists of 2 entities, Customers and Cities, represented by 2 tables. Many (∞) Customers can be found in one (1) City, therefore we say that these tables are in a one-to-many relationship. The Primary Key of tblCities, cityCode, is the foreign key in tblCustomers.
To see how to create the above relationship in an Access database, read the post: Creating & using a “lookup table” in Access.
Splitting your tables and then linking them back together seems counter-intuitive at first due to the complexity of the relationships, but the advantages will become evident as your database grows…
There are a set of very fancy-sounding “rules” that are applied in the “normalisation” of data. They are dry and academic and I do not recommend using them when trying to explain database theory to anyone! They are however not as complicated as they sound.
For a table to be in the 2nd norm:
- Meet all the requirements of the 1st normal form
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.