Most databases, whether it's Microsoft Access, SQL Server, Oracle, etc., let you create tables and specify any field or fields as the primary key. Unfortunately, this makes it easy to create primary keys that are not optimal which can cause performance problems and even worse, architectural problems that become expensive to fix later.
Every table in a relational database should be keyed. Records in a relational database can be sorted and physically stored in any order, but the key field (or fields) define uniqueness for that record. This makes it easy to link this table to data in other tables. The primary key is also important for secondary indexes.
Secondary indexes can be one or more fields that are defined to speed up searches by maintaining a pre-sorted list. For instance, a customer table may have a secondary index on Zip Code to allow quicker searches on zip code values.
Here are some tips for creating primary keys. As with all rules, there are specific situations where one may deviate from them. Additionally, if your tables are small, adjusting the primary key may have no perceivable impact, so these tips may not matter. However, for larger tables and to support scalability over time, these tips can have a huge impact. The hope is to know these reasons and why you may intentionally deviate from these best practices.
The primary key should be the first field (or fields) in your table design. While most databases allow you to define primary keys on any field in your table, the common convention and what the next developer will expect, is the primary key field(s) coming first.
The values in a table's primary key should never change over time. For instance, a numeric customer ID can point to a customer, and regardless of whether they change their name, address, phone number, etc., we can track the data properly without updating more than one table.
If a primary key value has meaning, it can change which requires updating the tables that reference this value. That may be automatic through cascading referential integrity, but it adds overhead, slows performance, and may create locking conflicts as the data is updated.
Therefore, it's critical to avoid using things like people's names as primary keys. Fields with meaning can also result in duplicates, which is a problem when the primary key is supposed to define each record uniquely.
Fortunately, it's easy to define a unique primary key. An auto generated value such as an Identity column in SQL Server or an AutoNumber field in Microsoft Access does this. It doesn't matter if you delete a record and the key values are no longer consecutive. The primary key should have no meaning other than the permanent tag to that record.
A reasonable exception to this is geographic lists such as Countries, States, and Zip Codes, where the name is keyed rather than using a separate ID. These data are relatively stable and small.
Uniquely tagging a record can be done with a number (long integer). There's no need to have more than one field defining its uniqueness. Text fields require more bytes than numeric fields, so using a number saves considerable space.
Making the primary key as small as possible reduces the memory required to open the primary key when links or searches are performed.
Primary key size also impacts secondary indexes. A secondary index contains the sorted information on the field or fields in the index, plus a reference back to the primary key. Searching on the secondary index is fast because it uses a presorted list, but still points back to the record using the primary key to retrieve all of the record's other fields.
The smaller the primary key, the smaller the secondary index, and the faster the links and searches are. If a table has lots of secondary indexes and lots of records, the difference can be very significant if you are not using a single numeric field as your primary key.
Sometimes people justify composite (multi-field) primary keys to avoid duplicates. This is the wrong approach. To prevent duplicates, create a secondary index for those fields and define it as unique. That lets the secondary index handle that job while the single field primary index retains its efficiencies.
Our Total Access Analyzer add-in performs detailed analysis of your MS Access database objects and detects tables with non-optimal primary keys, field type inconsistencies, plus hundreds of others tips to improve your database designs.