Subscribe to Windows IT Pro
January 25, 2006 12:00 AM

Designing for Performance: Lookup Tables

When it comes to lookup tables, are more really better?
SQL Server Pro
InstantDoc ID #48811
Rating: (6)

Lookup tables provide the essential function of helping you maintain data integrity in your database environment. For example, if you have users entering their gender into a data item, the table that contains the Gender item can reference a lookup table to verify that only the value M or F is used.

Most database environments have one lookup table to support each domain—a simple, scalable model. But recently I've read some articles proposing a new approach to lookup tables. Instead of one lookup table to support each domain, some database designers are advancing the idea of a single large lookup table that contains all valid values for every domain. The table uses some sort of indicator column so you know when and where to use the value in a column. Readers have asked me whether the one-table approach is better than the many-table technique. Let's take a look at how lookup tables function and why, in this case, breaking the rules of database design isn't a good idea.

Maintaining Integrity
First, let's review how you can use lookup tables throughout the database to restrict data entry while enforcing domain integrity. One of the most common uses for a lookup table is verifying the two-character state code. The entity relationship diagram (ERD) in Figure 1 shows a frequently used two-column version of the State lookup table. The first column in the State table is the two-character StateCode; the second column is StateLongName. While this is a good way to avoid mistakes on data entry ("Let's see, is Alaska AL or AK?"), it's also a way to enforce domain integrity. The domain of StateCode is the set of 50 two-character state abbreviations recognized by the US Postal Service. The domain of StateLongName is the 50 state names, from Alabama to Wyoming. No other combination of letters, numbers, and special characters is valid for state name.

Another name I use for lookup tables is reference tables because their data is referenced by columns in other tables. The traditional one-to-many relationship of the parent reference table, State, to the child table, Address, is also shown in Figure 1. Figure 2 shows the Joins tab of the Reference Properties window; you can see that the Address table's StateCode column is constrained by the set of values in the State table's StateCode column. So, for example, if you want to enter the value AK into the Address table's State-Code column, the value AK must already exist in the State lookup table's StateCode column. That's how a lookup or reference table works. Selecting the Mandatory parent check box on the Integrity tab, as Figure 3, shows, ensures that this relationship is enforced within the database.

Typically, more than one table in the database uses the same lookup table. In addition to the Address table, any other table that has a column containing the two-character state code could reference the State lookup table. For example, say this database also has a ShippingRate table that contains rates by state. If the ShippingRate table has a mandatory relationship with the State table, values in the State column of the ShippingRate table are constrained by the set of valid values for the StateCode column of the State lookup table.

Breaking the Rules
Now back to the original question of whether a one-table approach is better than using separate tables for each domain. While the single-table design might be a clever solution, it's not one that I would use for the simple reason that I like to design scalability into my databases. I much prefer individual lookup tables that enforce integrity in each distinct domain. By dumping all lookup values into one table, then adding a column (or using some other device) to decipher what the individual lookup values refer to, you violate three rules of designing for performance and data integrity:

  • Always normalize your database.
  • Design for performance and scalability.
  • Simplify data management.

The following problems can arise when you break these rules.

You denormalize the lookup table. Remember that by the rule of normalization, each table contains data about one thing (one entity) and one thing only. While you might argue that a lookup table is only about reference data, and thus contains only one type of data, I think this is one level of abstraction too high for the real world. In the traditional approach, the data in a lookup table references only gender, only states, only room types, or whatever might be in a list of related items that you use to maintain the domain integrity of your database.

You create a performance bottleneck and limit scalability. Putting all lookup values, indicators, and codes into the same table requires every application that uses a drop-down or list box or that does a compare against the lookup table to access the same lookup table. This approach creates a "hot spot" on the disk on which the lookup table is located. That hot spot results in higher levels of locking and blocking, increased wait times, decreased performance, and an inability to scale beyond a certain point. "But," you say, "I'm running SQL Server 2005, and I can horizontally partition this large lookup table." However, SQL Server 2005 horizontal partitioning is meant for very large data sets. On smaller data sets (an all-in-one lookup table would be considered a small data set), horizontal partitioning might even be counterproductive. (I can't say for sure, because I haven't run any SQL Server 2005 partitioning benchmark trials yet.) Also partitioning isn't an option unless you're running SQL Server 2005 Enterprise Edition.

Related Content:

ARTICLE TOOLS

Comments
  • kobus
    6 years ago
    Sep 19, 2006

    Disagree with "hotspot" theory. A small lookup table is most likely to be memory cached.
    Disagree with Denormalization hype. Its not as if 100% of all datbase tables are fully normalised in practice. There is always some denormalization going on, and it hurts the least in the relatively smaller lookup tables.
    Disagree that a single lookup table is non-scalable. You are still free to duplicate the set of values for different categories, so changes to one don't affect each other. Agree with alpha/natural primary keys for lookups. Agree that it causes obfuscation, but thats a minor tradeoff for not having 100's of 3-row lookup tables which is another maintenance problem.

    Note that I am talking about lookup tables of small size which store things like form dropdowns , gender etc. Lets say things that have less than 5 options each. I will still keep things like state and country in their own tables.

  • PHILIP
    6 years ago
    Mar 29, 2006

    Well thought out and explained.

  • Jean
    6 years ago
    Mar 20, 2006

    Michelle, this timely article is important for me because I have a colleague touting his old company used the single-reference table approach and is suggesting we do the same. I'd never heard of the single-reference table approach. The point made is that with so many tables and lookup tables, the single-reference table would cut down the clutter. Your point on the "hot spot" is spot-on and I appreciate the comments by Rhys (rmjcsltd,02/20/06) as well. I will arm myself with these pieces of knowledge and test the waters.

  • Rhys
    6 years ago
    Feb 20, 2006

    Michelle,

    I agree completely that individual tables are nearly always the right thing to do, from a data integrity and design clarity point of view.

    However, I don't think that the single table approach has to lead to a performance bottleneck. A single table will be cached in RAM more efficiently than multiple very small tables. If the data really is read-only, then it can be put on a read-only filegroup and SQL Server won't need to use locks when accessing it.

    Rhys

You must log on before posting a comment.

Are you a new visitor? Register Here

advertisement

advertisement

Windows is a trademark of the Microsoft group of companies. Windows IT Pro is used by Penton Media Inc. under license from owner.