Uncovering a Business Rule
I could identify the redundancy in my address book because I understand how ZIP codes relate to places. But the information originally posted to the newsgroup didn't explain how Everything Made Easy runs its business. How can you find out whether an invoice number determines a unique workshop? Only the company's management knows the real answer, but you can draw some conclusions by looking at the information at hand: the sample data, the table structures, and the original question.
The sample data contains no invoice that's split between different workshops. Is it possible to split an invoice? Yes and no. Yes, you can split an invoice because nothing in the database prevents it. You could change Margo's InvoiceID from 2 to 3. But if you do that, you create a problem. You can no longer calculate the total fee for each workshop. Invoice 3 would include Margo's fee for Pies and Tranh's fee for Wallpaper. How would you assign the $150 fee on Invoice 3 to two different workshops?
Two scenarios are possible: Either Everything Made Easy permits only one workshop to appear on an invoice, or it allows more than one workshop on an invoice. The developer who posted the original question let the newsgroup participants know that the company allows only one workshop per invoice. In the language of database theory, this limit is a business rule. However, the database doesn't enforce this rule, and as a result, the database contains something called an update anomaly. Nothing in the tables prevents an update from changing the workshop for Invoice 2 in only one of the two places it appears. So, for example, an update might change Boban's workshop but not Margo's, leaving the database in an anomalous state that doesn't represent any real-world situation because it violates the company's business rule. But this situation never arises, thanks to the business rule. Without the business rule, nothing precludes the anomalous state, and you can't solve the original problem. With more than one workshop per invoice, Everything Made Easy has to realize either that the database is incomplete or that the database needs a rule for splitting one invoice fee between workshops.
Database theory doesn't just find problems, it offers solutions. The "one workshop per invoice" business rule here creates a functional dependency on the columns of the Registrants1 table. The textbook notation for this functional dependency is InvoiceID * Workshop, which means, "InvoiceID determines Workshop." One invoice ID determines one workshop, and you can't have multiple Workshop values in rows that have the same InvoiceID value. Not all functional dependencies create redundancies, but database theory tells us that this one does because InvoiceID determines another column (Workshop) but doesn't determine all the other columns. (For example, InvoiceID doesn't determine the Who column.) Functional dependencies and anomalies are important concepts in normalization theory, and normalizing a database provides many benefits, including the elimination of most anomalies and redundancies.
Almost all theory textbooks describe how to normalize databases. Ramez Elmasri and Shamkadant B. Navathe's book Fundamentals of Database Systems, 3rd edition (Addison-Wesley, 2000) provides a thorough explanation of database normalization and Boyce-Codd normal form, which is the kind of normalization we used to solve Everything Made Easy's problem. Several kinds of normalization, or normal forms, exist, and the more complicated normal forms prevent more anomalies. Database designers often choose Boyce-Codd normal form because it's easy to obtain and it prevents the most common anomalies.
We chose Boyce-Codd normal form because it eliminated the update anomalies and enforced the business rule "one workshop per invoice." One sure way to enforce the functional dependency InvoiceID * Workshop is to allow only one row for each invoice ID. If an InvoiceID value appears in at most one row, you can have only one value of Workshop. You can't impose this requirement on the Registrants1 table because several people can share one invoice, but you can remove the redundant Workshop column from Registrants1 and put the distinct pairs (InvoiceID, Workshop) into a new table that contains the primary key InvoiceID.
Let's see whether this solution creates the summary report you need. The code in Listing 4 creates the new tables InvoiceWorkshop2 and Registrants2 and inserts data for the same people, workshops, and invoices as in Listing 1. I didn't reuse any table names, so if you use the code for practice, you can go back and forth between the versions. Recall that Listing 3's original summary query used the Registrants1 table to match workshops with invoices. You can now use the new table InvoiceWorkshop2 instead. In fact, that's all you change. The summary query in Listing 5 produces the correct results, which Figure 2 shows. If modifying the database isn't an option, you can create InvoiceWorkshop2 as a derived table, then query the original tables by using the code that Listing 6, page 34, shows.
A More "Elementary" Solution
The newsgroup participants designed this solution for an existing database that needed to be improved. If we'd designed the database from scratch or noticed that InvoiceWorkshop2 and Invoices1 have the same primary key and could be combined, we might have found a simpler solution. For example, Listing 7 combines Workshop2 and Invoices1 into a single table and solves the original problem by using a summary query based only on Invoices3. The Invoices3 and Registrants2 tables provide the best database design so far, and you could use them if you had the option of redesigning the database.
Although there's no substitute for real experience in database programming, don't underestimate the value of understanding a real-world business model and being aware of how faithfully the database schema reflects it. And don't underestimate the practical applications of normalization theory and textbook procedures.