Access: Normalize Imported DataBy
Most of the time, data coming from Excel is flat, i.e., not structured. As we shuffle imported data into appropriate tables, we need to ensure it is normalized—the data should be separated into different tables, each focused on one category or type of information, with everything related by key fields.
Consider an Excel worksheet that contains data on custom building orders. As you import that data into Access, you need to ask, “What are the objects?” The worksheet has data about customers, orders, and products. Each object you can visualize needs a table of fields to describe it. In Figure 1, the purple cells relate to a customer, the green cells to an order, and the brown cells to a product. Each of these groups would require a separate table.
The note field varies. Sometimes a note is about a customer. Other times, it could be about a product or order. For now, all the notes will be tied to an order, and they can be separated later.
Last month, we added customers from the import table using an append query. In the same way, we will make an append query from the import data for the product information. The query is called qApp_Productz. There aren’t any calculations yet. The SQL statement for the query is:
INSERT INTO Productz ( SerialNum, dtManu, SRP )
SELECT i.SerialNum, i.Mfg, i.MSP
FROM importOrders AS i;
The first line indicates that data will be inserted into the Productz table—specifically, it will be inserted into the fields in that table for serial number (SerialNum), manufacturing date (dtManu), and suggested retail price (SRP). The second line identifies the respective fields of the import table. In the third line, “AS i” means that the importOrders table will be aliased as “i” within the SQL statement. That’s why the serial number field in the second line is i.SerialNum instead of importOrders.SerialNum. Using an alias helps keep the SQL statement shorter and easier to read.
Based on the data we’ve been using in this example, this query will create 370 product records when it’s run.
Adding records to the orders table is a little trickier because the customer and product are stored as long integer foreign keys instead of the customer name, customer code, or product serial number.
Long integers are more efficient since they are 4 bytes each. Contrast this to a string of 25 characters requiring 1 byte each plus a 10-byte overhead for text. The total of 35 bytes is more than eight times the space for a long integer.
Figure 2 shows the Design view of an append query that adds data from the import table (aliased as “i”) to the Orderz table. The price, note, order date, and ID fields come from the import table. The reason ID is appended to a temporary field is to associate the order to the import record. It’s always a good idea to be able to see where the data came from, and keeping this information will help if someone at a later date wants to see what the data was originally.
By linking the import table to the Customerz table on the customer code, the CustomerID can come from the Customerz table. Similarly, the ProductID can come from the Productz table because Productz and the import table are joined on serial number. Each of the join lines points away from the import table. This is so all records from the import table will show even if no customer or product is specified. The SQL statement for this query is:
INSERT INTO Orderz ( Amount, NoteOrd, dtOrder, CustomerID, ProductID, tempID )
SELECT i.Price, i.Note, i.DateOrder, Customerz.CustomerID, Productz.ProductID, i.ID
FROM (importOrders AS i LEFT JOIN Customerz ON i.Code =
Customerz.CustCode) LEFT JOIN Productz ON i.SerialNum = Productz.SerialNum;
This query name is qApp_Orderz. When you run it, 370 records are appended to the orders table. Because each product is custom-made for each order, the number of records for orders and products is the same.
MORE TO DO
Normalizing data leads to more tables, with fewer fields contained in each. Whenever you want details from another table, simply join the tables and link on the appropriate field.
The more you normalize your data, the more you’ll see to do. For example, each part of a product’s serial number represents a different piece of information. The serial number 171219-PEBN-5634-0810 references the manufacture date (12/19/2017), the building style (PEBN), the number assigned by the factory it came from (5634), and the product size (8 X 10). All of this discreet data should be in separate fields. Next month, we’ll extract information from the serial number and normalize the data even more.
Download this month’s files: SF1907_Normalize. The zip file contains the Excel workbook with data, an Access database without the data so you can try it yourself, and a copy of the final database to see the results.
Normalized data has tables for each object you can visualize, linked with short keys for better performance.