Access: Duplicates in Your DataBy
Whenever you work with data, you will inevitably encounter duplicates. Whether they come from imported data, processes being run twice, or somewhere else, they need to be managed. There are several questions to keep in mind: Are there duplicates? What data has been duplicated? Which records need to stay? How should the duplicates be removed?
CHECK FOR DUPLICATES
The first thing to do is create a query to find out if there are duplicates. For sample purposes, I created a table called “Copy of the Direct Materials Budget” in our test database and duplicated the records in the Direct Materials Needed category. This results in 86 original records and 24 duplicates, which will allow us to create a process to find and eliminate the duplicates.
Create a query using the Copy of Direct Materials Budget table as the source. Add the Category, Budget Item, and Production Units fields to the design grid. Run the query. There will be 110 records. Return to Design view and click the Totals button. Run the query again, and there will now be 86 unique items based on these three fields. Save the query as “Unique Records.”
The next step is to identify the duplicates in detail so that we can determine how to resolve the problem. For this particular table, we have lines for the combinations of Category, Budget Item, and Production Units fields. To see which ones are duplicated, we will create a query that shows us which combinations occur more than once.
Create a query with the Copy of Direct Materials Budget table as the source. Add the Category, Budget Item, Production Units, and Sort Order fields to the query. Click the Totals button. To count how many records have the same Category, Budget Item, and Production Units, change the Total line for the Sort Order field from Group By to Sort. Add a Criteria of >1 to the Sort Order field (see Figure 1). This will identify only the duplicates. Save the query as “Duplicate Items.” Run the query, then save and close it.
Now that we have identified the duplicates, we need to create a table to store them so that we can select which ones to keep. The duplicate records may be completely identical, or there may be some unique fields in the record.
Create a query using the Copy of Direct Materials Budget table and Duplicate Items query as the data sources. Link the two via the Category, Budget Item, and Production Units fields. Add all the fields from the Copy of Direct Materials Budget table to the design grid (see Figure 2). Change the query to a Make Table query, assigning the table name “Duplicate Item Detail.” Save the query as “Duplicate Details.” Run the query, then save and close it.
Now we need to remove the duplicated records and put back the unique ones. Create a query using the Copy of Direct Materials Budget table as the table source. By looking at the results of the Duplicate Details query, we can see that all the duplicate records have a Category value of “Direct Materials Needed.” So add the Category field to the query design. Change the query type to a Delete Query, and add “Direct Materials Needed” as the Criteria. Save the query as “Delete Duplicates.” Run the query to delete the duplicates, then close it.
The last step is to add the unduplicated items back to the table. There are several ways to do this:
- If you already have a process to add the records, just run it again to add the unduplicated records back to the Copy of Direct Materials Budget table.
- If the entire record is duplicated, then you can create a query from the Duplicate Items Detail table and simply click the Totals button to group by each field. Then change the query type to Append. Save the query as “Append Unduplicated Records.” Run, save, and close the query.
- If the records have the same Category, Budget Items, and Production Units, but the rest of the data is different, you need to choose which records to keep. Add a field to the Duplicate Items Detail table to check off the ones you want to keep. I added a Yes/No field called “Keep.” Review each item in the Duplicate Items Detail table, putting a check in the Keep field for the records you want to keep. Then create a query with the Duplicate Items Detail as the source. Add all the fields to the design. Set the Criteria for the Keep field to “Yes.” Now it will only select the records that you chose to keep. Make the query an Append query and run it to append the records to the Copy of Direct Materials Budget table. You don’t need to save the query if you aren’t going to use it again.
Download this month’s database here: SF_AUG_2015.
SF BEST PRACTICES
When you find duplicates in a table, carefully create a process to delete the duplicates and add back the ones you want to keep. In case anything goes wrong in your process, first create a backup of the database by copying it to another location.