US Household Data Cleaning Project

Damian Owiredu
Jun 13
1 min read

In this project, I walk through the process used for cleaning the raw household data.

Link to code: https://github.com/gl1tchdDev/SQL-Portfolio/blob/main/US%20Household%20Income%20(Data%20Cleaning).sql

🎯 Purpose

To clean and prepare a large dataset of U.S. household income data for analysis. The goal was to fix inconsistencies, remove duplicates, and ensure data accuracy before it could be used in any reporting or BI tools.

Lets take a look at the data

🛠️ Key Steps & Code Evidence

✅ Fixed malformed column name

ALTER TABLE us_project.us_household_income_statistics RENAME COLUMN `ï»¿id` TO `id`;

✅ Identified and removed duplicate IDs

SELECT id, COUNT(id) 
FROM us_project.us_household_income
GROUP BY id
HAVING COUNT(id) > 1;

This screenshot of my code shows that the result of this query is blank. This is because the duplicates that were previously present in the data have been removed by the following query.

DELETE FROM us_household_income
WHERE row_id IN (
	SELECT row_id
	FROM (
		SELECT row_id, id,
		ROW_NUMBER() OVER(PARTITION BY id ORDER BY id) row_num
		FROM us_project.us_household_income
	) duplicates
	WHERE row_num > 1
);

✅ Corrected spelling and category inconsistencies

UPDATE us_project.us_household_income
SET State_Name = "Georgia"
WHERE State_Name = "georia";

This shows that the query above has corrected the spelling error and it has been applied to the table.

UPDATE us_household_income
SET Type = "Borough"
WHERE Type = "Boroughs";

💡 Impact

Created a clean, reliable dataset ready for EDA and dashboard use.
Improved trust in data-driven insights by resolving integrity issues.

US Household Data Cleaning Project

🎯 Purpose

🛠️ Key Steps & Code Evidence

💡 Impact

Recent Posts

Comments