World Life Expectancy Data Cleaning Project
- Damian Owiredu
- Jun 14
- 2 min read
🎯 Purpose
Cleaned and completed a global life expectancy dataset for accurate longitudinal health and economic analysis.
Lets take a look at the data

✅ Removed duplicate country-year combinations
DELETE FROM world_life_expectancy
WHERE
Row_ID IN (
SELECT Row_ID
FROM (
SELECT row_id,
concat(country, year),
ROW_NUMBER() OVER (PARTITION BY concat(country, year) ORDER BY concat(country, year)) as Row_Num
FROM world_life_expectancy
) as Row_Table
WHERE Row_Num > 1
);
This query was written to delete all duplicate data present in the table

As i run this code, it clearly shows the duplicate data has been deleted with the query above.
✅ Filled missing status values
UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
SET t1.status = "Developing"
WHERE t1.status = ""
AND t2.status <> ""
AND t2.status = "Developing";
This query updates the table and removes the blank status fields for "Developing" countries from the table

UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
SET t1.status = "Developed"
WHERE t1.status = ""
AND t2.status <> ""
AND t2.status = "Developed";
Along with this query updating the table and removing the blank status field for "Developed" countries from the table.
✅ Imputed missing life expectancy values using average of surrounding years
SELECT t1.Country, t1.Year, t1.`Life expectancy`,
t2.Country, t2.Year, t2.`Life expectancy`,
t3.Country, t3.Year, t3.`Life expectancy`,
ROUND((t2.`Life expectancy` + t3.`Life expectancy`)/2, 1)
FROM world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
AND t1.Year = t2.Year - 1
JOIN world_life_expectancy t3
ON t1.Country = t3. Country
AND t1.Year = t3.Year + 1
WHERE t1.`Life expectancy` = "";
In this query I calculate the average between the previous and the next year to fill the "Life expectancy" due to missing data, instead of leaving it blank and skipping it
UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
AND t1.Year = t2.Year - 1
JOIN world_life_expectancy t3
ON t1.Country = t3. Country
AND t1.Year = t3.Year + 1
SET t1.`Life expectancy` = ROUND((t2.`Life expectancy` + t3.`Life expectancy`)/2, 1)
WHERE t1.`Life expectancy` = "";
This query updates the table with the previous calculation and fills in the empty space with an average
💡 Impact
Delivered a fully cleaned, complete dataset ready for reliable global health trend analysis.
Comments