World Life Expectancy Data Cleaning Project
- Jun 14, 2025
- 2 min read
🎯 Purpose
Cleaned and completed a global life expectancy dataset for accurate longitudinal health and economic analysis.
Lets take a look at the data

âś… Removed duplicate country-year combinations
DELETE FROM world_life_expectancy
WHERE
Row_ID IN (
SELECT Row_ID
FROM (
SELECT row_id,
concat(country, year),
ROW_NUMBER() OVER (PARTITION BY concat(country, year) ORDER BY concat(country, year)) as Row_Num
FROM world_life_expectancy
) as Row_Table
WHERE Row_Num > 1
); This query was written to delete all duplicate data present in the table

As i run this code, it clearly shows the duplicate data has been deleted with the query above.
✅ Filled missing status values
UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
SET t1.status = "Developing"
WHERE t1.status = ""
AND t2.status <> ""
AND t2.status = "Developing";This query updates the table and removes the blank status fields for "Developing" countries from the table

UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
SET t1.status = "Developed"
WHERE t1.status = ""
AND t2.status <> ""
AND t2.status = "Developed";Along with this query updating the table and removing the blank status field for "Developed" countries from the table.
âś… Imputed missing life expectancy values using average of surrounding years
SELECT t1.Country, t1.Year, t1.`Life expectancy`,
t2.Country, t2.Year, t2.`Life expectancy`,
t3.Country, t3.Year, t3.`Life expectancy`,
ROUND((t2.`Life expectancy` + t3.`Life expectancy`)/2, 1)
FROM world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
AND t1.Year = t2.Year - 1
JOIN world_life_expectancy t3
ON t1.Country = t3. Country
AND t1.Year = t3.Year + 1
WHERE t1.`Life expectancy` = "";In this query I calculate the average between the previous and the next year to fill the "Life expectancy" due to missing data, instead of leaving it blank and skipping it
UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
AND t1.Year = t2.Year - 1
JOIN world_life_expectancy t3
ON t1.Country = t3. Country
AND t1.Year = t3.Year + 1
SET t1.`Life expectancy` = ROUND((t2.`Life expectancy` + t3.`Life expectancy`)/2, 1)
WHERE t1.`Life expectancy` = "";This query updates the table with the previous calculation and fills in the empty space with an average
đź’ˇ Impact
Delivered a fully cleaned, complete dataset ready for reliable global health trend analysis.



Comments