top of page
Search

World Life Expectancy Data Cleaning Project

  • Writer: Damian Owiredu
    Damian Owiredu
  • Jun 14
  • 2 min read

🎯 Purpose

Cleaned and completed a global life expectancy dataset for accurate longitudinal health and economic analysis.


Lets take a look at the data


ree

  • Removed duplicate country-year combinations

DELETE FROM world_life_expectancy
WHERE
	Row_ID IN (
	SELECT Row_ID
FROM (
SELECT row_id,
concat(country, year),
ROW_NUMBER() OVER (PARTITION BY concat(country, year) ORDER BY concat(country, year)) as Row_Num
FROM world_life_expectancy
) as Row_Table
WHERE Row_Num > 1
); 

This query was written to delete all duplicate data present in the table



ree

As i run this code, it clearly shows the duplicate data has been deleted with the query above.







  • Filled missing status values


UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
	ON t1.Country = t2.Country
SET t1.status = "Developing"
WHERE t1.status = ""
AND t2.status <> ""
AND t2.status = "Developing";

This query updates the table and removes the blank status fields for "Developing" countries from the table


ree










UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
	ON t1.Country = t2.Country
SET t1.status = "Developed"
WHERE t1.status = ""
AND t2.status <> ""
AND t2.status = "Developed";

Along with this query updating the table and removing the blank status field for "Developed" countries from the table.



  • Imputed missing life expectancy values using average of surrounding years


SELECT t1.Country, t1.Year, t1.`Life expectancy`,
t2.Country, t2.Year, t2.`Life expectancy`,
t3.Country, t3.Year, t3.`Life expectancy`,
ROUND((t2.`Life expectancy` + t3.`Life expectancy`)/2, 1)
FROM world_life_expectancy t1
JOIN world_life_expectancy t2 
	ON t1.Country = t2.Country
	AND t1.Year = t2.Year - 1
JOIN world_life_expectancy t3
	ON t1.Country = t3. Country
	AND t1.Year = t3.Year + 1
WHERE t1.`Life expectancy` = "";

In this query I calculate the average between the previous and the next year to fill the "Life expectancy" due to missing data, instead of leaving it blank and skipping it


UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2 
	ON t1.Country = t2.Country
	AND t1.Year = t2.Year - 1
JOIN world_life_expectancy t3
	ON t1.Country = t3. Country
	AND t1.Year = t3.Year + 1
SET t1.`Life expectancy` = ROUND((t2.`Life expectancy` + t3.`Life expectancy`)/2, 1)
WHERE t1.`Life expectancy` = "";

This query updates the table with the previous calculation and fills in the empty space with an average



💡 Impact

Delivered a fully cleaned, complete dataset ready for reliable global health trend analysis.

 
 
 

Comments


CONTACT ME

  • Email: damianowiredu4@gmail,com

  • Phone Number: +447482555448

  • LinkedIn

  • GitHub

bottom of page