Rainie Dang's profile

Garbage IN Garbage OUT infographic

The project is to create an infographic poster (18 in x 24 in) based on a Kaggle dataset about safe drinking water (https://www.kaggle.com/adityakadiwal/water-potability) and on research into the topic.

However, after digging into the data and comparing it with my research, I found that this is a bad dataset: it contains false and missing information.

In this infographic, I talk about "Garbage IN Garbage OUT" and show how a bad dataset affects the quality of the results.
What is in the dataset?
The dataset contains water quality metrics for 3276 different water bodies. Each water body is described by 9 categories of condition that are used to determine whether it is potable.
Gathering evidence
In Microsoft Excel, I started exploring the dataset:
- Running calculations for each category and finding which categories don't follow the dataset description
- Calculating the missing data
- Comparing the potability result with the ranges provided in the description
- Calculating the percentage for each category and each piece of evidence
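The checks listed above can also be sketched in Python. This is only an illustration under stated assumptions, not the Excel workbook used for the project: the column names (ph, Sulfate, Potability), the five sample rows, and the 6.5 to 8.5 pH range are made up for the example and should be replaced with the real file and the ranges from the dataset description.

```python
import csv
import io

# Tiny made-up sample in an ASSUMED column layout; the values are illustrative only.
SAMPLE_CSV = """ph,Sulfate,Potability
7.0,330.0,1
,310.5,0
9.4,,1
5.9,350.2,1
8.1,,0
"""

# ASSUMED acceptable range, for illustration: pH between 6.5 and 8.5.
PH_RANGE = (6.5, 8.5)


def load_rows(text):
    """Parse CSV text into a list of dicts, one per water body."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_percent(rows, column):
    """Percentage of rows where the given column is empty."""
    missing = sum(1 for row in rows if row[column].strip() == "")
    return 100.0 * missing / len(rows)


def potable_out_of_range(rows, column, low, high):
    """Rows labelled potable whose measured value falls outside [low, high]."""
    flagged = []
    for row in rows:
        value = row[column].strip()
        # Skip rows with no measurement and rows not labelled potable.
        if value == "" or row["Potability"] != "1":
            continue
        if not (low <= float(value) <= high):
            flagged.append(row)
    return flagged


rows = load_rows(SAMPLE_CSV)
ph_missing = missing_percent(rows, "ph")
sulfate_missing = missing_percent(rows, "Sulfate")
contradictions = potable_out_of_range(rows, "ph", *PH_RANGE)

print(f"ph missing: {ph_missing:.1f}%")
print(f"Sulfate missing: {sulfate_missing:.1f}%")
print(f"potable rows with ph outside range: {len(contradictions)}")
```

Rows that are labelled potable but fall outside the described range are the kind of contradiction that counts as evidence of bad data. The missing-value percentages show how much of each category is simply absent.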
Research
Researching the standard conditions for water potability to compare with the ranges that the dataset provides.
Researching the term "Garbage IN Garbage OUT".
Researching how to avoid bad data and improve data quality.
Draft 1 & 2