Python Cleansing
In previous topics we discussed basic operations with NumPy, Pandas and SciPy. Now we turn to data cleansing, which in this topic means finding the missing data.
Areas such as machine learning, artificial intelligence, face recognition systems and data mining need accurate and complete data for good predictions. Missing values lower the quality of the data, so we will focus on missing values in order to build a strong model.
What is Missing Data?
Missing data refers to values that are not available but that, had they been observed, might have carried important information. It can take many forms, such as an incomplete sequence, a missing file, or a record with too little information.
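As a minimal sketch of what finding missing values can look like in Pandas, the example below builds a small made-up table and counts the gaps. The column names and values are hypothetical, chosen only for illustration; `isna()` is the standard Pandas call for flagging missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN and None mark values that are not available.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Pune", "Delhi", None, "Mumbai"],
    "income": [50000.0, 62000.0, np.nan, np.nan],
})

# Boolean mask: True wherever a value is missing.
missing_mask = df.isna()

# Number of missing values in each column.
missing_per_column = missing_mask.sum()

# Rows that contain at least one missing value.
rows_with_missing = df[missing_mask.any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

Counting per column shows where the gaps are concentrated, while the row filter shows which records would be lost if incomplete rows were simply dropped.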
There are three categories of missing data.
Note: the completely observed variable is X and the partially missing variable is Y.
• Missing Completely at Random (MCAR): the probability that a value is missing is unrelated to any variable, observed or unobserved. The practical question here is: how closely can we predict the missing value? We fill in the mean or median only when we believe the data are MCAR; otherwise we build some logic for the fill from what we know, which is a reasoned assumption that cannot be verified. E.g., suppose we are in 2025 and see records of people not going outside for many days; if the year on those records is 2020, we can confidently say it was due to the lockdown.
• Missing at Random (MAR): the missingness of a partially missing variable (Y) in the analysis model is related to some other fully observed variable (X), but not to the values of Y itself.
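• Missing not at Random (MNAR): when neither the MCAR nor the MAR condition holds, the third condition, MNAR, may apply. When data are missing not at random, the missingness depends on the unobserved values of Y themselves. We usually have an idea of why the values are missing, though confirming it with confidence takes more than the observed values alone. E.g., people who do not vote are mostly migrants from another city.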
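The three categories above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, the 30% and 60% missingness rates and the median cut-offs are arbitrary choices, and `with_missing` is a hypothetical helper written for this example. The last step shows the mean and median fill that the MCAR case allows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
n = 1000

# X is fully observed; Y will be made partially missing.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def with_missing(values, mask):
    """Hypothetical helper: copy `values` and set NaN where `mask` is True."""
    out = values.copy()
    out[mask] = np.nan
    return out

# MCAR: every Y value has the same 30% chance of being missing.
y_mcar = with_missing(y, rng.random(n) < 0.3)

# MAR: Y can be missing only where the observed X is above its median,
# so missingness depends on X, not on the value of Y.
y_mar = with_missing(y, (x > np.median(x)) & (rng.random(n) < 0.6))

# MNAR: Y can be missing only where Y itself is above its median,
# so missingness depends on the very value that is hidden.
y_mnar = with_missing(y, (y > np.median(y)) & (rng.random(n) < 0.6))

df = pd.DataFrame({"x": x, "y_mcar": y_mcar, "y_mar": y_mar, "y_mnar": y_mnar})

# Mean and median imputation, applied to the MCAR column only.
df["y_mcar_mean"] = df["y_mcar"].fillna(df["y_mcar"].mean())
df["y_mcar_median"] = df["y_mcar"].fillna(df["y_mcar"].median())

# Fraction of missing values per column.
print(df.isna().mean().round(2))
```

In real data the mechanism is not known in advance, which is why the choice between a simple mean or median fill and a custom rule remains an assumption.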