Why you should never remove outliers from your data

(Explained with an analogy of parenthood)

Param Saraf
2 min read · Jul 8, 2022
Photo by Derek Thomson on Unsplash

Outliers are a rare but essential part of data, just like those rare, crucial events that change your life.

Parents raise their child with the utmost care and present a perfect world in which there is no division based on race, religion, gender, etc., and everyone is equal. There is no well-defined purpose or sense of competition, and the objective of life is simple: to have fun.

Now let’s come back to data. What if you keep removing outliers to keep your data ideal? That is like hiding conflicts from the child every time they come up, or giving the child only the answers he or she can digest.

Over time, the definition of an outlier will change, and outliers will start entering your data anyway, just like a child entering the teenage years who, charged with high-octane hormones, takes off his or her blinkers and begins exploring out of curiosity.

The basic pillar of evolution is adaptability, and to adapt one needs to see every possible situation as early and as often as possible. A model that drifts whenever it meets an outlier is too rigid and lacks adaptability, but a model that sees outliers here and there and normalises over time becomes robust.
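To make that contrast concrete, here is a minimal sketch, not a recipe from this post. It keeps two extreme values in a small series and compares a running mean, which gets dragged by every extreme point, with a running median, which sees the same points but settles back. The readings and the helper names (`running_estimates`, `max_jump`) are made up for illustration.

```python
from statistics import mean, median

# Hypothetical readings: mostly near 10, with two extreme values left in.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 55.0, 10.2, 9.7, 10.1, -30.0, 10.0]


def running_estimates(values, estimator):
    """Apply `estimator` to every prefix of `values` and return the results."""
    return [estimator(values[: i + 1]) for i in range(len(values))]


def max_jump(track):
    """Largest change between consecutive estimates in `track`."""
    return max(abs(b - a) for a, b in zip(track, track[1:]))


# Both estimators see exactly the same data, outliers included.
mean_track = running_estimates(readings, mean)
median_track = running_estimates(readings, median)

print(f"largest jump in running mean:   {max_jump(mean_track):.2f}")
print(f"largest jump in running median: {max_jump(median_track):.2f}")
print(f"final median estimate:          {median_track[-1]:.2f}")
```

Nothing was deleted from the series. The only difference is how strongly a single extreme point is allowed to pull the estimate, which is one simple reading of "seeing outliers but normalising over time".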

One may argue that this doesn’t work with parenthood: you can’t reveal certain things too early because children’s minds are too sensitive.

Well, at least don’t paint their minds in black and white, right and wrong, good boy and bad boy. By doing this we put up a lot of fences in their minds that they are not supposed to cross. Eventually they will cross them anyway, and then they will face the conflicts.

All we can do is try to make their mental models all-encompassing and all-accepting, so that they generalise well, which is only possible by showing them outliers from time to time.

Param Saraf

Data Scientist | Machine Learning Engineer | Power BI/ MSBI Expert