
Overfitting in Life and other experiments

In statistics as in life, overfitting, or "an analysis that corresponds too closely or exactly to a particular set of data," does not yield optimal predictions. Focusing on the signal rather than the noise in a dataset can prevent overfitting.
Overfitting... in life and machine learning...

Wot? (a.k.a. what do I mean by that)

For those of you who have worked with machine learning or taken a statistics class, the idea of overfitting is not new. The best example I have found is the XKCD comic "Electoral Precedent". Quoting:

  1. In 1788, "No one has been elected president before" would be a true statement, until Washington was.
  2. In 1796, "No one without false teeth has ever been elected" would be true, until Adams was.
  3. In 1856, "No one can become president without getting married" would be true, until Buchanan did.

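You get the idea. Each of those rules fit every election before it perfectly and broke as soon as a new case came along. Predictions can be made using only historical data, but not all data are created equal. Overfitting means focusing on irrelevant data, or in other words, on the noise rather than the signal. The result is often an overcomplicated, unrealistic model that describes the past well and predicts the future poorly.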
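To make the signal-versus-noise idea concrete, here is a minimal Python sketch using only the standard library. Everything in it is an illustrative assumption, not data from this post: the true signal is taken to be y = 2x, the training points carry a made-up alternating +1/-1 noise term, and `fit_line` and `fit_memorizer` are hypothetical helper names. It compares a simple two-parameter straight line with a 1-nearest-neighbour "memorizer" that reproduces every training point exactly, noise included, much like a rule that fits every past election perfectly.

```python
"""Toy illustration of overfitting: memorizing noise vs. fitting the signal.

Assumptions (purely illustrative): the true signal is y = 2x, and the
training data carry a made-up alternating +1/-1 "noise" term.
"""


def true_signal(x: float) -> float:
    """The underlying relationship we actually want to learn (assumed)."""
    return 2.0 * x


# Training data: signal plus deterministic alternating noise (+1 on even x, -1 on odd x).
train_x = [float(x) for x in range(10)]
train_y = [true_signal(x) + (1.0 if int(x) % 2 == 0 else -1.0) for x in train_x]

# Unseen inputs that fall between the training points.
test_x = [k + 0.3 for k in range(9)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (a simple, two-parameter model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour 'model': returns the y of the closest training point.

    It effectively has one 'parameter' per data point and reproduces the
    training set exactly, noise included.
    """
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, targets):
    """Mean squared error of model(x) against the given targets."""
    return sum((model(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

# Error on the data each model was fit to.
train_err_line = mse(line, train_x, train_y)
train_err_memo = mse(memorizer, train_x, train_y)

# Error on unseen inputs, measured against the true signal (the part worth predicting).
test_truth = [true_signal(x) for x in test_x]
test_err_line = mse(line, test_x, test_truth)
test_err_memo = mse(memorizer, test_x, test_truth)

print(f"line:      train MSE {train_err_line:.3f}, test MSE {test_err_line:.3f}")
print(f"memorizer: train MSE {train_err_memo:.3f}, test MSE {test_err_memo:.3f}")
```

On this toy data the memorizer scores a perfect zero on its training points but does much worse than the straight line on the in-between inputs: it learned the noise along with the signal.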

How does that apply to life?

In our own lives, we tend to overfit in many places. Where are you focusing on the noise instead of the signal? Where are your predictions failing you? When are your models unreliable?

Years ago, a good friend of mine was looking for the right girl to marry. To say that he was overfitting his model of "the one" he would fall in love with is an understatement (underfitting?). Then he found his dream girl, got his heart broken, and revisited his model. He has now been happily married for decades. That heartbreak brought into focus what "really" mattered; in other words, it helped him see which features were signal and which were noise. Suddenly, his model became simpler and more useful.

I remember coaching a young woman who brought up something she wanted to achieve in the upcoming year (specifics redacted for privacy reasons). When I asked what was stopping her, she gave me a list of reasons. For each reason, I countered with a question such as: "Are you saying that no one in the world who has <reason 1> has achieved goal X?"

The technique used here is essentially cross validation: checking a model against data it was not built from. By testing each reason against the rest of the world, she soon realized that her reasons were overfitting: noise, not signal. In machine learning as in life, we tend to misuse data, and the result is complicated, underperforming models.
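For readers who want the machine-learning version of that check, here is a minimal k-fold cross validation sketch in Python, standard library only. It is illustrative, not a reference implementation: the toy data (a true signal y = 2x plus a made-up alternating +1/-1 noise term), the fold assignment (point i goes to fold i % k), and the helper names `fit_line`, `fit_memorizer`, and `k_fold_mse` are all assumptions. Each fold is held out in turn, the model is fit on the remaining points, and the error is measured only on the points the model never saw.

```python
"""Minimal k-fold cross validation sketch (illustrative, standard library only).

Assumed toy data: true signal y = 2x plus a made-up alternating +1/-1 noise term.
"""


def fit_line(xs, ys):
    """Least-squares straight line y = a*x + b (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour 'model' that reproduces its training data exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def k_fold_mse(fit, xs, ys, k):
    """Mean squared error on held-out points, pooled across k folds.

    Point i goes to fold i % k. For each fold, the model is fit on every
    other point and scored only on the points it never saw.
    """
    total, count = 0.0, 0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            total += (model(xs[i]) - ys[i]) ** 2
            count += 1
    return total / count


xs = [float(x) for x in range(10)]
ys = [2.0 * x + (1.0 if int(x) % 2 == 0 else -1.0) for x in xs]

cv_line = k_fold_mse(fit_line, xs, ys, k=5)
cv_memo = k_fold_mse(fit_memorizer, xs, ys, k=5)
print(f"5-fold CV MSE  line: {cv_line:.3f}  memorizer: {cv_memo:.3f}")
```

On this toy data the straight line scores about 1.36 under 5-fold cross validation versus 6.4 for the memorizer, even though the memorizer has zero error on any data it was fit to. Interleaving folds by index keeps the sketch deterministic; real libraries usually shuffle or stratify instead.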

Where do you overfit in life?  Please share your own stories!