The Importance of Ground Truth in Data Science

March 23, 2016 Clinton Bonner

by @ClintonBon

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” - Sherlock Holmes

Understanding and establishing the Ground Truth is an incredibly important step in achieving data science success. So let’s quickly define it and discuss why it matters.

Ground Truth - What is it?

Ground Truth is factual data that has been observed or measured, and can be analyzed objectively. It has not been inferred. If the data is based on an assumption, subject to opinion, or up for discussion, then, by definition, it is not Ground Truth data. Your ability to solve a problem using data science depends tremendously on how you frame the problem and on whether you can determine, without ambiguity, that Ground Truth can be established. Watch this short video and listen to the stock market example provided to better understand what would constitute proper Ground Truth:

When establishing Ground Truth, the data source(s) needed should be evident and not subject to opinion.

Why does Ground Truth matter?

This is simple. Without proper Ground Truth data, the value of your predictive algorithms is highly questionable, and relying on them is potentially harmful to your business. If your analytics are based on subjective data and assumptions, your analysis will very likely be off base and therefore not valuable to you.
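To make the point concrete, here is a minimal Python sketch. It is not taken from the video: the data, the variable names, and the `accuracy` helper are all hypothetical and exist only for illustration. It scores the same set of predictions twice, once against observed outcomes and once against subjective labels.

```python
def accuracy(predictions, reference):
    """Return the fraction of predictions that match the reference labels."""
    if len(predictions) != len(reference):
        raise ValueError("predictions and reference must have the same length")
    if not reference:
        raise ValueError("need at least one observation")
    matches = sum(p == r for p, r in zip(predictions, reference))
    return matches / len(reference)


# Hypothetical data: did a stock close higher than it opened on each of 5 days?
# Observed outcomes are measured from recorded prices, so they can serve as Ground Truth.
observed = [True, False, True, True, False]

# An analyst's opinion of what "should" have happened is subjective, so it cannot.
analyst_opinion = [True, True, True, True, False]

# Output of some hypothetical predictive model.
model_predictions = [True, True, True, True, False]

score_vs_observed = accuracy(model_predictions, observed)
score_vs_opinion = accuracy(model_predictions, analyst_opinion)

print(f"Accuracy against observed outcomes: {score_vs_observed:.2f}")
print(f"Accuracy against analyst opinion:   {score_vs_opinion:.2f}")
```

In this made-up case the model looks perfect when scored against opinion, yet it misses one day in five when scored against what was actually measured. Only the second number says anything about how the model performs in the real world.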

At Topcoder, we usher our Data Science clients and partners through a proven 5-stage process that allows real-world problems to be solved by our community of data scientists. After exploring the problem and expertly defining what you are seeking to solve, a critical next step you will take with our team is deciding on and establishing the Ground Truth. We will guide you at each step.

Keep Sherlock’s wise commentary in mind: Do not theorize before you have data. Take the crucial steps needed and establish the Ground Truth!
