
Data Mining Competitions on Kaggle: Test Your Skills on Real-World Problems

Kaggle.com is a platform that connects companies and organizations with a community of data analysts eager to explore and solve their problems. The idea behind the website is brilliant: it is a win-win situation for both sides. Let’s take a closer look at kaggle.com.


How does it work?

Here is a standard use case:

  1. Company XYZ has a problem that can be solved by data mining and decides to organize a competition on the Kaggle platform. XYZ offers cash prizes for the top 3 solutions and publishes a dataset and a problem description.
  2. The competition is now listed on kaggle.com, with a fixed deadline for solving the problem and submitting solutions. Many Kaggle users decide to take part. They work independently to find the best solution and submit it to Kaggle. Every user can make many submissions. Analysts have two motivations: Kaggle ranking points and cash prizes.
  3. Each submission is scored right away on a part of the test dataset that is not publicly available. As a result, every analyst knows immediately after submitting how her or his solution compares with those of other Kagglers. The rest of the test dataset is used to evaluate solutions after the final submission deadline. This makes it hard to win a competition by overfitting a model to the data or by guessing predictions through a large number of submissions.
  4. Finally, the top competitors hand over their methods and know-how to the organizer and receive their prizes.
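
The scoring scheme in step 3 can be sketched in a few lines of Python. This is only an illustration, not Kaggle’s actual implementation: the 30% public fraction, the accuracy metric, the toy labels, and the function names `split_test_set` and `accuracy` are all assumptions made for the example.

```python
import random


def split_test_set(ids, public_fraction=0.3, seed=0):
    """Randomly split test row ids into a public and a private part.

    The 30% default is an assumption for this sketch, not a Kaggle rule.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * public_fraction)
    return set(shuffled[:cut]), set(shuffled[cut:])


def accuracy(predictions, truth, ids):
    """Share of correct predictions over the given row ids."""
    ids = list(ids)
    correct = sum(1 for i in ids if predictions[i] == truth[i])
    return correct / len(ids)


# Hypothetical hidden labels for 10 test rows (known only to the organizer).
truth = {i: i % 2 for i in range(10)}
public_ids, private_ids = split_test_set(truth.keys())

# A hypothetical submission that predicts 0 for every row.
submission = {i: 0 for i in truth}

# Score shown to the analyst right after submitting.
public_score = accuracy(submission, truth, public_ids)
# Score computed on the held-back rows after the final deadline.
private_score = accuracy(submission, truth, private_ids)

print(f"public score:  {public_score:.2f}")
print(f"private score: {private_score:.2f}")
```

Because the final ranking depends only on the held-back rows, tuning a model against the public score through repeated submissions does not guarantee a good final result.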


Win – win

Let’s start by analyzing such a competition from the XYZ company’s perspective. How much do you think XYZ would have to spend to hire top analysts for a month to solve its problem and then choose the best solution?

I will help you with the math. The average annual salary in job offers tagged “data science” on remoteok.io is 83k USD. Let’s assume that 200 analysts would work for only one month:

200 * 83 000 / 12 = 1 383 333,33 USD

Yes, that is nearly 1.4 million USD. It can still be profitable if the solution significantly improves XYZ’s business processes. Alternatively, XYZ can spend 50k USD to organize a competition on kaggle.com and fund the cash prizes. This is enough to attract the attention of many Kagglers. It would be cheaper, easier and less time-consuming.
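
The estimate can be reproduced with a short Python sketch. The salary, the number of analysts and the competition budget are the figures quoted in the text; the variable names are only illustrative.

```python
# Figures quoted in the text; variable names are illustrative.
AVERAGE_ANNUAL_SALARY_USD = 83_000
ANALYSTS = 200
COMPETITION_BUDGET_USD = 50_000

# Cost of paying every analyst one month of the average salary.
monthly_salary = AVERAGE_ANNUAL_SALARY_USD / 12
hiring_cost = ANALYSTS * monthly_salary

# How many times more expensive hiring is than running the competition.
ratio = hiring_cost / COMPETITION_BUDGET_USD

print(f"Hiring {ANALYSTS} analysts for one month: {hiring_cost:,.2f} USD")
print(f"Competition budget: {COMPETITION_BUDGET_USD:,.2f} USD")
print(f"Hiring costs about {ratio:.1f}x the competition budget")
```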

From the analyst’s perspective, the same competition is a great opportunity to test your skills, enhance your portfolio and maybe even win an attractive cash prize. Kaggle.com publishes not only a ranking for every competition but also an overall ranking of data scientists. Being ranked among the top 100 Kagglers is real proof of your skills. Of course, it requires a lot of work, but Kaggle is a community platform, so you can meet other users and learn from them.
The top of the Kaggle ranking is reserved for the very best performers, but even beginners can find it an attractive way to gain practical experience during their studies or before taking a full-time job in the industry.


Conclusion

If you find Kaggle interesting, you can browse the list of active competitions on kaggle.com. I decided to start with the Coupon Purchase Prediction challenge.
