An analysis of over 100 Machine Learning competition winners in 2020
13 Apr, 2021
You might be hoping to win a machine learning competition in 2021. If so, this issue will tell you just what you need to know! We collaborated with Eniola Olaleye, ranked #5 on Zindi, to look at what winners do.
Together we analysed winners using the ML Contests database of over 100 competitions that took place in 2020 across Kaggle, DrivenData, AIcrowd, Zindi, and 20 other platforms. Wherever the information was available, we categorised winners to figure out what made them win.
Some highlights:
94% of 2020 competition winners used Python! Of the rest, one used R, one used C++, and one used Weka.
Kaggle hosted 29 competitions, the most of any platform. Next was AIcrowd with 14, then Zindi with 13. Many smaller platforms also ran competitions, bringing the total across all platforms to 107 last year.
The total prize pool across all competitions last year was $3.5m!
The competition with the biggest prize pool was the huge $1m Deepfake Detection Challenge.
Where Deep Learning was used, around 70% of winners used PyTorch with only 30% using TensorFlow (in line with PyTorch’s dominance in research).
Gradient-boosted tree methods (e.g. XGBoost) were still popular for generic supervised learning competitions (50% of winners), but in other areas Deep Learning solutions were more common. For example, the vast majority of computer vision competition winners used convolutional neural nets (CNNs).
The most popular software libraries used by winners were, unsurprisingly, Pandas, NumPy, scikit-learn, PyTorch, TensorFlow, Keras, and FastAI.
Over 80% of the people who won competitions had won a competition before*.
Over 60% of winners were solo contributors!
For the full in-depth analysis as well as a few case studies, see Eniola’s article over on our new ML Insights Medium publication. Then go forth and use your newfound knowledge on one of the 11 competitions that are currently open!
Note: we weren’t able to find all the information for each of the competition winners, so the various stats and charts represent different subsets of the data.
*The original version of this post incorrectly stated that “Over 80% of the people who won competitions were first-time winners”.