This is me liveblogging a conference lecture by Ben Hamner (video here: https://www.youtube.com/watch?v=9Zag7uhjdYo):
- There seem to be a lot of people from India on the board as well.
- Meh. Machine learning for optimizing petroleum exploration. Wow.
- Grading essays automatically. The task asks for a single grade, yet the input is something most people would judge very subjectively.
- Human Level Performance!
- I've always been fascinated by the prospect of diving headlong into one project, without having to worry about anything else, and THEN improvise my way around. Kaggle seems to be one avenue where I can pick up a project for 80 days.
- Toxicity of compounds. Biology. Reminds me, the more we concentrate ourselves on social data, the more we go away from the bulk of biological data.
- Also, toxicity of a compound might not be an answer we expect from a machine learning algorithm, because toxicity is ultimately the effect the compound has on the host.
- BUT it is a reminder that if we take in as many features as possible and treat this as a supervised learning problem, we do get very close to the feature sets that are relevant in the toxicity mechanisms.
How do Machine Learning Competitions Work?
- The simple task about the training set and the test set.
- The leaderboard serves the purpose of motivating everyone to try and improve their individual rankings, which effectively means that the entire community submits one more time.
- In turn, this leads to the entire solution space being explored.
- The frontier of machine learning is reached pretty soon for easy competitions.
- For tougher challenges, the frontier is reached a bit later.
- On short-answer scoring, the best entry used hand-tuned approaches for each prompt.
- The second entry, which came in very close, had the same approach for all prompts.
- Interestingly, startups have come out of such competitions.
- Others have used what they learnt over here to develop solutions for edX as well.
- Existing vendors did not necessarily win. The vendors who participated went on to contract with the winners.
- Interestingly, http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html?_r=0
This just serves as a reminder that domain-specific knowledge might hinder the machine. The takeaway seems to be that our focus has to be on building the machine; we cannot feed the machine our own understanding.
At the same time, when the first three methods were ensembled into one, the prediction results were far better. What does this mean? I presume that domain-specific knowledge is very useful at a particular place in the data processing pipeline.
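To make the ensembling point concrete for myself, here is a minimal sketch in Python. It is not what the competitors actually did: the three "models" are hypothetical and are simulated as the true score plus independent random noise, and the ensemble is a plain average. The names `rmse`, `average_ensemble`, `model_a`, `model_b` and `model_c` are my own placeholders.

```python
import math
import random


def rmse(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


def average_ensemble(*model_predictions):
    """Average the predictions of several models, row by row."""
    return [sum(row) / len(row) for row in zip(*model_predictions)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical ground truth (say, essay scores) for 2000 held-out rows.
targets = [rng.uniform(0, 10) for _ in range(2000)]

# Assumption: each model's error is independent noise around the truth.
model_a = [t + rng.gauss(0, 1.0) for t in targets]
model_b = [t + rng.gauss(0, 1.0) for t in targets]
model_c = [t + rng.gauss(0, 1.0) for t in targets]

ensemble = average_ensemble(model_a, model_b, model_c)

for name, preds in [("A", model_a), ("B", model_b),
                    ("C", model_c), ("ensemble", ensemble)]:
    print(f"{name}: RMSE = {rmse(preds, targets):.3f}")
```

The averaged predictions score better than any single simulated model here because the errors are independent and partly cancel. Real models trained on the same data make correlated mistakes, so the gain is usually smaller, which is probably why combining approaches that differ a lot (hand-tuned versus generic) pays off.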
Looking across the competitions holistically
- Boruta seems to be very useful for narrowing down feature selection.
- The Porter stemming algorithm seems to do a good job of reducing words down to their stems.
- Model ensembling results in marginal but significant improvements.
- Anything related to computer vision? Deep learning. Three libraries are available as well.
I am skipping the problems with competitions, primarily because I did not understand many of them.
Oh yes, one big advantage is that the Kaggle network exposes many exotic problem domains to a wide variety of experts. Higgs Boson! Epilepsy!
An Awkward question:
- Apparently, there are several tutorials for the Titanic challenge on Kaggle. One person took the most basic R solution available in the YouTube results, and ended up being ranked 219. His question was, what was wrong with the 1200 people who were behind him?
- Reminds me that the bulk of people who enter something are generally at the same level. Once you cross that plateau, you are talking business.