Never-Ending Learning to Read the Web with Tom Mitchell
One of the great technical challenges in big data is to construct computer systems that learn continuously over years, from a continuing stream of diverse data, improving their competence at a variety of tasks, and becoming better learners over time.
This TechTalk describes Carnegie Mellon University's research to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web and integrates them into its growing knowledge base of beliefs. Each day NELL also learns to read better than it did the day before, so it can return to text it has already read and extract more facts, more accurately.
NELL has been running 24 hours/day for over three years now. The result so far is a collection of 50 million interconnected beliefs (e.g., servedWith(coffee, applePie), isA(applePie, bakedGood)) that NELL holds at varying levels of confidence, along with hundreds of thousands of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web. Track NELL's progress at http://rtw.ml.cmu.edu.
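The example beliefs above can be pictured as relation triples held at different confidence levels. The following is a minimal sketch of that idea in Python; the class names, the `observe`/`accepted` methods, and the 0.9 promotion threshold are invented for illustration and are not part of NELL's actual architecture:

```python
# Illustrative sketch only: relation triples with confidence scores,
# where high-confidence candidates are promoted to accepted beliefs.
# All names and the threshold value here are hypothetical.

from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    relation: str  # e.g. "servedWith" or "isA"
    subject: str
    obj: str


class KnowledgeBase:
    """Stores candidate beliefs with confidence scores."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.candidates: dict[Belief, float] = {}

    def observe(self, belief: Belief, confidence: float) -> None:
        # Keep the highest confidence seen so far for this belief;
        # re-reading the same text later may raise that confidence.
        prev = self.candidates.get(belief, 0.0)
        self.candidates[belief] = max(prev, confidence)

    def accepted(self) -> set[Belief]:
        # Only beliefs at or above the threshold count as accepted.
        return {b for b, c in self.candidates.items() if c >= self.threshold}


kb = KnowledgeBase()
kb.observe(Belief("servedWith", "coffee", "applePie"), 0.95)
kb.observe(Belief("isA", "applePie", "bakedGood"), 0.97)
kb.observe(Belief("isA", "coffee", "bakedGood"), 0.20)  # stays a low-confidence candidate
```

The point of the interconnection in NELL's knowledge base is that beliefs constrain one another (for example, the argument of isA should be a category), which a real system exploits when deciding what to promote.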
Tom M. Mitchell is the E. Fredkin University Professor at Carnegie Mellon University, where he founded the world's first Machine Learning Department. His research uses machine learning to develop computers that are learning to read the web (http://rtw.ml.cmu.edu), and uses brain imaging to study how the human brain understands what it reads. Mitchell is a member of the U.S. National Academy of Engineering, the American Academy of Arts and Sciences, a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI). In 2015 he received an honorary Doctor of Laws degree from Dalhousie University for his contributions to machine learning and cognitive neuroscience.