Learning from Data: The Two Cultures

In his influential 2001 paper "Statistical Modeling: The Two Cultures," Leo Breiman identified and contrasted two approaches to statistical modeling: the data modeling culture, which assumes that a probabilistic model generates the data, and the algorithmic modeling culture, which focuses on mapping inputs to outputs through a black box. Twenty years later, a growing community of researchers is working on methodologies that embrace both cultures. However, when we look at the broader problem of learning from data, to which statistical modeling is one approach, we can identify two cultures practiced by two separate communities. The first is the statistical modeling culture itself, which starts with a question and/or data. The second, which is driving many of the AI breakthroughs, is the task modeling culture, which takes a task-first approach. We revisit Breiman’s take on statistical modeling and highlight some of the works embracing the two cultures he identified. We then discuss task modeling, highlighting how the failure modes of this culture can be addressed by adopting principles and practices from statistical modeling, e.g., careful data selection and experimental design.

Adji Bousso Dieng

Adji Bousso Dieng is a Senegalese computer scientist and statistician working in the field of artificial intelligence. She received her PhD in Statistics from Columbia University, where she was advised by David Blei and John Paisley. Her doctoral work, at the intersection of probabilistic graphical modeling and deep learning, received many recognitions, including a Google PhD Fellowship in Machine Learning. Dieng is the founder of The Africa I Know, a research scientist at Google AI, and an incoming tenure-track assistant professor of computer science at Princeton University.