Reproducing 150 Research Papers and Testing Them in the Real World: Challenges and Solutions

After completing the MILEPOST project in 2009, I opened the cTuning.org portal and released into the public domain all my research code, data sets, experimental results, and machine learning (ML) models for our self-optimizing compiler. My goal was to continue this research and development as a community effort while crowdsourcing ML training across diverse programs, data sets, compilers, and platforms provided by volunteers. Unfortunately, the project quickly stalled because we struggled to run experiments and reproduce results across rapidly evolving systems in the real world.

This experience motivated me to introduce artifact evaluation at several ACM conferences, including CGO, PPoPP, and ASPLOS, and to learn how to reproduce 150+ research papers. In this talk, I will present the numerous challenges we faced during artifact evaluation and possible solutions. I will also describe the Collective Knowledge framework (CK), developed to automate this tedious process and bring DevOps and FAIR principles to research.

The CK concept is to decompose research projects into reusable micro-services that expose the characteristics, optimizations, and software/hardware dependencies of all sub-components in a unified way via a common API and extensible meta descriptions. Portable workflows assembled from such plug-and-play components allow researchers and practitioners to automatically build, test, benchmark, optimize, and co-design novel algorithms across continuously changing software and hardware. Furthermore, the best results can be continuously collected in public or private repositories together with negative results, unexpected behavior, and mispredictions for collaborative analysis and improvement.
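
The idea of components behind a common API with meta descriptions can be illustrated with a minimal Python sketch. This is not the CK API itself: the names Component, register, access, and workflow, the "return" status field, and the placeholder compile and run actions are all hypothetical and chosen only for illustration. No real compiler or benchmark is invoked.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Every action takes a dict and returns a dict, so components stay interchangeable.
Action = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Component:
    """A reusable unit with a meta description and named actions (hypothetical)."""
    name: str
    meta: Dict[str, Any]
    actions: Dict[str, Action] = field(default_factory=dict)


REGISTRY: Dict[str, Component] = {}


def register(component: Component) -> None:
    """Make a component discoverable by name."""
    REGISTRY[component.name] = component


def access(request: Dict[str, Any]) -> Dict[str, Any]:
    """Single entry point: dispatch a request dict to a component action."""
    component = REGISTRY.get(request.get("component", ""))
    if component is None:
        return {"return": 1, "error": "unknown component"}
    action = component.actions.get(request.get("action", ""))
    if action is None:
        return {"return": 1, "error": "unknown action"}
    return {"return": 0, **action(request)}


def compile_program(request: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real component would invoke a compiler here.
    return {"binary": request["program"] + ".out",
            "flags": request.get("flags", "-O2")}


def run_benchmark(request: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real component would execute and time the binary.
    return {"ran": request["binary"],
            "repetitions": request.get("repetitions", 1)}


register(Component("compiler", {"tags": ["build"], "deps": []},
                   {"compile": compile_program}))
register(Component("benchmark", {"tags": ["measure"], "deps": ["compiler"]},
                   {"run": run_benchmark}))


def workflow(program: str, flags: str) -> Dict[str, Any]:
    """Chain two components through the common entry point."""
    built = access({"component": "compiler", "action": "compile",
                    "program": program, "flags": flags})
    if built["return"] != 0:
        return built
    return access({"component": "benchmark", "action": "run",
                   "binary": built["binary"], "repetitions": 3})
```

Because every call goes through one dictionary-in, dictionary-out entry point, a workflow can swap a component or add a new action without changing the code that chains the steps. Failures are reported as data rather than exceptions, which makes negative results as easy to record as successful ones.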

I will also present the cKnowledge.io platform for sharing portable, customizable, and reusable CK workflows from reproduced papers that can be quickly validated by the community and deployed in production. I will conclude with several practical use cases of the CK technology to improve reproducibility in ML and Systems research and accelerate real-world deployment of efficient deep learning systems from the cloud to the edge in collaboration with General Motors, Arm, IBM, Intel, Amazon, TomTom, the Raspberry Pi Foundation, ACM, MLCommons, and MLPerf.

Grigori Fursin

Grigori Fursin is a computer scientist with more than 20 years of experience pioneering novel autotuning, machine learning, and knowledge-sharing techniques to modernize the development of efficient software and hardware. After completing a PhD in Computer Science at the University of Edinburgh, Grigori was a tech lead in the EU MILEPOST project with IBM developing the world’s first ML-based compiler, a Senior Research Scientist at INRIA, and a Co-Director of the Intel Exascale Lab. He is a recipient of the ACM CGO'17 Test of Time award, the INRIA award of scientific excellence, the EU HiPEAC technology transfer award, and several best paper awards.

Grigori is the President of the cTuning foundation and the founder of the cKnowledge.io platform. He is an active open-source contributor, educator, and reproducibility champion, notably through his involvement in the ACM Taskforce on Reproducibility, MLCommons, and artifact evaluation. He is the author of the Collective Knowledge framework, which brings DevOps and FAIR principles to research with the help of portable, customizable, and reusable workflow templates, reproducible experiments, and auto-generated “live” papers. Grigori's mission is to bridge the growing gap between academic research and industry by helping researchers share their novel techniques as production-ready workflows that can be quickly validated in the real world and adopted by industry.