Can Probabl.ai Catalyze Open-Source Innovation in Machine Learning?

In the fast-evolving landscape of artificial intelligence (AI) and machine learning, Probabl.ai wants to be a beacon of Europe's open-source innovation ambitions.

Founded by tech veteran Yann Lechelle, the company's mission is to boost European efforts to establish sovereignty in AI. The foundation of that work: Probabl.ai is the official operator of scikit-learn, one of the most widely used machine learning libraries globally.

Speaking at the Apidays Generation AI conference in Paris last week, Lechelle shared his analysis of Europe's competitive challenges as well as his vision for how Probabl.ai hopes to redefine the paradigms of open-source software, enterprise AI, and data science.

"AI should be a white box, not a black box," he said "It should be used on our own devices, and so you don't need to get a PhD to understand what it can do for you."

Mission Possible?

Probabl.ai’s tagline, “Own Your Data Science,” encapsulates its core philosophy.

In a world where proprietary technologies dominate, Probabl.ai champions open-source principles to empower businesses and individuals. The company wants to democratize access to robust and transparent machine learning tools, enabling users to take full ownership of their data and analytics workflows, Lechelle said.

Scikit-learn, at the heart of Probabl.ai’s operations, exemplifies this mission.

The open-source machine learning library for Python was originally developed by a French data scientist in 2007. Scikit-learn is celebrated for its versatility, accessibility, and performance. With more than 45 million downloads per month and a million dependent projects on GitHub, it is a cornerstone for data scientists and machine learning practitioners globally.

Over the years, it has been managed by INRIA, France’s National Institute for Research in Digital Science and Technology. However, INRIA had to raise money to pay for the resources needed to maintain the project.

In May 2022, the French government asked INRIA to develop and maintain a set of state-of-the-art open software applications covering the entire data and model cycle. This mission gave rise to P16, a project structured around public-private collaboration. P16 was intended to develop and maintain a set of sovereign digital commons for artificial intelligence.

Probabl.ai emerged from that effort as a kind of spinoff from INRIA. Earlier this year, Probabl.ai contracted with INRIA to manage scikit-learn and encourage the development of more commercial uses, as a way to also preserve the public nature of the open-source code.

Probabl.ai is a dual-structured company: both a mission-driven organization and a for-profit enterprise. By offering services and products on top of its open-source foundation, similar to Red Hat’s model, Probabl.ai ensures sustainability while adhering to its ethos of transparency and accessibility.

Lechelle, the former CEO of Scaleway, was tapped to lead the new commercial element. He has long been a staunch advocate for Europe’s technological sovereignty, emphasizing the need to reduce dependency on U.S. tech giants.

"I eat AI for breakfast, like some of us here in the room, but I'm feeling the crunch," Lechelle said. "I think there's no single human being able to comprehend how fast this is going, and it's going across so many companies across the world."

Technology and Offerings

Earlier this year, the company raised €5.5 million from investors such as Mozilla Ventures and Apertu Capital, alongside prominent technologists and community leaders.

As an open-source operator, Probabl.ai generates revenue by providing premium services, enterprise solutions, and certifications. This approach not only supports the company’s growth but also strengthens the scikit-learn ecosystem, benefiting the broader data science community, Lechelle said.

The business plan for Probabl.ai focuses on enhancing scikit-learn and developing companion tools for machine learning professionals.

Among its products is a tool aimed at augmenting data scientists' workflows in the pre-MLOps phase. This product is designed to address the growing need for streamlined machine-learning pipelines that are both frugal and efficient. Probabl.ai has also created a scikit-learn certification program, crafted by core contributors to the library. The program is meant to set a benchmark for excellence in data science and equip professionals with industry-standard skills.

The company recently acquired Mnemotix, a firm specializing in machine learning from complex data sources, to bolster its talent pool and problem-solving capabilities and deliver technology to enterprise clients.