PleIAs - Project funding

PleIAs is a Paris-based start-up developing open-source language models for document processing and enterprise applications. Last year, PleIAs released Common Corpus, the largest open dataset for pretraining language models. We also released the first fully open-source LLMs (open weights, open code, and open data under permissive licences only). To create the dataset and train our models, we developed novel open-source tools for data processing and dataset curation. We did this work in collaboration with leading members of the open-source AI community, such as Hugging Face, EleutherAI, and the Allen Institute for AI (Ai2).

github.com/Pleias

Mon, 26 Jan 2026 12:00:02 UTC

There was a problem with this listing's funding.json manifest. If it is not fixed, the listing will be removed from the portal.

Crawl error

error: https://github.com/Pleias/open_data_toolkit/blob/main/funding.json returned 502