PleIAs is a Paris-based start-up developing open-source language models for document processing and enterprise applications. Last year, PleIAs released Common Corpus, the largest open dataset for pretraining language models. We also released the first fully open-source LLMs (open weights, open code, and open data under permissive licences only). To create the dataset and train our models, we developed novel open-source tools for data processing and dataset curation. We have done this work in collaboration with leading members of the open-source AI community, such as Hugging Face, EleutherAI, and the Allen Institute for AI (Ai2).