PDF Document Layout Analysis by HURIDOCS (Human Rights Information and Documentation Systems)

A Docker-powered microservice for PDF document layout analysis, OCR, and content extraction. It analyzes page layout with VGT (Vision Grid Transformer) and LightGBM models, runs OCR in 150+ languages, and converts PDFs to multiple formats, including Markdown and HTML. Built with Clean Architecture principles to support human rights documentation workflows.