An async, production-ready document ingestion pipeline for RAG systems. Processes PDFs, images, arXiv papers, and documents from S3 through Docling extraction, GLiNER2 metadata enrichment & PII ...
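As a rough illustration of the async, staged shape described above, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `Document` type, the `extract`, `enrich`, and `ingest` functions, and the `max_concurrency` parameter are hypothetical names, and the stage bodies are stand-in stubs rather than real Docling or GLiNER2 calls.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical container; the real pipeline's data model may differ.
    source: str                      # e.g. a local path, arXiv ID, or S3 key
    text: str = ""
    metadata: dict = field(default_factory=dict)


async def extract(doc: Document) -> Document:
    # Stub standing in for the extraction stage (no real Docling call here).
    await asyncio.sleep(0)
    doc.text = f"extracted text of {doc.source}"
    return doc


async def enrich(doc: Document) -> Document:
    # Stub standing in for metadata enrichment (no real GLiNER2 call here).
    await asyncio.sleep(0)
    doc.metadata["length"] = len(doc.text)
    return doc


async def ingest(sources: list[str], max_concurrency: int = 4) -> list[Document]:
    # Bound how many documents are processed at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(source: str) -> Document:
        async with semaphore:
            doc = Document(source=source)
            # Run the stages in order for this document.
            for stage in (extract, enrich):
                doc = await stage(doc)
            return doc

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(s) for s in sources))


if __name__ == "__main__":
    docs = asyncio.run(ingest(["paper.pdf", "scan.png"]))
    for d in docs:
        print(d.source, d.metadata)
```

Each document moves through the stages sequentially while several documents run concurrently, up to the semaphore limit. Further stages could be appended to the stage tuple in the same way.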