Whenever I see this excellent utility, I mention YaCy - Elasticsearch. Organize your entire document collection, host a local search engine, join a decentralized cluster. Installs from a binary, needs little maintenance, supports scheduled crawls.
It uses wkhtmltopdf to create PDF snapshots of everything it parses.
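
For anyone curious what that kind of snapshot step looks like, here is a rough Python sketch. It is not YaCy's actual code, just the plain `wkhtmltopdf <input> <output>` invocation wrapped in a function. The names `snapshot_path`, `build_command`, and `snapshot`, and the hash-based file naming, are my own assumptions for illustration.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def snapshot_path(url: str, out_dir: Path) -> Path:
    """Derive a stable PDF filename from the URL (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return out_dir / f"{digest}.pdf"


def build_command(url: str, pdf: Path) -> list[str]:
    """Build the basic call: wkhtmltopdf [options] <input> <output>."""
    return ["wkhtmltopdf", "--quiet", url, str(pdf)]


def snapshot(url: str, out_dir: Path) -> Path:
    """Render one page to PDF; requires wkhtmltopdf to be installed."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    pdf = snapshot_path(url, out_dir)
    # check=True raises CalledProcessError if rendering fails
    subprocess.run(build_command(url, pdf), check=True)
    return pdf
```

Calling `snapshot("https://example.org/", Path("snapshots"))` should leave one PDF per URL in the `snapshots` directory, with the same URL always mapping to the same file name.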