Apache Tika

Thu Jul 18 15:48:29 2024

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
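
The "single interface" idea can be illustrated with a small, self-contained sketch. This is not Tika's API or implementation: the names `detect`, `extract`, `SIGNATURES`, and `PARSERS` are hypothetical, only two well-known byte signatures are checked, and only plain text has a parser. It shows the general pattern of detecting a type first and then dispatching to a type-specific parser behind one entry point.

```python
from typing import Callable, Dict, Tuple

# Well-known leading byte signatures ("magic numbers").
# Illustrative subset only; a real detector knows far more types.
SIGNATURES: Tuple[Tuple[bytes, str], ...] = (
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # ZIP container, also used by PPTX/XLSX/DOCX
)


def detect(data: bytes) -> str:
    """Guess a media type from the first bytes of the content."""
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


def parse_plain_text(data: bytes) -> str:
    """Trivial parser: plain text is already its own extracted text."""
    return data.decode("utf-8")


# Hypothetical registry mapping media types to parser functions.
# Parsers for PDF or Office formats would be registered here as well.
PARSERS: Dict[str, Callable[[bytes], str]] = {
    "text/plain": parse_plain_text,
}


def extract(data: bytes) -> dict:
    """Single entry point: detect the type, then dispatch to its parser."""
    media_type = detect(data)
    parser = PARSERS.get(media_type)
    text = parser(data) if parser else ""  # no parser registered: metadata only
    return {"Content-Type": media_type, "text": text}


if __name__ == "__main__":
    print(extract(b"hello world"))
    print(extract(b"%PDF-1.7\n..."))
```

Callers only ever use `extract`, so supporting another format means registering one more signature and parser rather than changing any calling code.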

Tika @ GitHub.
