content-analysis
Apache Tika
https://tika.apache.org/
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
Added 1 year ago
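The "single interface" idea in the Tika entry above can be illustrated with Tika Server, which exposes the toolkit over HTTP. What follows is a minimal sketch, not an official client: it assumes a Tika Server instance is already running locally on its default port 9998, and the helper names (`build_request`, `extract_text`, `extract_metadata`) and the file `report.pdf` are hypothetical. The same two calls apply whether the file is a PDF, a spreadsheet, or a presentation.

```python
import json
import urllib.request

# Assumption: Tika Server is running locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_request(endpoint: str, payload: bytes, accept: str) -> urllib.request.Request:
    """Build a PUT request that sends raw document bytes to a Tika Server endpoint."""
    return urllib.request.Request(
        url=f"{TIKA_SERVER}/{endpoint}",
        data=payload,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(payload: bytes) -> str:
    """Return the plain text Tika extracts from the document (PUT /tika)."""
    with urllib.request.urlopen(build_request("tika", payload, "text/plain")) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(payload: bytes) -> dict:
    """Return the document metadata as a dictionary (PUT /meta)."""
    with urllib.request.urlopen(build_request("meta", payload, "application/json")) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Hypothetical input file; any format Tika supports goes through the same calls.
    with open("report.pdf", "rb") as fh:
        document = fh.read()
    print(extract_text(document)[:200])
    print(extract_metadata(document).get("Content-Type"))
```

Because the server detects the file type itself, the caller never selects a format-specific parser; only the endpoint and the `Accept` header change between text and metadata extraction.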
Apache Tika bindings for PHP
https://github.com/vaites/php-apache-tika
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats.
Added 2 years ago