Blogs: Tika

Blog posts filtered by the Tika subject tag.


Background: Nearly two and a half years ago, I started an effort for Apache Tika™ to help improve its robustness via TIKA-1302. Apache Tika™ is an umbrella/wrapper project that “detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).” I documented some of the early work […]

By tallison, posted in tallison's Blog

4th Oct 2016, 3:03 PM