Norconex Importer 1.2.0 has just been released, along with a new website.
- Now supports text extraction from WordPerfect documents.
- New transformer to reduce consecutive instances of the same string to only one instance.
- New transformer to perform search and replace on document content using regular expressions.
- New filter to exclude/include documents with no data for one or more specified metadata properties.
- Now attempts to detect the character encoding of a character stream by looking at the Content-Type metadata property. If none is present, it defaults to UTF-8.
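
To give a feel for two of the items above, here is a minimal sketch in plain Java. It is not the Importer's actual code or API: the class name `ImporterSketch` and both method names are hypothetical, and only standard JDK classes are used. It shows one way to pick a charset from a Content-Type value with a UTF-8 fallback, and one way to collapse consecutive repeats of a string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Norconex Importer.
class ImporterSketch {

    // Matches a "charset=..." parameter inside a Content-Type value.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            "charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the Content-Type value, or UTF-8
    // when the value is missing, has no charset, or names an unknown one.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET_PARAM.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the default.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Collapses two or more consecutive occurrences of "value" into one.
    static String reduceConsecutive(String text, String value) {
        if (value.isEmpty()) {
            return text;
        }
        String quoted = Pattern.quote(value);
        return text.replaceAll(
                "(?:" + quoted + "){2,}", Matcher.quoteReplacement(value));
    }
}
```

For example, calling `ImporterSketch.reduceConsecutive` with a single space as the second argument collapses runs of spaces into one space. Refer to the Importer documentation for the real transformer and filter configuration.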
Download it now!
Website: /collectors/importer/