Norconex just released version 1.1 of HTTP Collector, its free web crawler. This is an important upgrade from the Norconex Development Team, giving you the following great new features and enhancements:
- Much faster and more consistent crawling performance, especially at high volumes (in the millions).
- Support for sitemap.xml and sitemap index (plain or gzip).
- Support for BASIC and DIGEST authentication.
- Support for in-page robot instructions.
- Support for ftp:// URLs.
For a complete list of changes, see the Release Notes.
This release also takes advantage of the new 1.1 release of Norconex Importer, which adds the ability to extract parts of documents using regular expressions and store them as document metadata for indexing. For example, you can capture the content of H1, H2, or bold tags to influence ranking.
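To give a rough idea of what regular-expression extraction into metadata looks like, here is a minimal standalone sketch in plain Java. It does not use the Norconex Importer API or its configuration format: the class name `RegexMetadataSketch`, the helper `extractAll`, the field names `h1` and `bold`, and the sample HTML are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: NOT the Norconex Importer API. All names are hypothetical.
public class RegexMetadataSketch {

    // Returns the trimmed text of capture group 1 for every match in the content.
    static List<String> extractAll(String content, Pattern pattern) {
        List<String> values = new ArrayList<>();
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            values.add(matcher.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String html = "<html><body><h1>Release 1.1</h1>"
                + "<p>Now with <b>sitemap</b> support.</p></body></html>";

        // Non-greedy groups so each tag pair is matched separately.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        Pattern h1 = Pattern.compile("<h1[^>]*>(.*?)</h1>", flags);
        Pattern bold = Pattern.compile("<b>(.*?)</b>", flags);

        // Metadata modeled as field name -> list of values (an assumption).
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("h1", extractAll(html, h1));
        metadata.put("bold", extractAll(html, bold));

        System.out.println(metadata);
    }
}
```

Fields captured this way could then be sent to your search engine alongside the document content, where they can be given more weight at query time.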
We would love to hear your feedback on this release and the features you would like to see implemented next.
Download Norconex HTTP Collector 1.1 now!