Norconex HTTP Collector 1.1 Released

Norconex just released version 1.1 of HTTP Collector, its free web crawler. This is an important upgrade from the Norconex Development Team, with the following new features and enhancements:

  • Much faster and more consistent crawling performance, especially at high volume (millions).
  • Support for sitemap.xml and sitemap index (plain or gzip).
  • Support for BASIC and DIGEST authentication.
  • Support for in-page robot instructions.
  • Support for ftp:// URLs.
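
To illustrate what the sitemap support listed above involves, here is a minimal Python sketch that reads a sitemap or sitemap index, plain or gzip-compressed. It is not the HTTP Collector implementation; the function name `read_sitemap` and its return shape are assumptions made for this example, and only the sitemaps.org namespace and the gzip magic bytes come from the published formats.

```python
import gzip
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def read_sitemap(raw: bytes):
    """Return (kind, locations) for a plain or gzip-compressed sitemap.

    Hypothetical helper for illustration only.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)

    root = ET.fromstring(raw)

    # A sitemap index has a <sitemapindex> root that lists other sitemaps;
    # a regular sitemap has a <urlset> root that lists page URLs.
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"

    # Both kinds carry their targets in <loc> elements.
    locations = [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    ]
    return kind, locations
```

A crawler following this approach would queue the locations of a `urlset` for crawling and fetch each location of an `index` as another sitemap.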

For a complete list of changes, see the Release Notes.

This release also takes advantage of the new 1.1 release of Norconex Importer, which adds the ability to extract parts of documents using regular expressions and to store them as document metadata for indexing (for example, the content of H1, H2, or bold tags, which can then be used to influence ranking).
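
The idea of regex-based extraction into metadata can be sketched in a few lines of Python. This is only an illustration of the technique, not the Norconex Importer API or its configuration syntax; the field names (`h1`, `h2`, `bold`), the patterns, and the `extract_metadata` function are all assumptions made for this example.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# Hypothetical metadata field names mapped to the regular expression
# that captures their content. "\b" keeps "<b" from matching "<body".
PATTERNS = {
    "h1": re.compile(r"<h1\b[^>]*>(.*?)</h1>", FLAGS),
    "h2": re.compile(r"<h2\b[^>]*>(.*?)</h2>", FLAGS),
    "bold": re.compile(r"<(?:b|strong)\b[^>]*>(.*?)</(?:b|strong)>", FLAGS),
}

# Used to drop any markup nested inside a captured value.
INNER_TAG = re.compile(r"<[^>]+>")


def extract_metadata(html: str) -> dict:
    """Return a dict of field name -> list of extracted text values."""
    metadata = {}
    for field, pattern in PATTERNS.items():
        values = []
        for captured in pattern.findall(html):
            # Strip nested tags, then collapse runs of whitespace.
            text = " ".join(INNER_TAG.sub("", captured).split())
            if text:
                values.append(text)
        if values:
            metadata[field] = values
    return metadata
```

A search engine could then index fields such as `h1` separately and weight them more heavily than body text. Regular expressions are a pragmatic fit for simple patterns like these; deeply nested or malformed markup is better handled by a real HTML parser.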

We would love to hear your feedback on this release and the features you would like to see implemented next.

Download Norconex HTTP Collector 1.1 now!

Pascal Essiembre was a successful Enterprise Application Developer for several years before founding Norconex in 2007, and he remains its president to this day. He has been responsible for several successful Norconex enterprise search projects across North America. He also heads the Product Division of Norconex and leads Norconex Open-Source initiatives.