Norconex HTTP Collector 1.2 Released

Norconex just released version 1.2 of Norconex HTTP Collector, its open-source web crawler. Along with it comes a complete redesign of the product web site and a new logo: a lovely web-crawling spider wearing a Norconex hat.

Some changes in this feature release:

  • New optional MongoDB URL database implementation.
  • New optional TikaURLExtractor class providing an alternate URL extraction mechanism based on Apache Tika HTMLParser.
  • New SegmentCountURLFilter class for filtering URLs having a specified number of segments (can check duplicate segments too).
  • Configuration samples now point to Norconex test pages to ensure their stability.
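
To give a feel for what filtering on URL segments involves, here is a minimal, standalone Java sketch. It is not the SegmentCountURLFilter class itself, and it does not use any Norconex API: the class name `SegmentCountSketch`, its methods, and the reading of "duplicate segments" as "the most times any one segment repeats" are all assumptions made for illustration. The example.com URLs are placeholders.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Norconex HTTP Collector.
final class SegmentCountSketch {

    // Splits the URL path on "/" and drops empty pieces.
    // Note: URI.create throws IllegalArgumentException on malformed URLs.
    static List<String> segments(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        for (String piece : path.split("/")) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    // Returns the highest number of times any single segment occurs.
    static int maxRepeats(List<String> segments) {
        Map<String, Integer> counts = new HashMap<>();
        int max = 0;
        for (String segment : segments) {
            int count = counts.merge(segment, 1, Integer::sum);
            max = Math.max(max, count);
        }
        return max;
    }

    // True when the URL reaches the limit, either on the total number of
    // segments or (assumed meaning) on repeats of the same segment.
    static boolean reachesLimit(String url, int limit, boolean duplicatesOnly) {
        List<String> segs = segments(url);
        int measured = duplicatesOnly ? maxRepeats(segs) : segs.size();
        return measured >= limit;
    }
}
```

Under these assumptions, a crawler could call something like `reachesLimit(url, 3, true)` to spot URLs such as `http://example.com/a/b/a/b/a`, where the same segment keeps repeating, which is a common sign of a crawler trap.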

To view a complete list of changes, read the Release Notes.

This release also takes advantage of the new 1.1.0 release of Norconex Committer, which makes it simpler to write your own committer implementations.

As always, we welcome your feedback.

Download Norconex HTTP Collector 1.2 now!

Pascal Essiembre was a successful enterprise application developer for several years before founding Norconex in 2007, and he remains its president to this day. He has been responsible for several successful Norconex enterprise search projects across North America. He also heads the Norconex Product Division and leads the Norconex open-source initiatives.