Say hello to Norconex HTTP Collector! At Norconex, we have always recognized the value open source brings to software development and, to a greater extent, the world. It benefits us when building custom solutions for our customers and ourselves. As long-time consumers of open source, we feel it is time to give back.
As a result, Norconex is proud to announce the open-sourcing of a handful of its libraries and products, so that the community can save time and money as open source has helped us do. The Norconex HTTP Collector is an HTTP crawler meant to give developers and integrators the greatest flexibility possible. It makes it easy for Java developers to add custom features, so no one will get stuck again when dealing with odd requirements, difficult websites, or closed-source crawler limitations. Among its other objectives are portability, modular and reusable configuration, complete documentation, usability for non-programmers, and compatibility with virtually any enterprise search product. The HTTP Collector can be used stand-alone or embedded as a library in your own software.
Norconex may release other collectors for various data sources in the future. In the meantime, we have encapsulated document parsing and the sending of parsed data to your target search engine or repository in two separate libraries, which we are releasing as Norconex Importer and Norconex Committer. These two libraries let you easily create your own solutions for crawling your own content repositories without having to worry about parsing different file types.
Whether you are a customer or an integrator, if you deal with enterprise search, we hope you will give our HTTP Collector a try and take the time to send us constructive feedback or contributions.
Official product site: /collectors/collector-http