
HP Autonomy users, take control of your web crawling. Norconex recently released an HP Autonomy IDOL Committer module for its open-source web crawler, Norconex HTTP Collector. You can now enjoy the features of the Norconex crawler and experience the freedom of open source when crawling your sites for indexing into IDOL. (more…)


For a client last year, we had to upgrade some old Lucene code to Lucene 4. Lucene 4 was a rather large release, and there are many aspects to be aware of when upgrading non-trivial code. Let’s take a look at some of them.

(more…)

Norconex just released version 1.2 of Norconex HTTP Collector, its open-source web crawler. Along with it comes a complete redesign of the product web site and a new logo: a lovely web crawling spider wearing a Norconex hat.

Some changes in this feature release: (more…)

Norconex Commons Lang is a generic Java library providing useful utility classes that extend the base Java API. Its name is shamelessly borrowed from Apache Commons Lang, so people can quickly guess what it is about from the name alone. It is by no means an effort to replace Apache Commons Lang. Quite the opposite: we try to favor Apache Commons libraries whenever possible. Norconex uses this Commons Lang library as a catch-all, providing all kinds of generic utilities, some of which have extra dependencies beyond the base Java API. While this library is used by Norconex in its enterprise search projects, it is not tied to search and can be used in any context.

The following explores some of the key features it offers as of this writing. (more…)

2013/06/05

Say hello to Norconex HTTP Collector! At Norconex, we have always recognized the value open source brings to software development and, to a greater extent, to the world. It benefits us when building custom solutions for our customers and ourselves. As long-time consumers of open source, it is time for us to give back.

As a result, Norconex is proud to announce the open-sourcing of a handful of its libraries and products, so that the community can save time and money as we did. The Norconex HTTP Collector is an HTTP crawler meant to give developers and integrators the greatest possible flexibility. (more…)