Norconex just released an Amazon CloudSearch Committer module for its open-source crawlers (Norconex “Collectors”). This is especially useful for CloudSearch users, since CloudSearch does not provide its own crawlers.
If you’re not yet familiar with Norconex Collectors, head over to the Norconex Collectors website to see what you’ve been missing.
Assuming you’re already familiar with Norconex Collectors, you can enable CloudSearch as your crawler’s target search engine by following these steps:
- Download the CloudSearch Committer.
- Extract the zip, and copy the contents of its “lib” folder to the “lib” folder of your existing Collector installation.
- Add this minimum required configuration snippet to your Collector configuration file:
  <committer class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
    <serviceEndpoint>(CloudSearch service endpoint)</serviceEndpoint>
    <accessKey>
      (Optional CloudSearch access key. Will be taken from environment when blank.)
    </accessKey>
    <secretKey>
      (Optional CloudSearch secret key. Will be taken from environment when blank.)
    </secretKey>
  </committer>
- The service endpoint is the document endpoint of the CloudSearch domain where you want to store your crawled documents. You can obtain it from your CloudSearch domain’s main page.
As for the AWS access and secret keys, they can also be stored outside the configuration file using one of the methods described here.
The complete list of configuration options is available here.
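To show where the snippet above might sit, here is a minimal sketch of a full configuration file. It assumes the Norconex HTTP Collector layout, with the committer nested inside a crawler. The collector id, crawler id, and start URL are hypothetical placeholders, not values from the Committer documentation. The access and secret keys are left out on purpose so that they are taken from the environment. If the Committer relies on the standard AWS SDK credential lookup (an assumption here), that would typically mean the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.

```xml
<!-- Sketch only: ids and the start URL are illustrative placeholders. -->
<httpcollector id="my-collector">
  <crawlers>
    <crawler id="my-crawler">
      <!-- Where crawling begins (hypothetical site). -->
      <startURLs>
        <url>https://www.example.com/</url>
      </startURLs>
      <!-- Send crawled documents to Amazon CloudSearch. -->
      <committer class="com.norconex.committer.cloudsearch.CloudSearchCommitter">
        <serviceEndpoint>(CloudSearch service endpoint)</serviceEndpoint>
        <!-- accessKey and secretKey omitted: taken from the environment. -->
      </committer>
    </crawler>
  </crawlers>
</httpcollector>
```

Keeping the keys out of the file makes it safer to share the configuration or commit it to version control.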
For further information:
- Visit the Norconex CloudSearch Committer website
- Visit the Norconex HTTP Collector website
- Get help or report issues
- Contact Norconex directly