Norconex Apache Solr Committer

Configuration

When used with a Norconex Crawler, configure this committer by using the following XML as the <committer> section of your crawler configuration (LucidWorks-specific instructions are further down):

<committer class="com.norconex.committer.solr.SolrCommitter">
  <solrClientType>
     (See class documentation for options. Default: HttpSolrClient.)
  </solrClientType>
  <solrURL>(URL to Solr)</solrURL>
  <solrUpdateURLParams>
    <param name="(parameter name)">(parameter value)</param>
    <!-- multiple param tags allowed -->
  </solrUpdateURLParams>
  <solrCommitDisabled>[false|true]</solrCommitDisabled>
 
  <!-- Use the following if BASIC authentication is required. -->
  <username>(Optional user name)</username>
  <password>(Optional user password)</password>
  <!-- Use the following if password is encrypted. -->
  <passwordKey>(the encryption key or a reference to it)</passwordKey>
  <passwordKeySource>[key|file|environment|property]</passwordKeySource>
 
  <sourceReferenceField keep="[false|true]">
     (Optional name of the field that contains the document reference, when 
     the default document reference is not used.  The reference value
     will be mapped to the Solr "id" field, or to the "targetReferenceField" 
     specified.  Once re-mapped, this metadata source field is 
     deleted, unless "keep" is set to true.)
  </sourceReferenceField>
  <targetReferenceField>
     (Name of the Solr target field in which to store a document's unique 
     identifier (see "sourceReferenceField").  If not specified, default is "id".) 
  </targetReferenceField>
  <sourceContentField keep="[false|true]">
     (If you wish to use a metadata field to act as the document 
     "content", you can specify that field here.  By default, the 
     document content itself is used rather than a metadata field.
     Once re-mapped, the metadata source field is deleted,
     unless "keep" is set to true.)
  </sourceContentField>
  <targetContentField>
     (Solr target field name for the document content/body.
      Default is: content)
  </targetContentField>
  <commitBatchSize>
     (Maximum number of docs to send to Solr at once. Will issue a Solr 
      commit unless "solrCommitDisabled" is true)
  </commitBatchSize>
  <queueDir>(optional path where to queue files)</queueDir>
  <queueSize>(max queue size before sending to Solr)</queueSize>
  <maxRetries>(max retries upon commit failures)</maxRetries>
  <maxRetryWait>(max delay in milliseconds between retries)</maxRetryWait>
</committer>
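
As an illustration, a minimal configuration might look like the following sketch. The Solr URL, the batch and queue sizes, and the retry values are assumptions chosen for the example, not recommended defaults. Every option not shown is left at its default.

```xml
<committer class="com.norconex.committer.solr.SolrCommitter">
  <!-- Hypothetical Solr core URL; replace with your own. -->
  <solrURL>http://localhost:8983/solr/mycore</solrURL>
  <!-- Illustrative values only. -->
  <commitBatchSize>100</commitBatchSize>
  <queueSize>1000</queueSize>
  <maxRetries>2</maxRetries>
  <maxRetryWait>5000</maxRetryWait>
</committer>
```

Because neither <targetReferenceField> nor <targetContentField> is set here, the document reference goes to the Solr "id" field and the document body to "content", per the defaults described above.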

LucidWorks Additional Configuration

To make this committer work with LucidWorks, you have to define constant values expected by LucidWorks. When using a Norconex Crawler or Importer, you can define them as an importer parse handler like this:

<tagger class="com.norconex.importer.tagger.impl.ConstantTagger">
  <constant name="data_source">927df3075b544785892c6b4c51625714</constant>
  <constant name="data_source_type">Web</constant>
  <constant name="data_source_name">Wikipedia</constant>
</tagger>

In the committer settings, you also need to add the following configuration:

<solrUpdateURLParams>
  <param name="fm.ds">927df3075b544785892c6b4c51625714</param>
</solrUpdateURLParams>
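
For context, this fragment sits inside the <committer> element. The sketch below shows one way the pieces might fit together. The Solr URL is a hypothetical placeholder, and the data source ID is the sample value used above, which you would replace with the ID of your own LucidWorks data source.

```xml
<committer class="com.norconex.committer.solr.SolrCommitter">
  <!-- Hypothetical URL; point this at your LucidWorks Solr instance. -->
  <solrURL>http://localhost:8888/solr/collection1</solrURL>
  <solrUpdateURLParams>
    <!-- Sample data source ID from the example above. -->
    <param name="fm.ds">927df3075b544785892c6b4c51625714</param>
  </solrUpdateURLParams>
</committer>
```

The "fm.ds" value should match the "data_source" constant set by the tagger so that both refer to the same data source.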

For more details on integrating the Solr Committer with LucidWorks, you can read the article "Using Norconex Web Crawler with LucidWorks".