Norconex Importer

Getting Started

Command Line Usage

usage: importer[.bat|.sh]
 -c,--config <arg>            Optional: Importer XML configuration file.
 -e,--contentEncoding <arg>   Optional: The content encoding (charset) of
                              the input file.
 -i,--inputFile <arg>         File to be imported (required unless
                              "checkcfg" is used).
 -k,--checkcfg                Validates XML configuration. When combined
                              with -i, prevents execution on configuration
                              error.
 -o,--outputFile <arg>        Optional: File where the imported content
                              will be stored.
 -r,--reference <arg>         Optional: Alternate unique qualifier for the
                              input file (e.g. URL).
 -t,--contentType <arg>       Optional: The MIME Content-type of the input
                              file.
 -v,--variables <arg>         Optional: variable file.
 

The Importer launch script shown above is found in the root directory of your installation (where you extracted the downloaded Zip file). Refer to the Configuration page for documentation on all configuration options, and to the ConfigurationLoader Javadoc for details on the optional variables file.
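When the launch script is invoked from another Java program, the options listed above can be assembled programmatically. The following is a minimal sketch that only builds the argument list. The script path and file names are placeholders, and ImporterCommand is a hypothetical helper that is not part of the Importer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Importer): assembles an argument
// list for the launch script from the options documented above.
class ImporterCommand {

    // script: path to importer.sh or importer.bat (placeholder value).
    // outputFile and configFile are optional and may be null.
    static List<String> build(String script, String inputFile,
            String outputFile, String configFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.add("-i");              // --inputFile: file to be imported
        cmd.add(inputFile);
        if (outputFile != null) {
            cmd.add("-o");          // --outputFile: where content is stored
            cmd.add(outputFile);
        }
        if (configFile != null) {
            cmd.add("-c");          // --config: Importer XML configuration
            cmd.add(configFile);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "./importer.sh", "input.pdf", "output.txt", "config.xml");
        // Print the command line that would be executed.
        System.out.println(String.join(" ", cmd));
        // To actually launch the script, the list could be handed to
        // java.lang.ProcessBuilder: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Running the main method prints the assembled command line without starting the Importer, which makes it easy to check the arguments before handing them to a ProcessBuilder.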

Java Integration

If you are using Maven, simply add the project dependency to your pom.xml. If you are not using Maven, add all JAR files found in the "lib" folder of your installation to your application classpath. Configure the Importer class by passing it an ImporterConfig. You can build the configuration in Java, or load it from an XML configuration file using the ImporterConfigLoader class. Below is a usage sample:

/* XML configuration: */
//ImporterConfig config = ImporterConfigLoader.loadImporterConfig(
//    myXMLFile, myVariableFile);
 
/* Java configuration: */
ImporterConfig config = new ImporterConfig();
config.setTaggers(new IDocumentTagger[] {/* taggers here */});
config.setTransformers(new IDocumentTransformer[] {/* transformers here */});
config.setFilters(new IDocumentFilter[] {/* filters here */});
 
Importer importer = new Importer(config);
 
File inputFile = ... // the file to be converted
File outputFile = ... // the file that will contain the extracted text
Metadata metadata = new Metadata();
boolean accepted = importer.importDocument(inputFile, outputFile, metadata);
if (accepted) {
    System.out.println("File was imported to : " + outputFile);
} else {
    System.out.println("File was rejected.");
}

Refer to the Importer Javadoc for more documentation or the Configuration page for XML configuration options.

Extend the Importer

To create your own feature implementations, create a new Java project in your favourite IDE. Use Maven, or add all the files contained in the lib folder of the Importer installation to your classpath. Configure your project so that its binary output directory is the classes folder of the Importer. Code compiled into the classes folder is then picked up automatically by the Importer when you run it.
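To give an idea of what such a feature implementation might look like, here is a minimal sketch of a custom tagger. It is self-contained and does not use the Importer classes: SketchTagger is a hypothetical, simplified stand-in for IDocumentTagger, and the field name "sketch.extension" is made up for illustration. The actual interface and its method signature should be taken from the Importer Javadoc.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the Importer's IDocumentTagger interface.
// The real method signature is documented in the Importer Javadoc and
// is NOT the simplified one used here.
interface SketchTagger {
    void tagDocument(String reference, Map<String, String> metadata);
}

// A tagger that records the file extension found in a document reference.
class ExtensionTagger implements SketchTagger {
    @Override
    public void tagDocument(String reference, Map<String, String> metadata) {
        // Keep only the last path segment, then look for its last dot.
        String name = reference.substring(reference.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        // No dot, or a dot as the last character, means no extension.
        String ext = (dot >= 0 && dot < name.length() - 1)
                ? name.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "";
        // "sketch.extension" is an illustrative field name only.
        metadata.put("sketch.extension", ext);
    }
}

class ExtensionTaggerDemo {
    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        new ExtensionTagger().tagDocument(
                "http://example.com/report.PDF", metadata);
        System.out.println(metadata.get("sketch.extension"));
    }
}
```

A real implementation would implement the Importer interface instead of the stand-in, be compiled into the classes folder as described above, and be registered through the Java or XML configuration.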