Exploring Norconex Commons Lang

Norconex Commons Lang is a generic Java library providing useful utility classes that extend the base Java API. Its name is shamelessly borrowed from Apache Commons Lang, so people can quickly guess what it is about just by its name. It is by no means an effort to replace Apache Commons Lang. Quite the opposite: we try to favor Apache Commons libraries whenever possible. Norconex uses this Commons Lang library as a catch-all, providing all kinds of generic utilities, some of which have dependencies beyond the base Java API. While Norconex uses this library in its enterprise search projects, it is not tied to search and can be used in any context.

The following explores some of the key features it offers as of this writing.


Testing Multiple Objects for Equality

When you need to test whether an object equals at least one, all, or none of several other objects, you may end up writing something like this:

if (sourceObject.equals(obj1)
    || sourceObject.equals(obj2)
    || sourceObject.equals(obj3)
    || sourceObject.equals(obj4)) {
        // do whatever
}

Consider using com.norconex.commons.lang.EqualsUtil, which offers the following methods:

EqualsUtil.equalsAny(Object source, Object... targets)
EqualsUtil.equalsAll(Object source, Object... targets)
EqualsUtil.equalsNone(Object source, Object... targets)

Using the same objects as in the sample above, testing for multiple objects with EqualsUtil looks like this:

if (EqualsUtil.equalsAny(sourceObject, obj1, obj2, obj3, obj4)) {
     // do whatever
}
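If you are curious how such helpers can work, here is a minimal sketch built on java.util.Objects.equals, which also treats two nulls as equal. EqualsSketch is a hypothetical name and this is not the library's actual source; in particular, treating an empty target list as "all match" and "none match" is an assumption of this sketch.

```java
import java.util.Objects;

/** Hypothetical sketch of multi-target equality checks (not Norconex's EqualsUtil). */
final class EqualsSketch {
    private EqualsSketch() {}

    /** True if source equals at least one target (null-safe; targets array must not be null). */
    static boolean equalsAny(Object source, Object... targets) {
        for (Object target : targets) {
            if (Objects.equals(source, target)) {
                return true;
            }
        }
        return false;
    }

    /** True if source equals every target; vacuously true when there are no targets. */
    static boolean equalsAll(Object source, Object... targets) {
        for (Object target : targets) {
            if (!Objects.equals(source, target)) {
                return false;
            }
        }
        return true;
    }

    /** True if source equals none of the targets. */
    static boolean equalsNone(Object source, Object... targets) {
        return !equalsAny(source, targets);
    }
}
```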

Nicer Ways to Sleep

You may find it annoying to wrap every call to Thread.sleep(…) in a try/catch block. Just as annoying is having to convert every delay you want into milliseconds. The com.norconex.commons.lang.Sleeper class addresses both. It catches the checked InterruptedException and rethrows it as an unchecked one: SleeperException. It also offers a few convenience methods that make putting threads to sleep friendlier:

Sleeper.sleepHours(int hours)
Sleeper.sleepMillis(long millis)
Sleeper.sleepMinutes(int minutes)
Sleeper.sleepNanos(long nanos)
Sleeper.sleepSeconds(int seconds)

For instance, waiting for three minutes becomes as simple as:

Sleeper.sleepMinutes(3);
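The idea behind such a wrapper can be sketched with java.util.concurrent.TimeUnit, which already handles unit conversion. QuietSleep and SleepInterruptedException are hypothetical stand-ins, not the library's Sleeper and SleeperException. One design choice worth noting: this sketch restores the thread's interrupted flag before rethrowing, so code further up the call stack can still detect the interruption.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical unchecked exception thrown when a sleep is interrupted. */
class SleepInterruptedException extends RuntimeException {
    SleepInterruptedException(InterruptedException cause) {
        super("Sleep interrupted.", cause);
    }
}

/** Hypothetical sketch of a try/catch-free sleep helper (not Norconex's Sleeper). */
final class QuietSleep {
    private QuietSleep() {}

    static void sleep(long duration, TimeUnit unit) {
        try {
            unit.sleep(duration); // TimeUnit converts to millis/nanos for us
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new SleepInterruptedException(e);
        }
    }

    static void sleepMillis(long millis) { sleep(millis, TimeUnit.MILLISECONDS); }
    static void sleepSeconds(int seconds) { sleep(seconds, TimeUnit.SECONDS); }
    static void sleepMinutes(int minutes) { sleep(minutes, TimeUnit.MINUTES); }
    static void sleepHours(int hours) { sleep(hours, TimeUnit.HOURS); }
}
```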

Printing Formatted Durations

Have you ever tracked the elapsed time of a given process and wanted to display it nicely to your users? The com.norconex.commons.lang.time.DurationUtil class gives you just that:

DurationUtil.formatLong(Locale locale, long duration)
DurationUtil.formatLong(Locale locale, long duration, int maxUnits)
DurationUtil.formatShort(Locale locale, long duration)
DurationUtil.formatShort(Locale locale, long duration, int maxUnits)

DurationUtil offers two formats for printing a duration: short and long. It displays the duration as full words or abbreviations in the locale you provide (only English and French for now). You can also specify how many time units you want to display, in case you only want to show the most significant ones. Below are a few ways to display a duration:

long myDuration = 4730483000L; // 54 days, 18 hours, 1 minute and 23 seconds, in milliseconds

String enShort = DurationUtil.formatShort(Locale.ENGLISH, myDuration);
// enShort contains “54d18h1m23s”

String frLong = DurationUtil.formatLong(Locale.CANADA_FRENCH, myDuration, 2);
// frLong contains “54 jours 18 heures”

Although not shown here, it also prints time units in their singular or plural form as appropriate.
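To give an idea of what happens under the hood, here is a minimal, English-only sketch of short and long formatting with a unit limit, including singular/plural handling. DurationSketch is a hypothetical name; the real DurationUtil's spacing, zero-unit handling, and locale support may differ. This sketch assumes a non-negative duration in milliseconds, truncates rather than rounds, and omits units whose value is zero.

```java
/** Hypothetical English-only sketch of duration formatting (not Norconex's DurationUtil). */
final class DurationSketch {
    private static final long[] UNIT_MILLIS = {86_400_000L, 3_600_000L, 60_000L, 1_000L};
    private static final String[] SHORT_NAMES = {"d", "h", "m", "s"};
    private static final String[] LONG_NAMES = {"day", "hour", "minute", "second"};

    private DurationSketch() {}

    static String formatShort(long millis, int maxUnits) {
        return format(millis, maxUnits, true);
    }

    static String formatLong(long millis, int maxUnits) {
        return format(millis, maxUnits, false);
    }

    private static String format(long millis, int maxUnits, boolean abbreviate) {
        StringBuilder out = new StringBuilder();
        long remaining = millis; // assumed non-negative
        int shown = 0;
        for (int i = 0; i < UNIT_MILLIS.length && shown < maxUnits; i++) {
            long value = remaining / UNIT_MILLIS[i];
            remaining %= UNIT_MILLIS[i]; // leftover goes to smaller units (truncated)
            if (value == 0) {
                continue; // zero-valued units are omitted
            }
            if (abbreviate) {
                out.append(value).append(SHORT_NAMES[i]);
            } else {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(value).append(' ').append(LONG_NAMES[i]);
                if (value != 1) {
                    out.append('s'); // plural form
                }
            }
            shown++;
        }
        if (shown == 0) {
            return abbreviate ? "0s" : "0 seconds";
        }
        return out.toString();
    }
}
```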


String-based Map/Properties with Typed Accessors

Have you ever had to deal with Maps or Properties used to store String representations of values of different data types (integer, date, etc.)? Think about processing HTTP parameters with numeric arguments, for instance (when not using a framework that maps them for you). If you have, you know how annoying it is to constantly convert these Strings back and forth to their native types.

Norconex Commons Lang brings you its own com.norconex.commons.lang.map.Properties class that will make you smile.  It can be a replacement for the Java Properties class where appropriate, or it can be used as a multi-value String Map.

Like Java Properties, it enforces the use of String keys and values, but it offers many methods for storing and retrieving values as the type of your choice.

You can also use it to store and retrieve application settings quickly since, like Properties, it offers “store” and “load” methods for file persistence.  Here is an example:

// Easily access query string parameters
Properties params = new Properties(request.getParameterMap());
long clientId = params.getLong("clientId"); // defaults to zero
int numOfWives = params.getInt("numOfWives", 1); // 1 if null
List<File> caseFiles = params.getFiles("caseFile"); // returns a list

// Put some values
params.setBoolean("isHappy", true); // Replace a value
params.addDate("now", new Date()); // Add a value (to list)
params.addFloat("scores", 1.4f, 5.2f, 6.7f, 10.24f); // multi-value

// Save to file (try-with-resources closes the writer)
try (Writer writer = new FileWriter(customerFile)) {
    params.store(writer, "Customer Profile");
}
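At its core, the idea is a String-only, multi-value map that converts values on access. TypedStringMap below is a hypothetical, int-only sketch of that idea, not the Properties implementation, which supports many more types and features.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: values are stored as Strings and converted on access. */
final class TypedStringMap {
    private final Map<String, List<String>> store = new LinkedHashMap<>();

    /** Replaces any existing values for the key with a single int. */
    void setInt(String key, int value) {
        List<String> values = new ArrayList<>();
        values.add(Integer.toString(value));
        store.put(key, values);
    }

    /** Appends one or more ints to the key's existing values. */
    void addInt(String key, int... values) {
        List<String> existing = store.computeIfAbsent(key, k -> new ArrayList<>());
        for (int value : values) {
            existing.add(Integer.toString(value));
        }
    }

    /** Returns the first value as an int, or defaultValue when absent. */
    int getInt(String key, int defaultValue) {
        List<String> values = store.get(key);
        if (values == null || values.isEmpty()) {
            return defaultValue;
        }
        return Integer.parseInt(values.get(0));
    }

    /** Returns all values converted to ints (an empty list when absent). */
    List<Integer> getInts(String key) {
        List<Integer> result = new ArrayList<>();
        for (String value : store.getOrDefault(key, new ArrayList<>())) {
            result.add(Integer.parseInt(value));
        }
        return result;
    }

    /** Returns the raw String form of the first value, or null when absent. */
    String getString(String key) {
        List<String> values = store.get(key);
        return values == null || values.isEmpty() ? null : values.get(0);
    }
}
```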

It can be used in all kinds of contexts, but if your goal is to deal with HTTP request parameters, you definitely want to read the next section.


Dealing with URLs

The Norconex Commons Lang library offers various URL-related utilities.

URL Query String Manipulation:

The com.norconex.commons.lang.url.QueryString class is a subclass of com.norconex.commons.lang.map.Properties with a few additions:

  • You can pass a URL in its constructor to have its query string automatically parsed and made easily accessible/modifiable.
  • You can set or replace the query string of an existing URL with the #applyOnURL(String) method.
  • The #toString() method has been overridden to return a URL-encoded representation that can be added to any URL.

Let’s say you need to proxy an HTTP request from a new application to a legacy one.   The following code is one way to do it:

String newAppURL = "https://here.com/newapp?id=34&page=21&user=john";
String oldAppURL = "http://there.com/legacyapp";

QueryString qs = new QueryString(newAppURL);
qs.setInt("page", 1);        // sets page back to 1
qs.setString("user", "joe"); // change user name
oldAppURL = qs.applyOnURL(oldAppURL); // new URL with modified query string
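For a feel of what such a class automates, here is a JDK-only sketch that parses a query string, modifies it, and applies it to another URL. QueryStringSketch and its methods are hypothetical; for brevity it keeps only the last value of a repeated parameter and assumes the target URL has no fragment. It needs Java 10 or later for the Charset overloads of URLEncoder and URLDecoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical query string helper (not Norconex's QueryString). */
final class QueryStringSketch {
    private QueryStringSketch() {}

    /** Parses the part after '?' into an insertion-ordered map of decoded values. */
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int start = url.indexOf('?');
        if (start < 0) {
            return params;
        }
        String query = url.substring(start + 1);
        int hash = query.indexOf('#');
        if (hash >= 0) {
            query = query.substring(0, hash); // ignore any fragment
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // a repeated name overwrites the earlier value in this sketch
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** URL-encodes the parameters as "name=value&..." */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    /** Replaces any existing query string on targetUrl (assumed to have no fragment). */
    static String applyOnURL(Map<String, String> params, String targetUrl) {
        int q = targetUrl.indexOf('?');
        String base = q < 0 ? targetUrl : targetUrl.substring(0, q);
        return params.isEmpty() ? base : base + "?" + toQueryString(params);
    }
}
```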

URL Manipulation:

Being able to modify the query string is nice, but what if you want to modify parts of the URL itself while keeping its query string intact? The com.norconex.commons.lang.url.HttpURL class helps you do that. For example:

HttpURL url = new HttpURL("https://here.com/contact?id=jsmith05");
QueryString qs = url.getQueryString(); // do any query string manipulation
url.setProtocol("http");  // render non-secure
url.setHost("there.com"); // change the host
URL newURL = url.toURL(); // new URL ready to use
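A comparable effect can be approximated with java.net.URI by rebuilding the URL from its raw components, which keeps the path, query string, and fragment exactly as they were. The helper below is a hypothetical sketch, not how HttpURL works internally; it drops any user-info part and keeps an explicit port unchanged, even when switching schemes.

```java
import java.net.URI;

/** Hypothetical helper (not Norconex's HttpURL). */
final class UrlRewriteSketch {
    private UrlRewriteSketch() {}

    /** Returns url with its scheme and host replaced; path, query and fragment are preserved. */
    static String withSchemeAndHost(String url, String scheme, String host) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort()); // explicit port is kept as-is
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath()); // raw form: escapes stay untouched
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

If you need a java.net.URL afterwards, URI.create(result).toURL() converts the returned String.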

URL Streaming:

What if you want to invoke the URL obtained in the previous example? Making a URL call in Java is not difficult, but it may take too many lines for your taste if you have to do it over and over. There are times you would prefer a single method call: “pass a URL, get the page as a string”.

Add to this password-protected URLs, as well as the need to go through a proxy that is itself password-protected, and you start wishing for something simpler! The com.norconex.commons.lang.url.URLStreamer class offers to do that work for you. It has several methods to deal with URL streaming in the most straightforward way possible.

A quick example of a password-protected site:

Credentials credentials = new UsernamePasswordCredentials("norconex", "letmein");
String content = URLStreamer.streamToString(
        "http://www.somesite.com/someProtectedPage.html", credentials);

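Without such a class, fetching a page with HTTP Basic authentication takes a handful of JDK calls. SimpleUrlFetcher below is a hypothetical sketch, not URLStreamer's implementation: it supports Basic authentication over http(s) only, ignores proxies, and assumes a UTF-8 response body. It needs Java 9 or later for InputStream.readAllBytes().

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical "URL in, page out" helper (not Norconex's URLStreamer). */
final class SimpleUrlFetcher {
    private SimpleUrlFetcher() {}

    /** Builds an HTTP Basic "Authorization" header value. */
    static String basicAuthHeader(String username, String password) {
        String token = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Fetches an http(s) URL with Basic authentication; returns the body as UTF-8 text. */
    static String streamToString(String url, String username, String password)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestProperty("Authorization", basicAuthHeader(username, password));
            // getInputStream() throws an IOException for 4xx/5xx responses
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would then be a single call, e.g. SimpleUrlFetcher.streamToString("http://www.somesite.com/someProtectedPage.html", "norconex", "letmein").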
URL Normalization:

There are times you have to compare URLs, store them for later use, or download the files they represent. We all know that several different URLs may lead to the same resource. To avoid duplicates, you may wish there was a way to convert URLs to a more uniform format before using them. This process is often referred to as “URL normalization” and, you guessed it, there is a class doing just that: com.norconex.commons.lang.url.URLNormalizer. It offers a series of standard normalization techniques you may choose to apply:

addTrailingSlash()
addWWW()
decodeUnreservedCharacters()
lowerCaseSchemeHost()
removeDefaultPort()
removeDirectoryIndex()
removeDotSegments()
removeDuplicateSlashes()
removeEmptyParameters()
removeFragment()
removeSessionIds()
removeTrailingQuestionMark()
removeWWW()
replaceIPWithDomainName()
secureScheme()
sortQueryParameters()
unsecureScheme()
upperCaseEscapeSequence()

For instance, let’s assume that before using a URL, you want to make sure there is no “www” in it, remove its default port if present (80 for http, 443 for https), and sort its query parameters:

String url = "https://www.books.com:443/read?chapter=456&page=789&book=123";
String normalizedURL = new URLNormalizer(url)
        .removeWWW()
        .removeDefaultPort()
        .sortQueryParameters()
        .toString();
// normalizedURL holds: "https://books.com/read?book=123&chapter=456&page=789"
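To illustrate what those three steps involve, here is a JDK-only sketch applying them with java.net.URI. UrlNormalizeSketch is hypothetical and far less thorough than URLNormalizer: it assumes an absolute http(s) URL without user-info, and it sorts query parameters by comparing whole name=value pairs.

```java
import java.net.URI;
import java.util.Arrays;

/** Hypothetical sketch of three normalizations (not Norconex's URLNormalizer). */
final class UrlNormalizeSketch {
    private UrlNormalizeSketch() {}

    static String normalize(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme();
        String host = uri.getHost();

        // removeWWW: drop a leading "www." from the host name
        if (host.regionMatches(true, 0, "www.", 0, 4)) {
            host = host.substring(4);
        }

        // removeDefaultPort: omit 80 for http and 443 for https
        int port = uri.getPort();
        boolean isDefault = (port == 80 && "http".equalsIgnoreCase(scheme))
                || (port == 443 && "https".equalsIgnoreCase(scheme));

        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port != -1 && !isDefault) {
            sb.append(':').append(port);
        }
        sb.append(uri.getRawPath());

        // sortQueryParameters: sort "name=value" pairs alphabetically
        String query = uri.getRawQuery();
        if (query != null && !query.isEmpty()) {
            String[] pairs = query.split("&");
            Arrays.sort(pairs);
            sb.append('?').append(String.join("&", pairs));
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```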

Dealing with Text Files and Streams

Norconex Commons Lang provides several I/O-related classes. They can all be found in the com.norconex.commons.lang.io package.

FilteredInputStream:

The FilteredInputStream class allows you to decorate any text-based InputStream and filter out unwanted lines as it reads. This is useful when you need to pass an InputStream to a method but want to drop some of its lines on the fly, rather than doing a less efficient first pass over the original stream to remove them. To filter lines, pass your own implementation of IInputStreamFilter, or use the existing RegexInputStreamFilter. This example prints only the log lines that report issues:

try (InputStream is = new FilteredInputStream(
        new FileInputStream("/tmp/logfile.log"),
        new RegexInputStreamFilter("^(ERROR|WARNING).*"))) {
    for (String line : IOUtils.readLines(is, StandardCharsets.UTF_8)) {
        // only lines starting with ERROR or WARNING reach this point
        System.out.println(line);
    }
}
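The decorator idea can be sketched with the JDK alone: wrap the source stream in a BufferedReader, skip lines that do not match a regular expression, and hand out the bytes of the lines that do. RegexLineFilterInputStream is a hypothetical name, not FilteredInputStream's source; it assumes UTF-8 content, normalizes line endings to \n, and reads one byte at a time for simplicity rather than speed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical line-filtering decorator (not Norconex's FilteredInputStream). */
final class RegexLineFilterInputStream extends InputStream {
    private final BufferedReader reader;
    private final Pattern pattern;
    private byte[] buffer = new byte[0]; // bytes of the current matching line
    private int pos = 0;

    RegexLineFilterInputStream(InputStream in, String regex) {
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public int read() throws IOException {
        while (pos >= buffer.length) {
            String line = reader.readLine();
            if (line == null) {
                return -1; // end of the underlying stream
            }
            if (pattern.matcher(line).matches()) { // keep only whole-line matches
                buffer = (line + "\n").getBytes(StandardCharsets.UTF_8);
                pos = 0;
            }
        }
        return buffer[pos++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```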

StreamGobbler/IStreamListener:

By default, StreamGobbler simply reads a text InputStream without doing anything with its content. This can be useful, for instance, to prevent an external process from hanging because its STDOUT or STDERR has nowhere to go. Optionally, you can add one or more IStreamListener instances to it. Each listener is notified of every new line read, so you can do whatever you need with it. For example:

InputStream is = myStream; // replace with whatever stream you need
StreamGobbler gobbler = new StreamGobbler(is);
gobbler.addStreamListener(new IStreamListener() {
    @Override
    public void lineStreamed(String type, String line) {
        if (line.startsWith("person")) {
            // create a new person here
        } else if (line.startsWith("organization")) {
            // create a new organization here
        }
    }
});
// close the stream only once the gobbler has finished reading it
is.close();
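The gobbler-plus-listener pattern itself is small enough to sketch with the JDK. LineGobbler and LineListener are hypothetical names, not Norconex's StreamGobbler and IStreamListener. This sketch implements Runnable so it can run on its own thread, notifies every listener of each line, and closes the stream once it has been fully read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener notified of each line read. */
interface LineListener {
    void lineStreamed(String type, String line);
}

/** Hypothetical gobbler: drains a text stream and notifies listeners line by line. */
final class LineGobbler implements Runnable {
    private final InputStream in;
    private final String type; // e.g. "STDOUT" or "STDERR"
    private final List<LineListener> listeners = new CopyOnWriteArrayList<>();

    LineGobbler(InputStream in, String type) {
        this.in = in;
        this.type = type;
    }

    void addListener(LineListener listener) {
        listeners.add(listener);
    }

    @Override
    public void run() {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (LineListener listener : listeners) {
                    listener.lineStreamed(type, line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a process, you would typically run one gobbler per output stream with new Thread(gobbler).start(); calling run() directly simply reads the stream on the current thread.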

FileUtil:

FileUtil complements what org.apache.commons.io.FileUtils already provides. Summary of its methods:

createDateDirs(…)         // Creates a date-based directory structure from any date (e.g. /parentDir/2000/12/31/).
createDateTimeDirs(…)     // Like above, but also adds the time (e.g. /parentDir/2000/12/31/13/34/12/).
createDirsForFile(…)      // Given a file, creates all of its non-existing parent directories.
deleteEmptyDirs(…)        // Recursively deletes all empty directories.
deleteFile(…)             // Robust file deletion. Throws an exception on failure. Retries on file lock.
from/toSafeFileName(…)    // Converts a file name from/to a cross-OS-friendly (safe) file name.
head(…)                   // Returns the first X lines from the beginning of a text file.
tail(…)                   // Returns the last X lines from the end of a text file.
moveFile(…)               // Robust file moving. Deletes the target first if it exists. Throws an exception on failure. Retries on file lock.
visitAllDirs(…)           // Uses the visitor pattern to recursively browse directories, saving you from scanning them yourself.
visitAllDirsAndFiles(…)   // Like above, but browses both files and directories.
visitAllFiles(…)          // Like above, but browses only files.
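As an illustration of the first entry, a date-based directory structure can be sketched with java.nio.file. DateDirsSketch is hypothetical: it takes a java.time.LocalDate rather than whatever date type FileUtil.createDateDirs(…) actually accepts, and makes no claim about the library's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;

/** Hypothetical sketch of a date-based directory helper (not Norconex's FileUtil). */
final class DateDirsSketch {
    private DateDirsSketch() {}

    /** Builds parentDir/yyyy/MM/dd and creates any missing directories along the way. */
    static Path createDateDirs(Path parentDir, LocalDate date) throws IOException {
        Path dir = parentDir
                .resolve(String.format("%04d", date.getYear()))
                .resolve(String.format("%02d", date.getMonthValue()))
                .resolve(String.format("%02d", date.getDayOfMonth()));
        return Files.createDirectories(dir); // no-op for directories that already exist
    }
}
```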

IOUtil:

IOUtil offers the same head(…) and tail(…) methods as FileUtil, but for InputStreams. Getting the tail of an InputStream forces IOUtil to read the entire stream, which can be inefficient, so always use FileUtil.tail(…) when dealing with files.
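Why tail is costlier on a stream shows up clearly in a minimal sketch: head can stop after N lines, but a stream offers no way to jump to its end, so tail must read every line while keeping only the last N in a bounded queue. StreamLinesSketch is hypothetical, assumes UTF-8 text, and leaves closing the stream to the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical head/tail helpers for streams (not Norconex's IOUtil). */
final class StreamLinesSketch {
    private StreamLinesSketch() {}

    /** Stops reading as soon as enough lines are collected (the reader may buffer ahead). */
    static List<String> head(InputStream in, int numberOfLines) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        String line;
        while (lines.size() < numberOfLines && (line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    /** Must consume the whole stream, keeping at most numberOfLines in memory. */
    static List<String> tail(InputStream in, int numberOfLines) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        Deque<String> window = new ArrayDeque<>();
        String line;
        while ((line = reader.readLine()) != null) {
            window.addLast(line);
            if (window.size() > numberOfLines) {
                window.removeFirst(); // drop the oldest line
            }
        }
        return new ArrayList<>(window);
    }
}
```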


XML-based Configuration Helpers

The com.norconex.commons.lang.config package of Norconex Commons Lang focuses on making configuration easier and more flexible, with an emphasis on XML.

ConfigurationLoader:

The ConfigurationLoader class makes it easy to use template-based configuration files and fragments. XML files are treated as Velocity templates, so all Velocity features apply (scripted rendering, variable interpolation, etc.). If a configuration file has another file next to it with the same name but with the extension “.properties” (Java-style) or “.variables” (key/value), that file is automatically loaded and its variables are made available to the template. Optionally, you can also specify your own variables file (e.g., to share and reuse variable files). The result is an XMLConfiguration object from Apache Commons Configuration. Sample usage:

File cfgFile = new File("/tmp/myconfig.xml");
File varFile = new File("/tmp/myconfig.variables");
ConfigurationLoader configLoader = new ConfigurationLoader();
XMLConfiguration xml = configLoader.loadXML(cfgFile, varFile);

Because the above example uses the same base file name (minus extension), there is no need to explicitly specify the variables file and the following code is equivalent:

File cfgFile = new File("/tmp/myconfig.xml");
ConfigurationLoader configLoader = new ConfigurationLoader();
XMLConfiguration xml = configLoader.loadXML(cfgFile);

Suppose myconfig.variables contains the following sample settings:

home=c:\apps\myhome
numOfJobs=3

The following fictitious configuration file uses these two variables in different ways:

<myconfig>
    <source-dir>${home}/myproject/source</source-dir>
    <target-dir>${home}/myproject/target</target-dir>
  #foreach ($jobId in [1..$numOfJobs])
    <job id="myJob${jobId}">
        <datasetname>dataset-${jobId}</datasetname>
    </job>
  #end
  #parse("/tmp/someOtherConfig.xml")
</myconfig>
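Velocity does much more than substitute variables, but the simplest part of the mechanism, loading key=value pairs and replacing ${name} references, can be sketched with the JDK. VariableSketch is hypothetical and is not how ConfigurationLoader works: it handles no directives such as #foreach or #parse, treats lines starting with '#' as comments (an assumption of this sketch), and leaves unknown references untouched. It needs Java 9 or later for Matcher.appendReplacement(StringBuilder, String).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical ${variable} interpolation sketch (not Norconex's ConfigurationLoader). */
final class VariableSketch {
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    private VariableSketch() {}

    /** Parses simple "key=value" lines; blank lines and '#' lines are skipped. */
    static Map<String, String> parseVariables(String text) {
        Map<String, String> vars = new HashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (trimmed.isEmpty() || trimmed.startsWith("#") || eq < 0) {
                continue;
            }
            vars.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return vars;
    }

    /** Replaces each ${name} with its value, leaving unknown references as-is. */
    static String interpolate(String template, Map<String, String> vars) {
        Matcher m = REFERENCE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value)); // keeps '\' and '$' literal
        }
        m.appendTail(out);
        return out.toString();
    }
}
```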

IXMLConfigurable:

The IXMLConfigurable interface declares an object as being configurable via XML.  While many may see it as bad design to have data objects take care of loading and saving themselves, when used in the right context, it can save you time and bring several benefits.  The interface provides two methods to implement:

void loadFromXML(Reader in) throws IOException;
void saveToXML(Writer out) throws IOException;

Assume the following XML snippet:

<filter caseSensitive="false" >
    .*mycompany.com.*
</filter>

Loading that XML to initialize an object could look like this:

@Override
public void loadFromXML(Reader in) {
    XMLConfiguration xml = ConfigurationLoader.loadXML(in);
    setRegex(xml.getString(""));
    setCaseSensitive(xml.getBoolean("[@caseSensitive]", false));
}
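If you want the same self-loading pattern without Apache Commons Configuration, here is a sketch using only the JDK: the DOM parser for loading and a StAX XMLStreamWriter for saving. The XmlSelfConfigurable interface and RegexFilterSketch class are hypothetical; only the method names mirror IXMLConfigurable, and saveToXML writes the same shape as the snippet above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical counterpart of IXMLConfigurable's two methods. */
interface XmlSelfConfigurable {
    void loadFromXML(Reader in) throws IOException;
    void saveToXML(Writer out) throws IOException;
}

/** Hypothetical regex filter that loads and saves itself as XML. */
final class RegexFilterSketch implements XmlSelfConfigurable {
    private String regex;
    private boolean caseSensitive;

    String getRegex() { return regex; }
    boolean isCaseSensitive() { return caseSensitive; }

    @Override
    public void loadFromXML(Reader in) throws IOException {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in)).getDocumentElement();
            regex = root.getTextContent().trim(); // element text without surrounding whitespace
            // a missing attribute yields "", which parseBoolean turns into false
            caseSensitive = Boolean.parseBoolean(root.getAttribute("caseSensitive"));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Cannot parse filter XML.", e);
        }
    }

    @Override
    public void saveToXML(Writer out) throws IOException {
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("filter");
            xml.writeAttribute("caseSensitive", Boolean.toString(caseSensitive));
            xml.writeCharacters(regex); // escapes special characters such as & and <
            xml.writeEndElement();
            xml.flush();
        } catch (XMLStreamException e) {
            throw new IOException("Cannot write filter XML.", e);
        }
    }
}
```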

ConfigurationUtil

Why choose to implement the IXMLConfigurable interface above? You get an idea when using it with ConfigurationUtil. If your XML configuration file has a tag representing the configuration of a given object, simply add a “class” attribute to it with the fully qualified name of your class. If that class implements IXMLConfigurable, ConfigurationUtil will automatically create an instance and initialize it with the XML found under that tag. For instance, we would add the “class” attribute to the same XML snippet:

<filter class="com.mycompany.RegexURLFilter" caseSensitive="false" >
    .*mycompany.com.*
</filter>

The transforming of this XML configuration into your object can then be performed like this:

RegexURLFilter filter = ConfigurationUtil.newInstance(xml, "filter");
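The mechanism can be sketched with plain reflection: read the class attribute, instantiate the class through its no-argument constructor, and hand it the tag's XML if it knows how to configure itself. Everything below (XmlLoadable, ReflectiveConfigSketch.newInstanceFromXml, SampleRegexFilter) is hypothetical and much simpler than ConfigurationUtil; it expects the tag to be the root of the XML string it receives and the class to have an accessible no-argument constructor.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical "configure yourself from XML" contract. */
interface XmlLoadable {
    void loadFromXML(Reader in) throws IOException;
}

/** Hypothetical sketch of class-attribute-driven instantiation. */
final class ReflectiveConfigSketch {
    private ReflectiveConfigSketch() {}

    static Element parseRoot(Reader in) throws IOException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in)).getDocumentElement();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Cannot parse XML.", e);
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T newInstanceFromXml(String xml) throws IOException {
        String className = parseRoot(new StringReader(xml)).getAttribute("class");
        if (className.isEmpty()) {
            return null; // no "class" attribute: nothing to create
        }
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            if (instance instanceof XmlLoadable) {
                // hand the object its own tag so it can configure itself
                ((XmlLoadable) instance).loadFromXML(new StringReader(xml));
            }
            return (T) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate class: " + className, e);
        }
    }
}

/** Hypothetical configurable class used to try the sketch. */
final class SampleRegexFilter implements XmlLoadable {
    private String regex;
    private boolean caseSensitive;

    String getRegex() { return regex; }
    boolean isCaseSensitive() { return caseSensitive; }

    @Override
    public void loadFromXML(Reader in) throws IOException {
        Element root = ReflectiveConfigSketch.parseRoot(in);
        regex = root.getTextContent().trim();
        caseSensitive = Boolean.parseBoolean(root.getAttribute("caseSensitive"));
    }
}
```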

What Next?

Grab a copy of Norconex Commons Lang now, and let us know what you think!

Get it here: /product/commons-lang/download.html.

Come back to the product page once in a while for new feature releases.

Pascal Essiembre was a successful enterprise application developer for several years before founding Norconex in 2007, and he has remained its president to this day. Pascal has been responsible for several successful Norconex enterprise search projects across North America. He also heads the Product Division of Norconex and leads its open-source initiatives.