Search Relevancy: Art or Science? (Part 1)

You want your search users to find what they’re looking for on their first try, but you just can’t seem to make it happen. You’ve followed your search engine’s official documentation to the letter, you’ve read up on all of the best practices you could find, you’ve tried several different search configurations … yet you still can’t find the “perfect” relevancy formula for your enterprise search implementation. You’re now at the point where you’re about to give up and tell your team to start looking into a different search engine altogether. Wait! Is your search engine really the problem?

More often than not, search engines get blamed for problems that are not fully theirs to solve. What if the first step towards regaining your sanity was to accept the unacceptable: you just can’t get a perfect search relevancy solution? I know, I know: you’ve been told a completely different story by your enterprise search vendor. Vendor statements aside, maybe it’s time to consider the idea that relevancy alone is not the best way for your users to find what they’re searching for. Even with an integrated search engine “mind reader”, search engine relevancy wouldn’t be perfect. Okay, a precise field-based search on ISBNs can be “perfect”. I’m talking about an average enterprise search here (if such a thing exists), with multiple data sources being used, all searchable using free-form text.

Perfect Search Relevancy is a Myth

As much as I hate telling people what they can’t do, I’ll risk it here. There are many reasons why you just can’t achieve perfect search relevancy for your enterprise search solution. This doesn’t, however, mean that you shouldn’t try to get as close to it as you can. Looking into what is keeping you from reaching perfect search relevancy will help you better understand your overall search environment, and help you work towards alternative solutions that will better serve your users.

So what makes perfect search relevancy impossible? Glad you asked. Here are six of the most common causes of search relevancy deficiencies in enterprise search implementations.

Relevancy Problem 1: Terms with Different Meanings

You’re probably familiar with this scenario. A user searches for content using query terms that can have several different meanings, depending on the context in which they are used. Unless you impose a context for the user, their search results could yield quite a few false positives. A “window” could be interpreted as something made of glass that’s part of a house, a time slot in which to react, or an operating system. Along these same lines, the opposite can occur: multiple terms in your index could represent the same thing. Different meanings for given terms can easily pollute user search results, greatly impacting perceived search relevancy.
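To make the “window” scenario concrete, here is a minimal sketch of a naive keyword match. The three documents and the `keyword_search` helper are made up for illustration and do not reflect how any particular search engine works; the point is only that a context-free match cannot tell the senses apart.

```python
# Hypothetical mini-corpus: three documents using "window" in different senses.
DOCS = {
    "home-repair": "replace a cracked window pane in your house",
    "support-policy": "the maintenance window for responding to incidents",
    "it-guide": "arrange each application window on the desktop",
}


def keyword_search(query, docs):
    """Return the ids of documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if terms & set(text.lower().split())
    )


# Every document matches, whichever meaning the user had in mind.
matches = keyword_search("window", DOCS)
```

With no context imposed, all three documents come back for the single term, and at most one of them is likely to be what the user wanted.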

Relevancy Problem 2: Differing User Expectations

Consider a case where two users are performing a search for a quotation. They’re both thinking about the exact same quotation. The first user wants to find the name of the book he once read that contains the quote. The second user wishes to know who said it. In this case one user wants to see a book title first in the result set, whereas the other user wants to see a person’s name first. Although both users are searching using the same initial search criteria, their expectations of what the search results should contain vary greatly. These differing expectations again impact perceived search relevancy.

Relevancy Problem 3: Not all Content is Equal

So your organization has decided to give more weight to documents where matches occur in specific fields, such as a Dublin Core “subject” field. What if the content located in one of your data sources doesn’t have any “subject” fields? When this content is indexed and made searchable, it risks scoring quite low in terms of search relevancy.
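A rough sketch of why this happens, assuming a deliberately simplistic scoring scheme: the field names, the weights, and the `score` function below are all hypothetical, and real engines compute relevancy very differently. Two documents have identical titles and bodies, but only one carries a “subject” field.

```python
# Hypothetical field weights: matches in "subject" count three times as much
# as matches in "body".
FIELD_WEIGHTS = {"subject": 3.0, "title": 2.0, "body": 1.0}


def score(doc, query):
    """Sum weighted query-term counts over the fields a document has."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        # A missing field contributes nothing to the score.
        words = doc.get(field, "").lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


with_subject = {
    "subject": "search relevancy",
    "title": "Tuning guide",
    "body": "How to tune search relevancy",
}
without_subject = {
    "title": "Tuning guide",
    "body": "How to tune search relevancy",
}

score_with = score(with_subject, "search relevancy")        # 8.0
score_without = score(without_subject, "search relevancy")  # 2.0
```

Both documents are equally relevant to a human reader, yet the one from the data source without “subject” fields scores a quarter of the other purely because of how it was catalogued.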

Relevancy Problem 4: Varying Content Quality

Similar to problem 3, if document meta-data or fields are not populated properly or accurately, you may end up giving more weight to a field that’s not always there, that has an invalid format, or worse, whose meta-data has been cut and pasted from a different content page altogether (even if it doesn’t apply). Entire websites could re-use the same meta-data in their headers, meaning the relevancy of search results obtained by searching against weighted fields could be quite skewed.

Relevancy Problem 5: Content Evolution

The relevancy rules you currently have in place may not stand the test of time. Content will be added to, and removed from, your search index over time. Are the relevancy rules you put in place for older content still accurate for new content?

Relevancy Problem 6: Business Expectations

Users are looking for direct, easy-to-find answers. As a result of certain internal business decisions within your organization, you might decide to tweak the relevancy of certain results to influence what gets shown to users. Your reasons for doing this may include increasing new product visibility, promoting newer documents, pushing select marketing materials, etc. Business decisions that adjust search relevancy might conflict with what your end users naturally expect from your search implementation.
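Here is a small sketch of how such a tweak can reorder results, assuming the business rule is expressed as a per-document score multiplier. The `rerank` helper, the document ids, and the numbers are invented for illustration; your search engine will have its own boosting mechanism.

```python
def rerank(results, boosts):
    """Apply per-document multipliers to base scores and sort best-first.

    results: list of (doc_id, base_score) pairs
    boosts:  dict mapping doc_id to a multiplier (1.0 when absent)
    """
    adjusted = [
        (doc_id, base_score * boosts.get(doc_id, 1.0))
        for doc_id, base_score in results
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical base relevancy scores for one query.
results = [("user-manual", 4.0), ("faq", 3.0), ("new-product-brochure", 2.0)]

# Hypothetical business rule: promote the new product brochure.
boosts = {"new-product-brochure": 2.5}

ranked = rerank(results, boosts)
```

The brochure, originally the weakest match, now ranks first, ahead of the user manual that the query matched best. That is exactly the kind of gap between business intent and user expectation described above.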

Relevancy Tuning Options

Now that we’ve seen several problem areas that can greatly impact search relevancy, what do we do to fix them? There are of course many ways to improve search result relevancy within your enterprise search implementation, thereby increasing the odds of your users finding the exact content they’re looking for. From advanced search features to data clean-up, from data normalization to facets, there are several extremely useful enterprise search features that, once implemented, will help make your search users happy. I’ll explore these options in my upcoming Part 2 of this post.

Pascal Essiembre was a successful Enterprise Application Developer for several years before founding Norconex in 2007, and he remains its president to this day. Pascal has been responsible for several successful Norconex enterprise search projects across North America. He also heads the Product Division of Norconex and leads Norconex Open-Source initiatives.