This year, the annual KMWorld Conference took place from November 16–19 as a virtual event under the title “KMWorld Connect 2020.” It included the co-hosted conferences Enterprise Search & Discovery, Taxonomy Boot Camp, and Text Analytics Forum. I was privileged to join the conference for the first time.

Organizing one of the largest knowledge management conferences online must have been quite an endeavor. The web conferencing platform, PheedLoop, allowed participants to attend sessions from the four conferences as they happened, and to chat and ask questions at the virtual booths. Audience questions appeared in real time during presentations, and presenters answered them at the end of each talk.

Obviously, one of the major shortcomings of online conferences is the lack of live face-to-face communication. Despite the virtual nature of the event, however, the quality of the content exceeded my expectations.

I would like to touch on some of the topics covered at the conference.

Taxonomy & Ontology

The knowledge management industry has recently seen great advancements in taxonomies. As noted in many presentations, applied taxonomies have become commonplace at enterprises and, in many cases, have progressed into more complex knowledge organization systems such as ontologies. According to Wikipedia, “an ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains.”

Knowledge Graph & Graph DB

Probably every fourth presentation at KMWorld mentioned knowledge graphs or presented a business case built on one. The term is sometimes shortened to enterprise knowledge graph, or EKG (not to be confused with the abbreviation for electrocardiogram), and its ubiquity reflects the industry's enthusiasm for knowledge graphs.

In recent years, knowledge graphs have become more accessible to enterprises through advances in technology, specifically graph databases, which make graphs easier to implement and are now capable of federating different content sources under one roof, whether behind a firewall or in the public domain. It is worth mentioning that Norconex has recently made available its new open-source crawlers for Neo4j, one of the bigger names in the field of graph databases. Here you can find an example of Norconex’s crawlers being used to import wine varietal data from the web into a Neo4j graph database.
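To make the idea concrete, a knowledge graph is at its core a set of subject–predicate–object triples that can be traversed as a graph. The snippet below is a minimal, self-contained sketch in plain Python (the wine-varietal facts are invented for illustration and are not taken from the Norconex example; a real deployment would store and query such triples in a graph database like Neo4j):

```python
# Minimal illustration of a knowledge graph as subject-predicate-object triples.
# The wine facts below are invented for illustration purposes only.
triples = {
    ("Chardonnay", "IS_A", "WhiteWine"),
    ("Riesling", "IS_A", "WhiteWine"),
    ("Merlot", "IS_A", "RedWine"),
    ("WhiteWine", "IS_A", "Wine"),
    ("RedWine", "IS_A", "Wine"),
    ("Chardonnay", "GROWN_IN", "Burgundy"),
}

def objects(subject, predicate):
    """Return all objects linked to a subject by a given predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def is_a_transitive(subject, target):
    """Follow IS_A edges transitively, as a graph query engine would."""
    frontier = objects(subject, "IS_A")
    seen = set()
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            frontier |= objects(node, "IS_A")
    return False

print(is_a_transitive("Chardonnay", "Wine"))  # True: Chardonnay -> WhiteWine -> Wine
```

The transitive traversal is the key point: facts never stated explicitly (Chardonnay is a Wine) fall out of the graph structure, which is what makes knowledge graphs useful for inference.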

Semantic Search & ML

Ontologies implemented as knowledge graphs are key enabling technologies behind semantic search. Introduced by Google and currently gaining traction at enterprises, semantic search is a search method that infers user intent from context and content to generate and rank search results. A semantic search–capable system provides results that are relevant to the search phrase: the context of the searched words, combined with the content and context of the user’s browsing history and profile, helps the search engine decide which results best satisfy the query. I liked an example illustrating semantic search that came up during one of the panel discussions: two seemingly very close terms, “black dress” and “black dress shoes,” produce totally different results when searched on Google. Distinguishing between them is not easily achievable with a regular keyword-based search technique.
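The example above hints at why this is hard for keyword search: a keyword engine sees “black dress” and “black dress shoes” as heavily overlapping term sets, while a semantic engine compares dense vector representations of whole phrases. Below is a deliberately tiny Python sketch using hand-made toy vectors; a real system would obtain the embeddings from a trained language model rather than hard-coding them:

```python
import math

# Toy phrase embeddings, hand-made for illustration only.
# A real semantic search engine derives these from a trained embedding model.
embeddings = {
    "black dress":       [0.9, 0.8, 0.1],  # apparel: dresses
    "evening gown":      [0.8, 0.9, 0.2],  # apparel: dresses
    "black dress shoes": [0.2, 0.1, 0.9],  # apparel: footwear
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = embeddings["black dress"]
# Rank candidate phrases by similarity to the query vector.
ranked = sorted(embeddings, key=lambda k: cosine(query, embeddings[k]), reverse=True)
print(ranked)
```

Here “evening gown” outranks “black dress shoes” for the query “black dress,” even though it shares no keywords with it, which is exactly the behavior a term-matching engine cannot produce on its own.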

The recent advances in machine learning have considerably improved the ability of algorithms to analyze text and other types of unstructured content. Creative use of advanced machine learning techniques has proven effective for supplementing semantic search, and a few interesting presentations at KMWorld covered this topic in depth.

____________

Overall, KMWorld Connect 2020 offered a wealth of case studies, interesting discussions, and amazing insights, including the introduction of new resources, the sharing of tools and strategies, learning from colleagues, and much more.

Norconex looks forward to participating in the event next year.

This year I was given the privilege of attending my first KubeCon + CloudNativeCon North America 2020, held virtually. The event spanned four days of virtual activities such as visiting vendor booths, learning about Cloud Native projects, and exploring the advancement of cloud native computing.

The keynote started by paying respects to the late, legendary Dan Kohn. Kohn’s influence changed everything from how we shop online to how we do research on the internet, and it paved the way for new evolutions of The Linux Foundation and the Cloud Native Computing Foundation, supporting the creation of sustainable open source ecosystems for generations to come.

There were glitches while live streaming from the virtual conference platform, which was to be expected: a real-time load test at this scale is not something any production environment welcomes. Fortunately, on-demand recordings of the presentations are now available.

Attendees could join Slack channels at cloud-native.slack.com, such as #kubecon-mixandmingle, to discuss KubeCon-related topics. This provides a great way to connect with the KubeCon audience virtually, even after the event is over.

KubeCon offered many 101 learning and tutorial sessions about the services CNCF projects provide and how they can help with the three main pillars I am involved with daily: automation, DevOps, and observability. These pillars are usually implemented in parallel. For instance, continuous integration and continuous deployment require automation in building the pipeline, which involves writing code and having knowledge of operations architecture planning. Once deployed, observability of the running services is required to monitor for smooth service delivery to users. Many CNCF projects provide the services needed to create this development flow, from committing code that gets deployed into cloud services to providing monitoring capabilities for secured mesh services.

At Norconex, our upcoming Norconex Collector version 3.0.0 can be used in combination with containerd, Helm, and Kubernetes, with builds and deployments automated via Jenkins. One way to get started is to figure out how to package the Norconex Collector and Norconex Committer into a runnable container image with a container tool such as Docker, to run builds for development and testing. After discerning how to build the container image, I had to decide where to host the container image registry, so that the Kubernetes cluster can pull the image from it and run it via a Kubernetes CronJob on a schedule. The CronJob creates a Pod that runs a crawl using the Norconex Collector and commits the indexed data. Finally, I chose Jenkins as the build tool for this experiment, to help automate updates and deployments.
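As a rough sketch of the packaging step, a Dockerfile for the collector might look like the one below. Note that the base image, directory layout, and launch command are all assumptions made for illustration; check the actual Norconex Collector distribution and its documentation for the correct script name and arguments:

```dockerfile
# Hypothetical Dockerfile; base image, paths, and launch command are
# illustrative assumptions, not the official Norconex packaging.
FROM openjdk:11-jre-slim

WORKDIR /norconex

# Copy an unpacked Norconex HTTP Collector distribution and a crawler config.
COPY norconex-collector-http/ .
COPY crawler-config.xml .

# Run a crawl on container start; the Kubernetes CronJob supplies the schedule.
ENTRYPOINT ["./collector-http.sh", "start", "-config=crawler-config.xml"]
```

Keeping the crawler configuration as a separate file copied into the image makes it easy to rebuild the image per environment, or to mount the configuration from a Kubernetes ConfigMap instead.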

Below are steps that provide an overview for my quick demo experiment setup:

  1. Demo use of the default Norconex Collector:
    • Download the Norconex HTTP Collector with the Filesystem Committer. Other Committer choices can be found at Norconex Committers
    • Build a container image using a Dockerfile
    • Set up a Git repository file structure for the container image build
    • Guide to building and test-running with the created Dockerfile
      • Demo set up locally using Docker Desktop to run Kubernetes
        • Tutorials for setting up local Kubernetes
  2. Determine where to push the container image; it can be a public or private image registry such as Docker Hub
  3. Create a Helm chart template using Helm v3
    • The demo starts with the default template created by Helm
    • The demo uses the Kubernetes node filesystem for persistent storage
      • Other storage options can be used, for instance, an EBS volume or EFS on AWS
    • Helm template and YAML configuration
      • cronjob.yaml to deploy a Kubernetes CronJob that creates a new Kubernetes Job to run on schedule
      • pvc.yaml to create the Kubernetes PersistentVolume and PersistentVolumeClaim that the Norconex Collector crawl job will use on the next recrawl run
  4. Simple build using Jenkins
    • Overview of the Jenkins build job pipeline script
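To give a feel for step 3, the cronjob.yaml might look roughly like the fragment below. The names, schedule, and image reference are illustrative assumptions; batch/v1beta1 was the CronJob API version current at the time of the event (it became batch/v1 in Kubernetes 1.21):

```yaml
# Hypothetical CronJob manifest; names, schedule, and image are assumptions.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: norconex-crawl
spec:
  schedule: "0 2 * * *"          # run the crawl nightly at 02:00
  concurrencyPolicy: Forbid      # do not start a new crawl while one is running
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: norconex-collector
              image: registry.example.com/norconex-collector:latest
              volumeMounts:
                - name: crawl-store
                  mountPath: /norconex/work   # keep crawl state between runs
          volumes:
            - name: crawl-store
              persistentVolumeClaim:
                claimName: norconex-crawl-pvc   # defined in pvc.yaml
```

Mounting the PersistentVolumeClaim at the collector's working directory is what lets each scheduled run see the previous crawl state, so recrawls are incremental rather than starting from scratch.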

I hope you enjoyed this recap of KubeCon!

More details of the code and tutorials can be found here:

https://github.com/somphouang/norconex-devops-demo

Covid-19 has affected almost every country around the globe and has left everyone looking for the latest information. Below are just some of the groups searching for data:

  • Government agencies trying to manage information for the public
  • Healthcare organizations trying to keep abreast of the latest research
  • Businesses looking for the latest updates on government subsidies and how to properly plan and prepare to reopen
  • Parents following information on school closures and how to keep their families safe
  • Individuals staying home and trying to navigate through the constant updates and search for products that have become harder to source during the outbreak

For these scenarios and so many more, all of those searching need to be able to access the most current and relevant information.

Norconex has assisted with a couple of projects related to the coronavirus outbreak, so we wanted to share the details of one of them.

Covid-19 Content Monitor

Right before Covid-19 emerged, Norconex had built a search testbed for Canadian federal government departments. The testbed application was used to demonstrate the many features of a modern search engine and how they can be applied to search initiatives across the Government of Canada. As part of this initiative, we had implemented a search over health and safety recall data for Health Canada.

When Covid-19 hit, it became more important than ever for Health Canada to ensure that the government disseminates accurate and up-to-date information to the Canadian population. Each department has the ongoing responsibility to properly inform its audience, efficiently share new directives and detail how the virus impacts department services. This raised some questions. How do you validate the quality of information shared with the public across various departments? How do you ensure a consistent message?

Norconex was happy to answer when asked for a quick solution to facilitate a remedy for these issues.

By building upon the pre-existing testbed, Norconex developed a search solution that crawls the relevant data from specific data sources. Health Canada employees can search through all the data using various faceting options to help find what they need, and the results are provided back in a fast, simple-to-use interface. The solution monitors content in both of Canada’s official languages across all departments. Among its time-saving features, the search tool offers the following:

  • Automated classification of content
  • Continuous detection of new and updated content
  • Easy filtering of content
  • Detection of “alerts” found in pages so alerts can be verified more frequently to ensure continued relevance

This search and monitoring tool is currently hosted for free on the Norconex cloud and accessed daily by the team at Health Canada, saving precious time as they gather the information needed to help keep Canadians safe.