This year I had the privilege of attending my first KubeCon + CloudNativeCon North America 2020, held virtually. The event spanned four days of virtual activities such as visiting vendor booths, learning about Cloud Native projects, and exploring the advancement of cloud native computing.

The keynote started off by paying respects to the late, legendary Dan Kohn. Kohn's influence changed how we shop and do research online, and he paved the way for new evolutions of The Linux Foundation and the Cloud Native Computing Foundation, supporting the creation of sustainable open source ecosystems for generations to come.

There were glitches in the live stream from the virtual conference platform, which was to be expected: a real-time load test at that scale is something no production environment would want to face. Fortunately, on-demand recordings of the presentations are now available.

Slack channels at cloud-native.slack.com, such as #kubecon-mixandmingle and other KubeCon-related channels, let attendees communicate with one another. This provides a great way to connect with the KubeCon audience virtually, even after the event is over.

KubeCon provides many 101 learning and tutorial events about the services CNCF projects offer and how they can help with the three main pillars I am involved with daily: automation, DevOps, and observability. These pillars are usually implemented in parallel. For instance, continuous integration and continuous deployment require automation to build the pipeline, which involves writing code and understanding operations architecture. Once services are deployed, observability is needed to monitor them and ensure smooth delivery to users. Many CNCF projects support this development flow, from committing code that gets deployed to cloud services, to providing monitoring capabilities for secured mesh services.

At Norconex, our upcoming Norconex Collector version 3.0.0 could be used with a combination of containerd, Helm, and Kubernetes, with builds and deployments automated via Jenkins. One way to get started is to package the Norconex Collector and Norconex Committer into a runnable container image using a container tool such as Docker, and use that image for development and testing builds. After working out how to build the container image, I have to decide where to host the image registry, so that the Kubernetes cluster can pull the image from that registry and run it as a Kubernetes CronJob on a schedule. The Kubernetes Job would create a Pod that runs the crawl using the Norconex Collector and commits the indexed data. Finally, I chose Jenkins as the build tool for this experiment to help automate updates and deployments.

Below is an overview of the steps in my quick demo setup:

  1. Demo use of the default Norconex Collector:
    • Download the Norconex HTTP Collector with the Filesystem Committer (other Committers can be found at Norconex Committers)
    • Build the container image using a Dockerfile
    • Set up a Git repository file structure for the container image build
    • Guide to building and test-running with the created Dockerfile
      • Demo set up locally using Docker Desktop to run Kubernetes
        • Tutorials for setting up local Kubernetes
  2. Determine where to push the container image; this can be a public or private registry such as Docker Hub
  3. Create a Helm Chart template using Helm v3
    • Demo will start with default template creation of Helm Chart
    • Demo to use the Kubernetes Node filesystem for persistent storage
      • Other storage options can be used, for instance, in AWS use EBS volume or EFS
    • Helm template and yaml configuration
      • cronjob.yaml to deploy a Kubernetes CronJob that creates a new Kubernetes Job to run on a schedule
      • pvc.yaml to create a Kubernetes persistent volume and persistent volume claim that the Norconex Collector crawl job will reuse on the next recrawl run
  4. Simple build using Jenkins
    • Overview of Jenkins build job pipeline script
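To give a sense of step 1, here is a minimal sketch of what such a Dockerfile could look like. The base image, directory names, configuration file, and launch command are all assumptions for illustration; the exact script name and arguments depend on the Collector version and distribution layout.

```dockerfile
# Illustrative Dockerfile; paths, version, and launch command are assumptions
FROM openjdk:11-jre-slim

# Copy the unzipped Norconex HTTP Collector distribution into the image
COPY norconex-collector-http-3.0.0/ /norconex/collector/

# Crawler configuration kept in the Git repository alongside the Dockerfile
COPY config/ /norconex/config/

WORKDIR /norconex/collector

# Start a crawl with the supplied configuration when the container runs
# (script name and flags vary by Collector version)
ENTRYPOINT ["./collector-http.sh", "start", "-config=/norconex/config/crawler-config.xml"]
```

With an image like this, `docker build` and `docker run` can be used locally (for example with Docker Desktop's built-in Kubernetes) before pushing to a registry.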
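For step 3, the cronjob.yaml rendered by the Helm template could look roughly like the sketch below. The image name, schedule, and claim name are placeholders, not the demo's actual values.

```yaml
# Sketch of cronjob.yaml; image, schedule, and names are illustrative
apiVersion: batch/v1beta1   # use batch/v1 on Kubernetes 1.21+
kind: CronJob
metadata:
  name: norconex-crawler
spec:
  schedule: "0 2 * * *"          # run the crawl nightly at 02:00
  concurrencyPolicy: Forbid      # skip a run if the previous crawl is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: norconex-collector
              image: myregistry/norconex-collector:latest
              volumeMounts:
                - name: crawl-store
                  mountPath: /norconex/collector/output
          volumes:
            - name: crawl-store
              persistentVolumeClaim:
                claimName: norconex-crawl-pvc
```

Each scheduled run creates a Job, which creates the Pod that performs the crawl and commits the indexed data.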
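The pvc.yaml from step 3 can use the Kubernetes node's filesystem for the demo, so crawl state survives between runs and the next recrawl can pick it up. A hostPath-backed volume as sketched below is fine for a local demo but not for production; names, sizes, and the path are assumptions.

```yaml
# Sketch of pvc.yaml backed by the node filesystem (demo only)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: norconex-crawl-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/norconex        # directory on the Kubernetes node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: norconex-crawl-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

In AWS, the hostPath volume could be swapped for an EBS- or EFS-backed storage class without changing the claim the CronJob mounts.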
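Finally, the Jenkins build job pipeline from step 4 could be sketched as a declarative pipeline along these lines. The registry, chart path, and value names are hypothetical stand-ins.

```groovy
// Illustrative Jenkins declarative pipeline; registry, chart path, and
// value names are assumptions, not the demo's actual configuration
pipeline {
    agent any
    environment {
        IMAGE = "myregistry/norconex-collector:${env.BUILD_NUMBER}"
    }
    stages {
        stage('Build image') {
            steps {
                sh "docker build -t ${IMAGE} ."
            }
        }
        stage('Push image') {
            steps {
                sh "docker push ${IMAGE}"
            }
        }
        stage('Deploy with Helm') {
            steps {
                sh "helm upgrade --install norconex-crawler ./chart --set image=${IMAGE}"
            }
        }
    }
}
```

Tagging the image with the Jenkins build number keeps each deployment traceable to the commit that produced it.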

I hope you enjoyed this recap of KubeCon!


More details on the code and tutorials can be found here:

https://github.com/somphouang/norconex-devops-demo


This was my first year joining the Elastic{ON} Tour, held in Toronto on September 18, 2019. My experience at this event was fully charged with excitement, from meeting Elastic architects, operations folks, security pros, and developers alike.

The event was hosted at The Carlu in downtown Toronto. In the morning, the opening keynote was presented by Nick Drost, Senior Director of Elastic, on search solutions such as app search, site search, and enterprise search, security using SIEM, and more. One of the most exciting keynote updates was about using Elastic Cloud on Kubernetes to help simplify processes of deployment, security, scaling, upgrades, snapshots, and high availability.

The next presenter, Michael Basnight, Software Engineer at Elastic, provided an Elastic Stack roadmap with demos of the latest and upcoming features. Kibana has added new capabilities to become much more than just the main user interface of the Elastic Stack, with infrastructure and logs user interfaces. He introduced Fleet, which provides centralized config deployment, Beats monitoring, and upgrade management. Frozen indices allow for more index storage by keeping indices available without taking up heap memory until they are requested. He also highlighted advanced machine learning analytics for outlier detection, supervised model training for regression and classification, and an ingest prediction processor. Elasticsearch performance has increased by employing Weak AND (also called "WAND"), providing improvements as high as 3,700% for term search and between 28% and 292% for other query types.

Another feature added to the Elastic Stack is advanced scoring to help boost document queries, using rank_features and distance_features. The new Geo UI uses map layers.

One of the most interesting new Beats to watch is Functionbeat, a serverless data shipper that subscribes to AWS SQS event topics and CloudWatch Logs and provisions an AWS Lambda function to ship data to Elasticsearch or Elastic Cloud Enterprise.

Elastic's lightweight data shippers, the Beats (Filebeat for log files, Metricbeat for metrics, Packetbeat for network data, Winlogbeat for Windows event logs, Auditbeat for audit data, Heartbeat for uptime monitoring, and the latest, Functionbeat, for serverless shipping), can be complemented with Norconex open-source products. The Norconex HTTP Collector or Norconex Filesystem Collector can crawl metadata from the web or a filesystem, and the open-source Norconex Elasticsearch Committer can then push that data to an Elasticsearch index, either directly to Elastic Cloud Enterprise or to an on-premises Elastic Stack. Norconex can help collect metadata from enterprise web architectures or enterprise filesystems for quick searching and relevant results.

Packed into the morning session, Jason Rhodes, Senior Software Engineer at Elastic, presented on unified observability, combining logs, metrics, and traces.

The afternoon session, Search for All with Elastic Enterprise Search and a Site Search demo and feature walkthrough, was presented by Diane Tetrault, Director of Product Marketing at Elastic. The latest UI gives users the ability to configure which content sources they search and to connect their own data sources. The Elastic Common Schema, introduced as an open-source specification, defines a common set of document fields for data ingested into Elasticsearch (https://www.elastic.co/blog/introducing-the-elastic-common-schema).

The Security with Elastic Stack session was presented by Neil Desai, Security Specialist at Elastic. He discussed the latest security capabilities to enable analysis automation to defend from cyber threats.

The Kibana and geo update features in Canvas and Elastic Maps were presented by Raya Fratkina, Kibana Team Lead at Elastic. Learning about ways to use these functionalities makes data more actionable.

I also picked up tips at Elastic Architecture at Scale, a presentation by Artem Pogossian, Solutions Architect at Elastic. He discussed scaling from local laptops to multi-cluster and cross-cluster use case deployments.

A useful new feature in machine learning and analytics was introduced by Rich Collier, Solutions Architect and ML Specialist at Elastic. He demonstrated data frames, also called transforms, a feature that allows transforming an existing index into a secondary, summarized index. In a demo, Rich showed a possible use case from a digital retailer, using time series modeling to look for anomalies and forecast a shopper's purchases, with a Canvas UI designed in Kibana to build real-time data models. It was amazing to see the demo detect possible fraudulent purchases without requiring a data science expert.

Finally, after all these informational sessions, thanks to the Elastic event organizers for adding a closing happy hour, where I grabbed a drink with fellow attendees and Elastic folks. This was a great way to close a very extensive learning session. I look forward to being at the next year’s Elastic{ON} tour.

Elastic{ON} Tour 2019 in Toronto event pass.
On the right, Osman Ishaq of Elastic at the Ask Me Anything booth.
Raya Fratkina, Kibana Team Lead at Elastic.
Closing happy hour, having a drink with Elastic folks and other attendees.

Amazon Web Services (AWS) and the Canadian Public Sector organized another excellent Public Sector Summit on May 15, 2019. AWS hosted the first such summit in Ottawa last year, but this year’s event attracted a much larger crowd. Thousands of attendees filled Shaw Centre’s entire third floor.

In the keynote sessions, it was great to hear Alex Benay (deputy minister at the Treasury Board of Canada) talk about the government’s modern digital initiative. He discussed the approach, successes, and challenges of the government’s Cloud migration journey. Another excellent speaker was Mohamed Frendi (director of IT, innovation, science, and economic development for the government of Canada). He covered Canada’s API Store and how it uses the Cloud to make government data more accessible.

The afternoon session was led by Darin Briskman, an AWS developer evangelist. He talked about Amazon's self-service analytics tool, AWS Lake Formation, which combines data from multiple sources to resolve data-driven challenges in a timely manner. Machine learning and AI help in making informed decisions and solving problems. This service is a great fit for Norconex's open-source crawler products, the HTTP Collector and Filesystem Collector, which fetch data from unstructured data sources to make it easy to consume. Collected content and metadata are natively stored in various existing repositories (or formats), including AWS-specific ones like Amazon Elasticsearch Service, Open Distro for Elasticsearch, and Amazon CloudSearch, as well as many others, such as relational databases, Apache Solr, Google Cloud Search, Neo4j, Microsoft Azure Search, Lucidworks, IDOL, and more.

 

The diagrams below provide further explanation. The one showing the crawling spider is particularly exciting, because Norconex crawlers have much potential to help in this area. See the available Norconex Committers.


AWS Public Sector Summit Event Pass

Selfies with Darin Briskman, Developer Evangelist at AWS, and Stevan Beara, Solutions Architect Manager at AWS.