Blog search 🔎

Problem statement

I'd like to provide a quick way to retrieve information from this blog.

As for constraints, this blog is statically generated and hosted on GitHub, which disallows arbitrary plugins. That limits most solutions to client-side code and/or external vendors. I'd also prefer to keep things free.

Solutions

Tags

Jekyll supports tags, and the Forestry CMS I use enables me to manage tags alongside content, so I can start by including tags in my index of notes.

Ideally, I could provide inline keyword search.

External search providers understandably require control over the search UI, which conflicts with inline search.

Lunr is a convenient JS library for keyword extraction and lookup, and it supports pre-building the search index to improve client performance. However, the index for my content was 500 KB, and the search syntax, although powerful, was unintuitive for my simple needs.

Google has published a list of the most common English words. I could strip these from my content and then include the remainder in my index, e.g.:

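{% comment %} Assumes `words` (hypothetical) was split from the page content earlier {% endcomment %}
{% for word in words %}
  {% unless site.data.stop_words contains word %}
    {{ word }}
  {% endunless %}
{% endfor %}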

This still yields more words than I can comfortably display in an index. I'm also limited to Liquid syntax for index generation, which complicates things like excluding code snippets.
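
For comparison, here's a rough sketch of the same stop-word filtering in JavaScript, where skipping code snippets is easier than in Liquid. This is not what the blog does: the function names are hypothetical, the input is assumed to be Markdown source, and the stop-word list is whatever array you pass in.

```javascript
// Sketch only: function names and the Markdown input format are assumptions.

// Replace fenced code blocks and inline code spans with spaces.
function stripCode(markdown) {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, ' ') // fenced blocks (three backticks)
    .replace(/`[^`\n]*`/g, ' ');       // inline code spans
}

// Return the unique, lowercased words in `markdown` that are not stop words.
function extractKeywords(markdown, stopWords) {
  const stop = new Set(stopWords.map((word) => word.toLowerCase()));
  const words = stripCode(markdown).toLowerCase().match(/[a-z0-9']+/g) || [];
  return [...new Set(words)].filter((word) => !stop.has(word));
}
```

Because the result is deduplicated, each remaining word appears once per entry. That shortens the list but doesn't solve the "too many words to display" problem by itself.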

So far, the best solution has been constructing a regex from an input string, applying it to the titles and tags of my index and then hiding entries that don't match.
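
Here's a minimal sketch of that regex filter. The entry shape (`{ title, tags }`) and the function names are assumptions for illustration, as is the choice to escape the input so it matches literally; the blog's actual script may differ.

```javascript
// Sketch only: helper names and the { title, tags } entry shape are assumptions.

// Escape regex metacharacters so the user's input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive regex from the raw query string.
function buildQueryRegex(query) {
  return new RegExp(escapeRegExp(query.trim()), 'i');
}

// True when the entry's title or any of its tags matches the regex.
function matchesEntry(entry, regex) {
  return regex.test(entry.title) || entry.tags.some((tag) => regex.test(tag));
}

// Keep only the matching entries; an empty query keeps everything.
function filterEntries(entries, query) {
  if (query.trim() === '') return entries;
  const regex = buildQueryRegex(query);
  return entries.filter((entry) => matchesEntry(entry, regex));
}
```

In the page itself, the same predicate could drive visibility instead of returning a new array, for example by setting each index item's `hidden` property to `!matchesEntry(entry, regex)` whenever the input changes.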

I can take advantage of Google's search indexing by defining a Jekyll sitemap. I can also get closer to inline filtering through Chrome's omnibox, which can offer site search for sites that publish an OpenSearch description. Here's the blog's opensearch.xml.

Feedback

Thoughts? Suggestions? Hit me up @erikeldridge

License

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 International License, and code samples are licensed under the MIT license.