As I mentioned in a recent blog post about image search, we’re avid users of Elasticsearch for search. We also recently ported another vital part of our system to Elasticsearch: analytics. This post is a technical deep dive into how our analytics system works, and specifically how and why we used Elasticsearch to build it.

Background

DigitalGov Search is essentially one giant software-as-a-service (SaaS) application, with 1,500 government websites as its customers.
In the first part of A Picture Is Worth a Thousand Tokens, I explained why we built a social media-driven image search engine, and specifically how we used Elasticsearch to build its first iteration. In this week’s post, I’ll take a deep dive into how we worked to improve relevancy, recall, and the searcher’s experience as a whole.

Redefine Recency

To solve the scoring problem with older photos in archival photostreams, we decided that after a certain age, say six weeks, a photo’s relevancy score should stop decaying.
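A minimal sketch of what capping the decay looks like, assuming an Elasticsearch-style Gaussian decay on photo age; the function name, scale, and decay values here are illustrative, not the production configuration:

```python
import math

SIX_WEEKS_DAYS = 42  # assumed cap: photos older than this stop decaying


def recency_multiplier(age_days, scale_days=14, decay=0.5):
    """Gaussian decay of relevancy with photo age, frozen past the cap.

    Mirrors the shape of Elasticsearch's `gauss` function_score decay:
    multiplier = exp(-age^2 / (2 * sigma^2)), with sigma chosen so the
    multiplier equals `decay` at `scale_days`. Ages beyond the cap are
    clamped, so older photos all share the same (floored) multiplier.
    """
    capped_age = min(age_days, SIX_WEEKS_DAYS)
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(capped_age ** 2) / (2.0 * sigma_sq))
```

With this shape, a two-year-old archival photo scores the same as a six-week-old one instead of decaying toward zero, so archival photostreams stay searchable.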
Increasingly, we’ve noticed that our agency customers are publishing their highest quality images on social media and within database-driven multimedia galleries on their websites. These sources are curated, contain metadata, and have both thumbnails and full-size images. That’s a big improvement in quality over the images embedded within HTML pages on agencies’ websites. After some investigation, we decided we could leverage their Flickr and Instagram photos to build an image search engine that better met their needs.