Pure JavaScript search engine with ItemsJS and Vue

Introduction

At the beginning of 2017 I started experimenting with a pure JavaScript search engine. Before that I had built a lot of small prototypes with no more than 1000 records, and I was using Elasticsearch for them. Elasticsearch is great as long as you are working with more than 5K records and need great search and facets. Otherwise it is quite expensive. I was paying around $10 per server for small search websites. That was quite a lot, and managing Elasticsearch was an additional cost (time).



That's why I created and open sourced ItemsJS:

  • a search engine in pure JavaScript
  • supports full-text search, facets and sorting
  • it's cheap because it doesn't require a server
  • it works easily on both the frontend and the backend
  • very fast for small datasets
  • easy to start with

Demo in ItemsJS + VueJS

Recently I was wondering how to make a good demo of ItemsJS. I decided to write a blog post with an example in JSFiddle and VueJS. JSFiddle is good for a demo and lets programmers experiment with the code easily. As for VueJS, I had zero knowledge of it until today. It turned out to be a great choice because it is a golden mean between Angular and React. I needed only a few hours to write a frontend in that framework. It's super intuitive and it just works.

More history and context

I created the first prototype at the beginning of 2017. Initially it was intended to work only in Node.js. While it was fast for a small dataset and a single user, it was slow for many simultaneous users.

Node.js is quite slow for CPU-heavy computation and also doesn't natively support multi-threading, which you could use to split data processing across different workers. When I was running benchmarks, Node.js was easily pushing the CPU to its limits, while C++, Java, Rust or Golang would barely notice the same load.

I then discovered that I could run the same software on the frontend. The computation happens on the client side, so search scalability is practically unlimited. The application can serve even millions of users at the same time, and the only bottleneck is the network. I was very enthusiastic about that because it meant hosting the application can be free, as it requires only HTML and JavaScript.

I then used browserify, which converted the Node.js code into a browser bundle for me.

It took only one line in the CLI:

browserify index.js -v -s itemsjs -o dist/itemsjs.js

As for the programming interface of ItemsJS, I didn't have any problem with it because I copied it from https://github.com/itemsapi/itemsapi, which is battle-tested. You can look at the interface documentation at https://github.com/itemsapi/itemsjs and decide for yourself whether it is easy or not.

When I was writing the first lines of code I wanted to use as many external tools as possible, e.g. https://lunrjs.com/ or https://lodash.com/docs, to speed up development and see whether the project made sense, even at the cost of a bigger codebase. There is a saying that "premature optimization is the root of all evil" and I totally agree with it. I also believe it's better to start with something bad and imperfect that solves the problem and improve it continuously than to aim for something perfect and never finish it. That's how I started.

One of the best decisions here was using a testing framework (in this case Mocha). As the codebase grew and I added new features, having tests made me much more confident that the system was still working. It's usually more fun for me to work with a system that has at least its crucial cases covered by tests. Personally I am not a fan of being very strict and covering all possible cases, though. In a bigger environment maybe it makes sense.

I've used this software in a few commercial products, especially for prototypes, when I wanted to create something very fast and validate a customer's idea. Once the idea is validated and speed becomes a bottleneck, I migrate to Elasticsearch. I once had an application with over 1000 items and a lot of facets, which is computationally heavy. The application was becoming slower and slower. An interesting but also very simple solution was to include timings for search, facets and sorting in the JSON results, like this:

{
  "facets": 152, 
  "search": 0, 
  "sorting": 0
}

This let me see the timings from the UI (via console.log in the Google Chrome console). Thanks to that I could run many more manual benchmarks, and in the end, as far as I remember, I reduced the search time by up to 50%. It's a good feeling when a search takes 200 ms and after optimization it takes 100 ms.
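The timing idea can be sketched in a few lines. This is only an illustration, not the actual ItemsJS code: the `timed` and `searchWithTimings` helpers, the item fields (`name`, `tags`) and the use of milliseconds are all assumptions made for the example.

```javascript
// Hypothetical helper: run fn, store how long it took (in ms) under timings[name].
function timed(timings, name, fn) {
  const start = Date.now();
  const result = fn();
  timings[name] = Date.now() - start;
  return result;
}

// Hypothetical search function that reports one timing per stage.
// Assumes every item has a string `name` and an array `tags`.
function searchWithTimings(items, options) {
  const timings = {};
  const query = options.query.toLowerCase();

  // Stage 1: naive substring search on the name.
  const matched = timed(timings, 'search', () =>
    items.filter((item) => item.name.toLowerCase().includes(query))
  );

  // Stage 2: count how many matched items carry each tag.
  const facets = timed(timings, 'facets', () => {
    const counts = {};
    for (const item of matched) {
      for (const tag of item.tags) {
        counts[tag] = (counts[tag] || 0) + 1;
      }
    }
    return counts;
  });

  // Stage 3: sort a copy of the matched items by name.
  const sorted = timed(timings, 'sorting', () =>
    [...matched].sort((a, b) => a.name.localeCompare(b.name))
  );

  return { items: sorted, facets, timings };
}
```

Logging `result.timings` with console.log after each search then shows directly which stage dominates.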

As for algorithms, I discovered the inverted index for full-text search. It is relatively simple to implement: it is a map from words to the items / documents in the array that contain them. You can find more info here: https://en.wikipedia.org/wiki/Full-text_search. As I said before, I am using https://lunrjs.com/, which is an implementation of that algorithm.
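To make the idea concrete, here is a minimal inverted index in plain JavaScript. It is a simplified sketch and not how lunr is implemented (a real library also handles things like relevance scoring); the function names and the `name` field in the usage note are assumptions for the example.

```javascript
// Split text into lowercase words, dropping punctuation and empty tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build a Map: word -> Set of positions of the items containing that word.
function buildInvertedIndex(items, field) {
  const index = new Map();
  items.forEach((item, position) => {
    for (const word of tokenize(item[field])) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(position);
    }
  });
  return index;
}

// Return the items that contain every word of the query (AND semantics).
function searchIndex(index, items, query) {
  const words = tokenize(query);
  if (words.length === 0) return [];
  let result = null;
  for (const word of words) {
    const postings = index.get(word) || new Set();
    result = result === null
      ? new Set(postings)
      : new Set([...result].filter((p) => postings.has(p)));
  }
  // Keep the original item order.
  return [...result].sort((a, b) => a - b).map((p) => items[p]);
}
```

For example, with `const index = buildInvertedIndex(movies, 'name')`, a call like `searchIndex(index, movies, 'dark knight')` only looks up two words in the map instead of scanning every item.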

A bigger problem for me was creating facets (you know, the filters on the left side where you can narrow the results by tags, actors, genres, price range, etc.). I did a lot of research on the internet and also skimmed some books like https://www.amazon.com/Search-Engines-Information-Retrieval-Practice/dp/0136072240 to find use cases and good patterns, but I couldn't find algorithms for that.

I wrote my own intuitive implementation. It involves a lot of boolean operations, set operations and edge cases. Thankfully there is something like Mocha for tests, which helped me cover them.

Faceted search is the biggest speed bottleneck for me here, so it would be great to implement a faster solution if there is one. If I am not wrong, the current complexity of my faceted search algorithm is O(n * m ^ 2) (n - items count, m - facets count). Maybe it would be possible to build an index for facets at the initialization stage (the same way as in an inverted index) so that faceted search would be much faster, or maybe some caching would do the job.
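A possible shape for such a facet index is sketched below. It is an untested idea, not the current ItemsJS algorithm: the helper names, the AND semantics for selected values and the `{ field: [values] }` filter format are all assumptions for the example.

```javascript
// Build once at initialization: field -> Map(value -> Set of item positions).
// A field may hold a single value or an array of values.
function buildFacetIndex(items, fields) {
  const index = {};
  for (const field of fields) {
    index[field] = new Map();
    items.forEach((item, position) => {
      for (const value of [].concat(item[field] || [])) {
        if (!index[field].has(value)) index[field].set(value, new Set());
        index[field].get(value).add(position);
      }
    });
  }
  return index;
}

// Set intersection, iterating over the smaller set.
function intersect(a, b) {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  return new Set([...small].filter((x) => large.has(x)));
}

// filters looks like { tags: ['a'], genre: ['drama'] }; an item must match
// every selected value. Returns counts per facet value for the filtered set.
function facetCounts(index, itemCount, filters) {
  let matching = new Set(Array.from({ length: itemCount }, (_, i) => i));
  for (const [field, values] of Object.entries(filters)) {
    for (const value of values) {
      matching = intersect(matching, index[field].get(value) || new Set());
    }
  }
  const counts = {};
  for (const [field, valueMap] of Object.entries(index)) {
    counts[field] = {};
    for (const [value, positions] of valueMap) {
      const n = intersect(matching, positions).size;
      if (n > 0) counts[field][value] = n;
    }
  }
  return counts;
}
```

With this layout, filtering and counting become set intersections over precomputed sets instead of a scan over all items for every facet. Whether that is faster in practice would still need to be measured.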

In the future, once ItemsJS is more widely used by me and by other developers and its interface is more stable, it might be a good idea to write a clone of it in Rust or Golang, which might be at least 10x faster than JavaScript and, who knows, maybe even 2x or 3x faster than Elasticsearch. My reasoning is based on the Rust vs Java benchmark https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=rust&lang2=java (Java is the language of Elasticsearch) and on the fact that this software would not need most of Elasticsearch's functionality.

Similar solutions

  • algolia.com - it is blazingly fast and works for both small and larger datasets. You can find some demos here https://community.algolia.com/instantsearch.js/v1/examples/. One drawback is the price, but there is also a free edition.
  • http://searchkit.co/ - a frontend stack in React for Elasticsearch. I don't feel comfortable with React, so it's hard for me to comment. I had a problem getting started with it, but they have a lot of followers on GitHub.
  • https://github.com/itemsapi/elasticitems - a higher-level client for Elasticsearch which I've written in Node.js. It's focused on simplicity (creating facets and search with a JSON configuration) and has an interface quite similar to ItemsJS.
  • others? If you know of other tools that generate search + facets easily, please let me know and I'll update the list

Ending

Thanks for reading this article. If you find ItemsJS valuable or see potential in it, please share it on Facebook or Twitter. You can also give it a star on the GitHub page https://github.com/itemsapi/itemsjs