Current status of ItemsAPI and related tools

Introduction

I started working on ItemsAPI in 2015 when I was living in Germany (Berlin). I have always liked to organize information around specific topics. Imagine various atlas books, like an atlas of birds, an atlas of herbs or an atlas of exercises, but in a digital form with a nice UX.

That's why I created ItemsAPI. I wanted to have a tool with a simple interface for creating, updating, deleting and listing items. A generic API with search features, reusable across booking systems, meta search engines, eCommerce systems, etc.

After two years I've created quite a few tools related to my original goal - organizing information in an easy way. Many of them were failures and a waste of time, but from today's perspective they were also invaluable lessons in learning new technologies and processes like lean or kaizen.

All in all, after so many mistakes, the tools I build now are better and simpler. This knowledge is also useful when working with customers.

Below is the current status of my open source projects.

ItemsAPI

https://github.com/itemsapi/itemsapi - it's an API over Elasticsearch written in Express.js. It's focused on simplicity: you get full text search, facets, sorting and item recommendations based only on a JSON configuration. I've used this technology for a few websites, e.g. http://devteams.co/, and also for the first iterations of https://shoprank.co/, which now needs more sophisticated solutions.

The biggest problem here for me is the complicated codebase. It's not intuitive and not easy to work with. I am also not happy about the many database dependencies like Redis or MongoDB besides Elasticsearch. From today's perspective I also consider it a mistake to start with an API rather than a higher level client for Elasticsearch in Node.js. The second solution would be much simpler and it could also solve the problem. I started with an API because I was a little hyped about frontend technologies like Angular or React and the Dockerization trend, the idea that you can put the whole API stack into Docker and it just works.

The plan for ItemsAPI: I will keep it as it is for now, and if there is more need for such an API I will let it evolve. Maybe I will also create the next API version around ElasticItems, which is much simpler.

ElasticItems

https://github.com/itemsapi/elasticitems - it's a higher level client for Elasticsearch written in Node.js. It's actually the same as ItemsAPI above, but it's just a client in Node.js. It's simpler for me because it doesn't need many abstraction layers like ES client -> API -> API client; it is just an Elasticsearch client. It also doesn't require running an additional server like ItemsAPI does. It just connects to Elasticsearch directly. One drawback: it works only with Node.js.

I am quite happy with this solution. It has a simple interface and it is easy to start with. I am using it with http://shoprank.co/ and also for a few commercial enterprise projects, mostly as internal research and business intelligence tools.

ItemsJS

https://github.com/itemsapi/itemsjs - this is a full text, faceted search engine in JavaScript. It has a somewhat similar interface to ItemsAPI and also similar JSON responses. Before creating it I read / scanned some theory about search engines: https://www.amazon.com/Search-Engines-Information-Retrieval-Practice/dp/0136072240. It was interesting to see how the algorithms work internally. I had a lot of algorithms classes at my university, but those kinds of algorithms (e.g. the inverted index) were quite new for me.

The reason for creating it was to make something cheaper than Elasticsearch. Also something which works perfectly for small requirements (up to 1000 items) and costs nothing, as it is a backend-free solution. The next reason was curiosity: whether it is possible to create a fast solution in pure JavaScript and whether I am able to do it at all. It was a big challenge for me and I learnt a lot. When I was testing it from Node.js I naturally understood the need for sharding and multi-threading. I am also more aware now of Node.js limits.

The next time, if I decide to make something similar as an experiment but for a bigger dataset, I will try Rust (https://www.rust-lang.org/en-US/), which is relatively easy to work with and has performance similar to C++. It's sometimes even a few times faster than Java, which is the language of Elasticsearch. For performance stats about Rust vs Java you can look here: https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=rust&lang2=java

In terms of next features, I will consider adding item similarity based on the collaborative filtering algorithm (https://en.wikipedia.org/wiki/Collaborative_filtering), and maybe also some caching and pre-computed optimizations to make it even faster. If you want to see a real demo of ItemsJS in action, you can read the article Search engine with facets in JavaScript.
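
As a rough illustration of the direction (this is not ItemsJS code, just a toy sketch of item-to-item similarity based on co-occurrence, the simplest flavour of collaborative filtering; "interactions" is a hypothetical array of { user, item } events):

// item-to-item similarity from co-occurrence (Jaccard index)
function itemSimilarities(interactions) {
  // item -> set of users who interacted with it
  const usersByItem = {};
  for (const { user, item } of interactions) {
    (usersByItem[item] = usersByItem[item] || new Set()).add(user);
  }

  // Jaccard similarity between two user sets
  const jaccard = (a, b) => {
    const common = [...a].filter(u => b.has(u)).length;
    return common / (a.size + b.size - common);
  };

  const items = Object.keys(usersByItem);
  const similar = {};
  for (const a of items) {
    similar[a] = items
      .filter(b => b !== a)
      .map(b => ({ item: b, score: jaccard(usersByItem[a], usersByItem[b]) }))
      .sort((x, y) => y.score - x.score);
  }
  return similar;
}

// itemSimilarities([{ user: 'u1', item: 'movie1' }, { user: 'u1', item: 'movie2' }, ...])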

Elasticbulk

https://github.com/itemsapi/elasticbulk - a convenient Node.js tool for adding data in bulk to Elasticsearch, directly from JSON or via streams from PostgreSQL, MySQL or MongoDB. The interesting thing is that I got feedback from a few people using it even though I didn't do any marketing for it. Probably the power of the GitHub network effect.

It was created out of the need to index a few million records into Elasticsearch in an easy and fast way. Indexing around a million records into Elasticsearch currently takes about 5-10 minutes (depending on network speed and hardware). The plan now is to make it faster by being more asynchronous while still using streams, and also to keep supporting Elasticsearch versions 1.x, 2.x and 5.x. The Mocha test framework is doing a good job here.
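
Roughly how an import looks; the method and option names below are written from memory, so treat them as an assumption and check the README for the exact interface:

const elasticbulk = require('elasticbulk');

// a plain array of JSON documents (it can also be a stream)
const data = [/* your documents */];

elasticbulk.import(data, {
  index: 'movies',
  host: 'localhost:9200'
})
.then(function () {
  console.log('data indexed');
});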

Other

There are also other tools which I didn't mention in the article, like https://github.com/itemsapi/starter or https://github.com/itemsapi/dashboard, which I'll probably not develop any more. I created them experimentally and I'll invest my time in the most useful and promising tools. You may still find some parts of their codebase useful though.

Future

In the beginning I was focused mostly on ItemsAPI, but now and in the future I will focus on different tools related to search and organizing information: developing open source tools and products, and maybe also providing use cases and good patterns in the form of blog posts.

Pure JavaScript search engine with ItemsJS and Vue

Introduction

In the beginning of 2017 I started experimenting with a pure JavaScript search engine. Before that I had a lot of small prototypes with no more than 1000 records, and I was using Elasticsearch for them. Elasticsearch is great as long as you are working with more than 5K records and you need great search and facets. Otherwise it is quite expensive. I was paying around $10 per server for small search websites. It was quite a lot, and managing Elasticsearch was an additional cost (time).



That's why I created and open sourced ItemsJS:

  • a search engine in pure JavaScript
  • supports full text search, facets and sorting
  • it's cheap because it doesn't require a server
  • it can work easily on both the frontend and the backend side
  • very fast for small datasets
  • easy to start with

Demo in ItemsJS + VueJS

Recently I was wondering how to make a good demo of ItemsJS. I decided to write a blog post with an example in JSFiddle and VueJS. JSFiddle is good for a demo and it lets programmers experiment with the code easily. As for VueJS, I had zero knowledge about it until today. It turned out to be a great choice because it is a golden middle between Angular and React. I needed only a few hours to write a frontend in that framework. It's super intuitive and it just works.

More history and context

I created the first prototype in the beginning of 2017. Initially it was intended to work only with Node.js. While it was fast for a small dataset and a single user, it was slow for many simultaneous users.

Node.js is quite slow for computational work and also doesn't natively support multi-threading, which you could use to split data processing across different workers. When I was running benchmarks, Node.js was easily hitting the CPU limits while C++, Java, Rust or Golang would not even feel it.

I then discovered that I could run the same software on the frontend side. The computation happens on the client side, so search scalability is practically unlimited. The application can run even for a million users at the same time and the only bottleneck is the network. I was very enthusiastic about that because it meant hosting the application can be free, as it requires only HTML and JavaScript.

I then used browserify, which converted the Node.js code for the browser.

It was only one line in the CLI:

browserify index.js -v -s itemsjs -o dist/itemsjs.js

In terms of creating the programming interface for ItemsJS, I didn't have any problem with that because I copied it from https://github.com/itemsapi/itemsapi, which was already battle-tested. You can look into the https://github.com/itemsapi/itemsjs interface documentation and decide if it is easy or not.
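
To give a rough idea (this is a sketch from memory, so check the repository for the exact, up to date interface), the whole search behaviour is driven by one JSON configuration:

// items is your plain array of JSON documents
const items = [/* ... */];

const itemsjs = require('itemsjs')(items, {
  searchableFields: ['name', 'description'],
  sortings: {
    name_asc: { field: 'name', order: 'asc' }
  },
  aggregations: {
    tags: { title: 'Tags', size: 10 },
    actors: { title: 'Actors', size: 10 }
  }
});

const result = itemsjs.search({
  query: 'forrest gump',
  filters: { tags: ['drama'] }
});
// result.data.items        -> matching items
// result.data.aggregations -> facet counts for tags, actors, ...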

When I was writing the first lines of code I wanted to use as many external tools as possible, e.g. https://lunrjs.com/ or https://lodash.com/docs, to speed up development and see whether the project makes sense, even at the cost of a bigger codebase. There is a saying that "premature optimization is the root of all evil" and I totally agree with that. I also believe it's better to start with something bad and imperfect that solves the problem and improve it continuously, than to create something perfect but never finish it. That's how I started.

One of the best decisions here was using a testing framework (in this case Mocha). As the codebase was growing and I was adding new features, the tests made me much more confident that the system was still working. It's usually more fun for me to work with a system which has at least the crucial cases covered by tests. Personally I am not a fan of being very restrictive and covering all possible cases though. In a bigger environment maybe it makes sense.

I've used this software for a few commercial products, especially for prototypes when I wanted to create something very fast and validate a customer idea; once the idea is validated and speed becomes a bottleneck, I migrate to Elasticsearch. I once had an application with over 1000 items and a lot of facets, which is computationally heavy. The application was becoming slower and slower. The interesting but also very simple solution was adding timings to the JSON results for searching, facets and sorting. Like that:

{
  "facets": 152, 
  "search": 0, 
  "sorting": 0
}

It allowed me to see the timing results from the UI perspective (console.log in the browser console). Thanks to that I could do many more manual benchmarks and in the end, as far as I remember, I reduced the search time by up to 50%. It's a good feeling when you have a search time of 200 ms and after optimization it is 100 ms.
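
The measurement itself is nothing fancy. A simplified sketch of the idea (not the exact ItemsJS code; fullTextSearch and computeFacets are hypothetical helpers standing in for the real implementation):

// time each stage and attach the numbers to the response
function searchWithTimings(items, query, config) {
  let start = Date.now();
  const found = fullTextSearch(items, query);
  const searchTime = Date.now() - start;

  start = Date.now();
  const aggregations = computeFacets(found, config);
  const facetsTime = Date.now() - start;

  return {
    data: { items: found, aggregations: aggregations },
    timings: { search: searchTime, facets: facetsTime }
  };
}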

In terms of algorithms, I discovered the inverted index for full text searching. It is actually relatively simple to implement, as it is a map of words pointing to specific items / documents in the array. You can find more info here: https://en.wikipedia.org/wiki/Full-text_search. As I said before, I am using https://lunrjs.com/, which is an implementation of that algorithm.
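
A toy version of the idea in JavaScript (a sketch only, not the lunr.js implementation):

// build: word -> set of document ids
function buildInvertedIndex(docs) {
  const index = {};
  docs.forEach((text, id) => {
    for (const word of text.toLowerCase().split(/\W+/)) {
      if (!word) continue;
      (index[word] = index[word] || new Set()).add(id);
    }
  });
  return index;
}

// query: intersect the posting lists of all query words
function search(index, query) {
  const postings = query.toLowerCase().split(/\W+/).filter(Boolean)
    .map(word => index[word] || new Set());
  if (!postings.length) return [];
  const intersection = postings.reduce(
    (acc, set) => new Set([...acc].filter(id => set.has(id)))
  );
  return [...intersection];
}

// const index = buildInvertedIndex(['red fox', 'red apple']);
// search(index, 'red fox'); // -> [0]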

A bigger problem for me was creating facets (you know, the filters on the left side where you can narrow the results by tags, actors, genres, price range, etc). I did a lot of research on the internet and also scanned some books like https://www.amazon.com/Search-Engines-Information-Retrieval-Practice/dp/0136072240 to find some use cases and good patterns, but I couldn't find algorithms for that.

I made my own intuitive implementation, and there are a lot of boolean operations, operations on sets and also a lot of edge cases. Thankfully there is something like Mocha for tests, which helped me cover them.

Faceted search is the biggest bottleneck here for me in terms of speed, so it would be great to implement a faster solution if there is one. If I am not wrong, the current complexity of my faceted search algorithm is O(n * m^2) (n - items count, m - facets count). Maybe it would be possible to build an index for the facets at the initialization stage (the same way as the inverted index) so that faceted search would be much faster, or maybe some caching would do the job.
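
To illustrate where the cost comes from, here is a naive facet counting over the filtered items. This is a simplified sketch, not the actual ItemsJS code, and it assumes each item has array fields like tags or actors:

// count facet values over the items that survived the filters
function computeFacets(items, facetFields) {
  const facets = {};
  for (const field of facetFields) {
    facets[field] = {};
    for (const item of items) {                 // n items
      for (const value of item[field] || []) {  // values per facet field
        facets[field][value] = (facets[field][value] || 0) + 1;
      }
    }
  }
  return facets;
}

// filtering by selected facet values is essentially a set intersection
function applyFilters(items, filters) {
  return items.filter(item =>
    Object.entries(filters).every(([field, values]) =>
      values.every(v => (item[field] || []).includes(v))
    )
  );
}

// computeFacets(applyFilters(movies, { tags: ['drama'] }), ['tags', 'actors'])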

In the future, once ItemsJS becomes more useful for me and for other developers and the interface is more stable, it might be a good idea to write a clone of it in Rust or Golang, which might be at least 10x faster than JavaScript and, who knows, maybe even 2x or 3x faster than Elasticsearch. My reasoning comes from the Rust vs Java benchmark https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=rust&lang2=java (Java is the language of Elasticsearch) and from the fact that this software would not need most of Elasticsearch's functionality.

Similar solutions

  • algolia.com - it is blazingly fast and works for both small and bigger datasets. You can find some demos here: https://community.algolia.com/instantsearch.js/v1/examples/. One drawback is the price, but there is also a free edition.
  • http://searchkit.co/ - a frontend stack in React for Elasticsearch. I don't feel comfortable with React so it's hard for me to comment. I had a problem getting started with it, but they have a lot of followers on GitHub.
  • https://github.com/itemsapi/elasticitems - a higher level client for Elasticsearch which I've written in Node.js. It's focused on simplicity (creating facets and search with a JSON configuration) and has quite a similar interface to ItemsJS.
  • any others? If you know other tools which generate search + facets easily, please let me know and I'll update the list.

Ending

Thanks for reading this article. If you find ItemsJS valuable or you see its potential, please share it on Facebook or Twitter. You can also give it a star on the GitHub page: https://github.com/itemsapi/itemsjs

Installing and running Elasticsearch in 5 different ways

Elasticsearch is a great and popular open source search engine which can be installed in many different ways.

I've prepared 5 ways to install and run Elasticsearch, with their pros and cons.

I hope this is useful for beginners and also for more experienced developers.

1. DigitalOcean + Docker

Pros:

  • Extremely easy way to run Elasticsearch.
  • It just requires a few very simple steps to get it working.
  • Resizing (CPU and RAM) and creating snapshots is very convenient
  • Relatively easy to migrate your Docker container to another host
  • Creating many Elasticsearch instances on one host with docker is effortless

Cons:

  • I don’t see many cons.
  • You need to have knowledge about servers and take care of maintenance and backups on your own.

Installation:

ssh root@your-new-ip-address
# disable firewall and allow to open 9200 port
ufw disable
docker run -p 9200:9200 elasticsearch:1.7.6
# open http://your-new-ip-address:9200 to test it out

2. AWS Elasticsearch

Choosing domain and version for Elasticsearch in AWS

Configuring Elasticsearch cluster in Amazon Web Services

Pros

  • Scalable. You can choose how many instances you want and their size
  • Secure. You can control access to the API with AWS Identity and Access Management (IAM) policies
  • Easy to run

Cons

  • Only Elasticsearch versions 1.5 and 2.3 are available
  • It is quite expensive. The smallest instance with 1 GB RAM and 1 vCPU costs $0.018 per hour, which is about $13 per month

3. Elasticsearch with Ansible

- name: Elasticsearch with custom configuration
  hosts: localhost
  roles:
    #expand to all available parameters
    - { role: elasticsearch, es_instance_name: "node1", es_data_dirs: "/opt/elasticsearch/data", es_log_dir: "/opt/elasticsearch/logs", es_work_dir: "/opt/elasticsearch/temp", 
    es_config: {
        node.name: "node1", 
        cluster.name: "custom-cluster",
        discovery.zen.ping.unicast.hosts: "localhost:9301",
        http.port: 9201,
        transport.tcp.port: 9301,
        node.data: false,
        node.master: true,
        bootstrap.mlockall: true,
        discovery.zen.ping.multicast.enabled: false } 
    }
  vars:
    es_scripts: false
    es_templates: false
    es_version_lock: false
    es_heap_size: 1g

Pros

  • It's very flexible, configurable and fully automated.
  • Ansible has a very low learning curve in comparison to Chef or Puppet in terms of provisioning

Cons

  • Requires knowledge about Ansible and managing a Linux system
  • Requires doing maintenance and backups on your own

Installation

  • Make sure Ansible is installed on your localhost, e.g. check with ansible --version
  • run ansible-playbook your-own-playbook.yml

More information here: https://github.com/elastic/ansible-elasticsearch

4. QBox Hosted Elasticsearch

Pros

  • It is very scalable. You can have up to 512 GB RAM and 46 vCPUs
  • 4 regions available (USA, Europe, Australia and Asia)
  • 24/7 support and help for every customer
  • automatic backups

Cons

  • It is expensive. An instance with 1 GB RAM and 1 vCPU costs $0.05/hr ($40.00/mo). This is almost 3x more expensive than AWS (comparing the smallest instances)

5. Manual installation on Ubuntu 14.04

Installing Elasticsearch 1.7.2 on Ubuntu

#!/bin/sh

sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
sudo apt-get -y install oracle-java8-installer
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.2.deb
sudo dpkg -i elasticsearch-1.7.2.deb
sudo service elasticsearch start

Pros

  • It can be faster than running it in Docker

Cons

  • Different installation commands on each Linux distribution.

Testing installation

Usually it is enough to open it in the browser, e.g. http://localhost:9200, or make a request in the CLI: curl -XGET http://localhost:9200

Typical response:
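
For Elasticsearch 1.7.x it looks roughly like this (the node name and build details will differ on your machine):

{
  "status": 200,
  "name": "some-node-name",
  "cluster_name": "elasticsearch",
  "version": {
    "number": "1.7.2",
    "build_hash": "...",
    "build_timestamp": "...",
    "build_snapshot": false,
    "lucene_version": "4.10.4"
  },
  "tagline": "You Know, for Search"
}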

Thank you for reading the whole article! Feel free to share it on social media if you find it useful.