
Setting up Solr healthcheck alert using Zookeeper watches

In this post we will see how to get instant health alerts if any replica of any collection becomes unhealthy in a cluster.

For this purpose, we will use ZooKeeper watches. But before we go there, let's see how Solr maintains state information for a collection.

State management in Solr

All state-related information for Solr collections is stored in ZooKeeper and shared by all nodes in the cluster. For each collection, Solr maintains a state.json file.

You can see the ZooKeeper data created by Solr from the command line:

[zk: localhost:2181(CONNECTED) 5] ls /
[configs, zookeeper, overseer, aliases.json, live_nodes, collections, overseer_elect, clusterstate.json]

[zk: localhost:2181(CONNECTED) 6] ls /collections
[]

Here collections is a zNode, and for each newly created collection a new child will be created under this node. Let's try creating a collection:

http://localhost:8983/solr/admin/collections?action=create&name=collection1&collection.configName=collection1&numShards=1

Here's what the ZooKeeper data looks like after successful collection creation:

[zk: localhost:2181(CONNECTED) 7] ls /collections
[collection1]

[zk: localhost:2181(CONNECTED) 9] ls /collections/collection1 
[leaders, state.json, leader_elect]

You will see that it created a child zNode named collection1, and under that a state.json file. Let's see the content of that file:

[zk: localhost:2181(CONNECTED) 11] get /collections/collection1/state.json
{"collection1":{
    "replicationFactor":"1",
    "router":{"name":"compositeId"},
    "maxShardsPerNode":"1",
    "autoAddReplicas":"false",
    "shards":{"shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{"core_node1":{
            "core":"collection1_shard1_replica1",
            "base_url":"http://127.0.1.1:8983/solr",
            "node_name":"127.0.1.1:8983_solr",
            "state":"active",
            "leader":"true"}}}}}}

Notice the state key maintained for each replica. By checking this value we can know the health of each replica. Try creating another collection with multiple shards and replicas and observe the content of state.json.
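To make this concrete, here is a minimal sketch (not part of the post's actual utility) of pulling the state values out of a state.json document. It uses a plain regex for brevity, so it picks up shard states as well as replica states; a real implementation would use a proper JSON library. The object and method names here are hypothetical:

```scala
object StateCheck {
  // Matches every "state":"<value>" pair in the JSON text
  private val StatePattern = """"state"\s*:\s*"([^"]+)"""".r

  // Returns all state values found, in order of appearance
  def replicaStates(json: String): List[String] =
    StatePattern.findAllMatchIn(json).map(_.group(1)).toList

  // Healthy only if every shard/replica state is "active"
  def isHealthy(json: String): Boolean =
    replicaStates(json).forall(_ == "active")
}
```

For example, feeding it a document where the shard is active but one replica is recovering, `StateCheck.isHealthy(...)` returns false, which is exactly the condition we want to alert on.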

Solr also exposes this information through the CLUSTERSTATUS API shown below, but there are rare instances where this API fails to gather and return data from ZooKeeper, so we can't rely on it:

http://localhost:8983/solr/admin/collections?action=clusterstatus

The ZooKeeper chroot used by Solr is also exposed in the Solr admin UI. Navigate to Cloud -> Tree to access it and then expand the collections zNode. Here's what it will look like:

Solr Tree View

Whenever a replica's state changes, it will be reflected here. What we want is to send an alert whenever any replica's state is non-active. Basically, we need to watch each state.json for changes and trigger an alert if a replica is not active.

To watch for changes, we will use zookeeper watches.

ZooKeeper watches and Curator

ZooKeeper watches allow clients to get notified when a znode changes in some way. Watches are set by read operations and are triggered by ZooKeeper when something changes. For example, a watch can be placed on a znode that will be triggered when the znode's data changes or the znode itself gets deleted. Note that watches are one-time triggers: once a watch fires, it must be set again to receive further notifications.

Watch management is a low-level task, so we will use a library called Curator (https://curator.apache.org/), which abstracts away a lot of work like connection management and watch re-registration.

We will use the PathChildrenCache and TreeCache recipes provided by Curator to set up watches:

PathChildrenCache

It will be used to keep track of collections being created and deleted. Whenever a child is added, updated, or removed, the Path Cache will change its state to contain the current set of children, the children's data, and the children's state.

To watch the list of collections, we can define a listener on ZooKeeper's /collections zNode like below:

import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.recipes.cache.PathChildrenCache.StartMode
import org.apache.curator.framework.recipes.cache.{PathChildrenCache, PathChildrenCacheEvent, PathChildrenCacheListener}

// collectionsZNode is "/collections"; the third argument caches node data locally
private val collectionCache = new PathChildrenCache(curatorClient, collectionsZNode, true)

collectionCache.getListenable.addListener(new PathChildrenCacheListener {
  override def childEvent(client: CuratorFramework, event: PathChildrenCacheEvent): Unit = {
    // event.getData can be null for connection-state events
    if (event.getData != null)
      Logger.info(s"Change in path ${event.getData.getPath}")
  }
})

collectionCache.start(StartMode.BUILD_INITIAL_CACHE)

Starting with StartMode.BUILD_INITIAL_CACHE initializes collectionCache with the list of existing collections, and the PathChildrenCacheListener makes sure it always stays in sync with the list of collections in ZooKeeper.

TreeCache

Next we want to set a watch on each collection's state.json. Instead of setting a watch individually on each collection's state.json zNode, we will use the TreeCache recipe, which watches all nodes under the /collections zNode. TreeCache attempts to keep all data from all children of a ZK path locally cached. It will watch the ZK path, respond to update/create/delete events, pull down the data, and so on. We can register a listener that will get notified when changes occur.

Since we are only interested in zNodes named state.json, we can ignore changes to other children. Here's how to set up the watch:

import java.nio.charset.StandardCharsets
import org.apache.curator.framework.recipes.cache.{TreeCache, TreeCacheEvent, TreeCacheListener}

private val clusterStateCache = new TreeCache(curatorClient, collectionsZNode)

clusterStateCache.getListenable.addListener(new TreeCacheListener {
  override def childEvent(client: CuratorFramework, event: TreeCacheEvent): Unit = {
    try {
      if (event.getData != null) {
        val path = event.getData.getPath
        // endsWith does a plain suffix match, so no regex pattern is needed here
        if (path.endsWith("state.json")) {
          Logger.info(s"Change in path $path")
          val data = new String(event.getData.getData, StandardCharsets.UTF_8)
          processState(data)
        }
      }
    } catch {
      case ex: Exception =>
        Logger.error(s"Exception in watcher child event: ${ex.getMessage}", ex)
    }
  }
})

clusterStateCache.start()

Here the data passed to processState is the same JSON we saw at the start. processState will parse this JSON and filter out unhealthy replicas, which will then be sent out in an alert email.

Note that a non-active state does not necessarily mean that a replica is unhealthy and needs manual intervention, so there might be a few false alerts too. For instance, when a replica is recovering due to load on a certain node, its state will show as recovering, but it will auto-recover after a few minutes. To reduce noise from such alerts, instead of sending an email immediately, an alert for a replica can be put on hold and rechecked after a few minutes to see whether it auto-recovered or not.
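The hold-and-recheck idea could be sketched roughly as follows (this is an illustration, not the post's actual code; the class and method names are hypothetical): record when a replica first turns unhealthy, and only fire an alert if it is still unhealthy after a grace period has elapsed.

```scala
import scala.collection.mutable

// Suppresses alerts for replicas that recover within gracePeriodMs
class AlertDebouncer(gracePeriodMs: Long) {
  // replica name -> timestamp when it was first seen unhealthy
  private val firstSeen = mutable.Map.empty[String, Long]

  // Called on every observed state change; returns true when an alert should fire
  def shouldAlert(replica: String, healthy: Boolean, nowMs: Long): Boolean = {
    if (healthy) {
      firstSeen.remove(replica) // recovered on its own: forget it, no alert
      false
    } else {
      val since = firstSeen.getOrElseUpdate(replica, nowMs)
      nowMs - since >= gracePeriodMs // alert only if still down past the grace period
    }
  }
}
```

With a one-minute grace period, a replica that flips to recovering and back within sixty seconds never generates an email, while one that stays non-active past the window does.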

You can find full code for this utility at https://bitbucket.org/saumitra/solr-health-alerts