Saturday, August 23, 2014

Using Hystrix with Dropwizard

I've previously blogged about Hystrix/Tenacity and Breakerbox. The full code and outline of the example is located here.

Subsequently I've been using Dropwizard and Hystrix without Tenacity/Breakerbox and found it far simpler. I don't see a great deal of value in adding Tenacity and Breakerbox, as Hystrix uses Netflix's configuration library Archaius, which already comes with dynamic configuration via files, databases and ZooKeeper.

So let's see what is involved in integrating Hystrix and Dropwizard.

The example is the same as the one from this article. Briefly, it is a single service that calls out to three other services:
  • A user service
  • A pin check service
  • A device service

To allow the application to reliably handle dependency failures we are going to call out to each of the three services using Hystrix commands. Here is an example of calling out using the Apache HTTP client to a pin check service:

To execute this command you simply instantiate it and call execute(); Hystrix handles creating a work queue and thread pool. Each command that is executed with the same group will use the same work queue and thread pool. You tell Hystrix the group by passing it to super() when extending a Hystrix command. To configure Hystrix in a Dropwizard-like way we can add a map to our Dropwizard YAML:

This will translate to a Map in your Dropwizard configuration class:


The advantage of using a simple map, rather than a class with property names matching Hystrix property names, is that it allows you to be completely decoupled from Hystrix and its property naming conventions. It also allows users to copy property names directly from the Hystrix documentation into the YAML.

Enabling Hystrix to pick these properties up requires a single line in your Dropwizard application class. This simplicity is due to the fact that Hystrix uses Archaius for property management.
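As a rough, JDK-only illustration of why the map approach needs no mapping code, the sketch below copies a parsed YAML map straight into a set of properties. The class and method names are hypothetical, and the two keys are just sample Hystrix property names. With Archaius and Commons Configuration on the classpath, the real hand-off would likely be a call along the lines of `ConfigurationManager.install(new MapConfiguration(map))`; treat that as an assumption rather than the exact line from the example project.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// JDK-only stand-in for the hand-off: the YAML map is passed through untouched,
// so the keys are exactly the names found in the Hystrix documentation.
class HystrixConfigPassThrough {

    // Copies every entry verbatim; no Hystrix classes or name mapping involved.
    static Properties toProperties(Map<String, Object> yamlMap) {
        Properties props = new Properties();
        for (Map.Entry<String, Object> entry : yamlMap.entrySet()) {
            props.setProperty(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return props;
    }

    // Hypothetical example of what the parsed YAML map might contain.
    static Map<String, Object> exampleMap() {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds", 2000);
        map.put("hystrix.threadpool.default.coreSize", 10);
        return map;
    }
}
```

Because nothing in the configuration class knows about individual property names, adding another Hystrix property is a YAML-only change.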


Now you can add any of Hystrix's many properties to your YAML. Later you can extend the Configuration you install to include a dynamic configuration source such as ZooKeeper.


I hope this shows just how simple it is to use Hystrix with Dropwizard without bothering with Tenacity. A full working example is on GitHub.

Thursday, August 21, 2014

Stubbed Cassandra at Skills Matter

Yesterday I gave a talk on how to test Cassandra applications using Stubbed Cassandra at Skills Matter in London for the Cassandra London meetup group.

The talk was well attended, with somewhere between 50 and 100 people attending.

The slides are on SlideShare:


And the talk is on the Skills Matter website.

Thanks to Cassandra London and Skills Matter for having me!

Thursday, August 7, 2014

RabbitMQ and Highly Available Queues

RabbitMQ is an AMQP broker with an interesting set of HA abilities. Do a little research and your head will start spinning as you work out the differences between making messages persistent, or queues durable, or was it durable messages and HA queues with transactions? Hopefully the following is all the information you need in one place.

Before evaluating them you need to define your requirements.

  • Do you want queues to survive broker failures? 
  • Do you want unconsumed messages to survive a broker failure?
  • What matters more: publisher speed or the above? Or do you want a nice compromise?

RabbitMQ allows you to:

  • Make a cluster of Rabbits where clients can communicate with any node in the cluster
  • Make a queue durable, meaning the queue definition itself will survive broker failure
  • Make a message persistent, meaning that it will get stored to disk, which you do by setting a message's delivery_mode
  • Make a queue HA, meaning its contents will be replicated across brokers, either a specified list, all of them or a number of them 
  • Even an HA queue has a single master that handles all operations on that queue, even if the client is connected to a different node in the cluster. The master sends information to the replicas, which are called slaves

Okay, so you have a durable queue that is HA and you're using persistent messages (you really want it all!). How do you work with the queue correctly?

Producing to an HA queue


You have three options for publishing to an HA queue:
  • Accept the defaults: the publish will return with no guarantees in the event of broker failure
  • Publisher confirms
  • Transactions

The defaults: You went to all that effort of making a durable HA queue and sending a persistent message, and then you just fire and forget? Sounds crazy, but it's not. You might have done the above to make sure you don't lose a lot of messages, but you don't want the performance impact of waiting for any form of acknowledgment. You're essentially accepting a few failures when you lose a rabbit that is the master for any of your queues.

Transactions: To use RabbitMQ transactions you do a txSelect on your channel. Then, after you publish a message, you call txCommit, which won't return until your message has been accepted by the master and all of the queue's slaves. If your message is persistent then that means it is on disk on all of them, so you're safe! What's not to like? The speed! Every persistent message that is published in a transaction results in an fsync to disk. You need a compromise, you say?

Publisher confirms: So you don't want to lose your messages and you want to speed things up. Then you can enable publisher confirms on your channel. RabbitMQ will then send you a confirmation when the message has made it to disk on all the rabbits, but it won't do it right away; it will flush things to disk in batches. You can either block periodically or set up a listener to get notified. Then you can put logic in your publisher to do retries etc. You might even write logic to limit the number of published messages that haven't been confirmed. But wait, isn't queueing meant to be easy?
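The bookkeeping for limiting unconfirmed messages might look something like the JDK-only sketch below. `ConfirmTracker` is a hypothetical helper, not part of the RabbitMQ client. In a real publisher it would presumably be fed by the Java client's `Channel.getNextPublishSeqNo()` before each publish and by a `ConfirmListener`'s `handleAck(deliveryTag, multiple)` callback, after enabling confirms with `Channel.confirmSelect()`.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.Semaphore;

// Tracks published-but-unconfirmed sequence numbers and caps how many may be in flight.
class ConfirmTracker {
    private final ConcurrentSkipListSet<Long> unconfirmed = new ConcurrentSkipListSet<>();
    private final Semaphore permits;

    ConfirmTracker(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Call just before publishing; returns false when the in-flight limit is reached.
    boolean tryRegister(long sequenceNumber) {
        if (!permits.tryAcquire()) {
            return false;
        }
        unconfirmed.add(sequenceNumber);
        return true;
    }

    // Call from the confirm callback. multiple == true means the broker is
    // confirming every sequence number up to and including this one.
    int confirm(long sequenceNumber, boolean multiple) {
        int released = 0;
        if (multiple) {
            Iterator<Long> it = unconfirmed.headSet(sequenceNumber, true).iterator();
            while (it.hasNext()) {
                it.next();
                it.remove();
                released++;
            }
        } else if (unconfirmed.remove(sequenceNumber)) {
            released = 1;
        }
        permits.release(released);
        return released;
    }

    int inFlight() {
        return unconfirmed.size();
    }
}
```

A nack would be handled the same way, except that the publisher would also re-send the affected messages. Whether to block or drop when `tryRegister` returns false is a design decision for the publisher.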

Consuming from an HA queue


Okay, so you have your message on the queue - how do you consume it? This is simpler:
  • Auto-ack: As soon as a message is delivered RabbitMQ discards it
  • Ack: Your consumer has to manually ack each message

With manual acks, if your consumer crashes and disconnects from Rabbit then the message will be re-queued. However, if you have a bug and you just don't ack it, then Rabbit will hold on to it until you disconnect, and only then will it be re-queued. I bet that leads to some interesting bugs!
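The manual-ack behaviour can be pictured with a toy in-memory model. This is not RabbitMQ client code, and `ToyAckQueue` and its methods are made up for illustration; it only mimics the rule that a delivered message stays "unacked" until it is either acknowledged or its consumer goes away.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of manual-ack delivery to a single consumer.
class ToyAckQueue {
    private final Deque<String> ready = new ArrayDeque<>();
    private final Map<Long, String> unacked = new LinkedHashMap<>();
    private long nextTag = 1;

    void publish(String message) {
        ready.addLast(message);
    }

    // Hands out the next message and remembers it until it is acked.
    // Returns the delivery tag, or -1 if nothing is ready.
    long deliver() {
        String message = ready.pollFirst();
        if (message == null) {
            return -1;
        }
        long tag = nextTag++;
        unacked.put(tag, message);
        return tag;
    }

    String messageFor(long tag) {
        return unacked.get(tag);
    }

    void ack(long tag) {
        unacked.remove(tag);
    }

    // Consumer connection lost: everything delivered but not acked
    // goes back to the head of the queue, oldest first.
    void consumerDisconnected() {
        List<String> pending = new ArrayList<>(unacked.values());
        Collections.reverse(pending);
        for (String message : pending) {
            ready.addFirst(message);
        }
        unacked.clear();
    }

    int readyCount() {
        return ready.size();
    }

    int unackedCount() {
        return unacked.size();
    }
}
```

In this model a consumer that forgets to call `ack` leaves its messages in the unacked map indefinitely, which is the "interesting bug" described above: nothing is lost, but nothing is redelivered either until the disconnect.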

So what could go wrong?


This sounds peachy: you don't care about performance, so you have a durable HA queue with persistent messages and are using transactions for producing and acks when consuming. You've guaranteed exactly-once delivery, right? Well, no. Imagine your consumer crashes having consumed the message but just before sending the ack. Rabbit will re-send the message to another consumer.

HA queueing is hard!

Conclusion 


There is no magic bullet; you really need to understand the software you use for HA queueing. It is complicated and I didn't even cover topics like network partitions. Rabbit is a great piece of software and its automatic failover is excellent, but every notch you add on (transactions etc.) will degrade your performance significantly.



Monday, August 4, 2014

Getting started with Hystrix and Tenacity to build fault tolerant applications

Applications are becoming increasingly distributed. Microservice architecture is the new rage. This means that each application you develop has more and more "integration points".

Any time you make a call to another service or database, or use any third party library that is a black box to you, it can be thought of as an integration point.

Netflix's architecture gives a great example of how to deal with integration points. They have a popular open-source library called Hystrix, which allows you to isolate integration points by executing the calls to each one on its own work queue and thread pool.

Yammer have integrated Hystrix with Dropwizard, so that applications can publish metrics and accept configuration updates.

Here is an example application that calls out to three HTTP services and collects the results together into a single response.
Rather than calling into an HTTP library on the thread provided by Jetty, this application uses Yammer's Hystrix wrapper, Tenacity.

Let's look at one of the integration points:


Here we extend the TenacityCommand class and call the dependency in the run() method. Typically all this code would be in another class with the TenacityCommand just being a wrapper, but this is a self-contained example. Let's explain what it is doing:
  • Making an HTTP call using the Apache HTTP client
  • Throwing a RuntimeException if the call fails
By instantiating this TenacityCommand and calling execute(), your code is automagically executed on its very own thread pool, and requests are queued on its very own work queue. What benefits do you get?
  • You get a guaranteed timeout, so no more relying on a library's read timeout that never seems to work in production
  • You get a circuit breaker that opens if a configured % of calls fail, meaning you can fail fast and throttle calls to failing dependencies
  • You get endpoints that show the configuration and whether a circuit breaker is open
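
The circuit breaker idea can be sketched in a few lines of plain Java. This is a deliberately simplified, count-based version with hypothetical names. Hystrix's own breaker works over a rolling time window and retries after a sleep window (configured through properties such as `circuitBreaker.requestVolumeThreshold`, `circuitBreaker.errorThresholdPercentage` and `circuitBreaker.sleepWindowInMilliseconds`), none of which is modelled here.

```java
// Simplified, count-based circuit breaker: it opens once enough calls have been
// seen and the failure percentage reaches the threshold. No rolling window and
// no automatic retry after a sleep window, unlike the real thing.
class SimpleCircuitBreaker {
    private final int minimumCalls;
    private final int errorThresholdPercent;
    private int total;
    private int failures;

    SimpleCircuitBreaker(int minimumCalls, int errorThresholdPercent) {
        this.minimumCalls = minimumCalls;
        this.errorThresholdPercent = errorThresholdPercent;
    }

    synchronized void recordSuccess() {
        total++;
    }

    synchronized void recordFailure() {
        total++;
        failures++;
    }

    // While open, callers should skip the dependency and go straight to a fallback.
    synchronized boolean isOpen() {
        if (total < minimumCalls) {
            return false;
        }
        return failures * 100 >= errorThresholdPercent * total;
    }

    // Clears the statistics, closing the breaker again.
    synchronized void reset() {
        total = 0;
        failures = 0;
    }
}
```

The minimum call count stops a single early failure from tripping the breaker when there is too little traffic to judge the dependency's health.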

If the call to run() fails or times out, or the circuit breaker is open, Tenacity will call the optional getFallback() method, so you can provide a fallback strategy rather than failing completely.
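
To make the timeout-plus-fallback mechanics concrete, here is a minimal JDK-only sketch of the same idea. It is not Tenacity or Hystrix code; `IsolatedCall`, its pool size of 4 and its method names are assumptions made for the illustration. The call runs on a dedicated thread pool, the caller waits at most a fixed time, and a fallback is used if the call throws or takes too long.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// One instance per dependency: its own thread pool, its own timeout.
class IsolatedCall {
    // Daemon threads so an abandoned slow call cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true);
        return thread;
    });
    private final long timeoutMillis;

    IsolatedCall(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Runs the call on the dedicated pool; returns the fallback on failure or timeout.
    <T> T execute(Callable<T> run, Supplier<T> fallback) {
        Future<T> future = pool.submit(run);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback.get();
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true); // interrupt the worker if it is still running
            return fallback.get();
        }
    }
}
```

The timeout is enforced by the caller's wait on the Future rather than by the HTTP library, which is why it holds even when a library's own read timeout misbehaves.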

Another hidden benefit is how easy it is to move to a more asynchronous style of programming. Let's look at the resource class that pulls together the three dependencies:

Let's ignore the fact we are calling out to other HTTP services from a resource layer. The above code shows how to use Tenacity synchronously. Apart from the advantages you gain regarding failures, all the calls are still happening one by one: we call execute(), which blocks, so we don't call the second dependency until the first one has finished.

However, this doesn't have to be the case. Now you've snuck Tenacity into your code base you can change the code to something like this:

And without your colleagues realising, you've made all of the calls to your dependencies execute asynchronously and (possibly) at the same time; you then block to bring them all together at the end.
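
The shape of that change can be sketched with plain JDK futures. Hystrix commands expose queue(), which returns a Future, so the pattern is the same: start everything first, block last. The class below is a hypothetical stand-in in which the three Callables play the role of the user, pin check and device commands.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCalls {
    // Submits all three calls before waiting on any of them, then joins the results.
    static String combine(Callable<String> user, Callable<String> pin, Callable<String> device) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            Future<String> userResult = pool.submit(user);
            Future<String> pinResult = pool.submit(pin);
            Future<String> deviceResult = pool.submit(device);
            // Only now do we block; by this point all three calls are already running.
            return userResult.get() + "," + pinResult.get() + "," + deviceResult.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With this shape the total latency is roughly that of the slowest dependency rather than the sum of all three.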

We've barely scratched the surface of Hystrix and Tenacity but hopefully you can already see the benefits. All the code for this example, along with instructions on how to use WireMock to mock the dependencies, is here.