How to implement Circuit Breaker Design Pattern in Spring Boot?

Problem:

Fault tolerance is a major requirement for enterprise applications.

If your application gets a huge load, how does it behave?

If something goes wrong within your application and you still want to handle it gracefully, what do you do?

Fault tolerance is especially important for microservices, since many microservices communicate with each other and any of them can break down.

Let us look into a specific case of fault tolerance.

One of your microservices (let’s say A) calls another microservice (let’s say B).

B keeps failing.

A keeps hitting B for every request from the end user.

And B keeps failing.

Not only does B fail to return a successful response, but its server also gets overwhelmed by the requests from A.

Wouldn’t it be nice if you could stop sending requests from A to B while B is not responding?

And then wait for some time and check whether B is able to return a successful response again.

And then allow requests from A once B starts responding.

There is a fault tolerance design pattern which handles exactly this use case.

Solution:

Circuit Breaker pattern is a design pattern which solves the above problem.

Circuit Breaker derives its name from the electrical circuit breaker we have in our homes.

What does it do?

If too much current flows into our electrical devices, it might damage them.

To prevent that, a device named circuit breaker stops excess current from flowing to those devices.

It switches the circuit from CLOSED to OPEN so that current cannot flow through.

Later you can move it back to CLOSED state manually.

The circuit breaker pattern works exactly like that.

If there are too many requests to a system and we know that it is not responding properly, we can block the request flow.

A normal request flow from one service to another means the circuit is CLOSED.

If we block the request then the circuit moves from CLOSED to OPEN state and no more requests are allowed to flow.

This design pattern adds one more state, though:

The HALF-OPEN state.

Remember that with an electrical circuit breaker, we can move the circuit back from OPEN to CLOSED manually by pulling a latch.

But we can’t keep doing that manually in a real world enterprise application.

To automate that, this pattern does the following:

It waits for a specific period of time after the circuit is OPEN (no more requests are allowed).

After that period it moves to HALF-OPEN state.

In this state, it allows a specified number of requests through.

If these trial requests again fail above a specific threshold, it moves the circuit back to OPEN state.

If fewer requests than the threshold fail, it moves the circuit to CLOSED state again and the flow is resumed.
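The three states and their transitions can be sketched as a small Java state machine. This is a hypothetical, simplified illustration, not Resilience4j's implementation: the class SimpleCircuitBreaker and its parameters are made up for this example, the CLOSED state trips after a fixed number of consecutive failures instead of using the sliding windows described later, and time comes from an injectable clock so the behavior is easy to test.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified circuit breaker state machine (not Resilience4j's implementation).
// CLOSED trips after a fixed number of consecutive failures instead of using a sliding window.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failuresToOpen;         // consecutive failures that move CLOSED -> OPEN
    private final long waitMillisInOpen;      // how long to stay OPEN before trial calls
    private final int permittedHalfOpenCalls; // trial calls allowed in HALF_OPEN
    private final int maxHalfOpenFailures;    // this many trial failures (or more) reopen the circuit
    private final LongSupplier clock;         // current time in milliseconds

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private int trialsStarted, trialsFinished, trialFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failuresToOpen, long waitMillisInOpen,
                         int permittedHalfOpenCalls, int maxHalfOpenFailures, LongSupplier clock) {
        this.failuresToOpen = failuresToOpen;
        this.waitMillisInOpen = waitMillisInOpen;
        this.permittedHalfOpenCalls = permittedHalfOpenCalls;
        this.maxHalfOpenFailures = maxHalfOpenFailures;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    /** Returns true if a call may be sent to the target service right now. */
    boolean tryAcquire() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillisInOpen) {
            state = State.HALF_OPEN; // waited long enough: allow a few trial calls
            trialsStarted = trialsFinished = trialFailures = 0;
        }
        if (state == State.OPEN) {
            return false; // still blocking requests
        }
        if (state == State.HALF_OPEN) {
            if (trialsStarted >= permittedHalfOpenCalls) {
                return false; // all trial slots are taken
            }
            trialsStarted++;
        }
        return true;
    }

    /** Records the outcome of a call that tryAcquire() allowed. */
    void record(boolean success) {
        if (state == State.CLOSED) {
            consecutiveFailures = success ? 0 : consecutiveFailures + 1;
            if (consecutiveFailures >= failuresToOpen) {
                open();
            }
        } else if (state == State.HALF_OPEN) {
            trialsFinished++;
            if (!success) {
                trialFailures++;
            }
            if (trialsFinished == permittedHalfOpenCalls) { // all trial calls are back: decide
                if (trialFailures >= maxHalfOpenFailures) {
                    open();
                } else {
                    state = State.CLOSED;
                    consecutiveFailures = 0;
                }
            }
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
    }
}
```

For example, with failuresToOpen = 5, waitMillisInOpen = 20_000, permittedHalfOpenCalls = 4 and maxHalfOpenFailures = 2, five failures open the circuit; once 20 seconds have passed, four trial calls are allowed, and the circuit closes again only if at most one of them fails.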

Here is a diagram explaining the different states:

Types of Circuit Breaker:

There are two types of circuit breaker:

  1. Time based
  2. Count based

Time based Circuit Breaker:

In a time based circuit breaker, you sample requests for a specific time.

Let’s say 10 seconds.

This is also called sliding window size.

We see how the target service responds during these 10 seconds for all the requests made during that time.

If more requests fail than a specific threshold allows, we open the circuit. Let’s say the threshold is 5 and 10 requests were made during the above duration.

If more than 5 of them fail, we move the circuit to OPEN state and no more requests are forwarded to the target service.

After this, we wait for a specific time (let’s say 20 seconds) and then try a few requests again (let’s say 4 requests).

This is called the HALF OPEN state.

If fewer requests than a specific threshold fail during this state (let’s say the threshold is 2, and 3 out of 4 requests succeed, so only 1 fails), we move the circuit back to CLOSED state.

If not, we move the circuit back to OPEN state.

Count Based Circuit Breaker:

In a count based circuit breaker, we sample a specific number of requests instead of sampling requests during a specific time.

Let’s say we sample 10 requests.

And let’s say the failure threshold is 5.

So if more than 5 requests fail then we move the circuit to OPEN and no more requests are allowed.

Again, we wait for a specific period of time (let’s say 20 seconds).

After this, the circuit moves to HALF OPEN state.

During this state, we try a few requests (let’s say 4 requests).

We check whether fewer requests than a threshold fail (let’s say the threshold is 2).

If so we move back to CLOSED state and requests flow normally.

Else we remain in OPEN state and requests are blocked.

And the cycle continues.

Sampling:

We talked about the sample size in both time based and count based circuit breakers.

How is this sampling taken?

Let’s say you have configured the sampling size (or sliding window size) to be 10 (count based).

The first 10 requests from your source service to the target service are sampled.

When the 11th request comes, we leave out the first request and analyze the latest 10 requests.

When the 12th request comes, we leave out the first two requests and analyze the latest 10 requests, and so on.

So at any stage we consider the recent requests.
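A count based sliding window can be sketched as a fixed-size ring buffer that only remembers the outcomes of the most recent calls. The class CountBasedWindow below is a hypothetical illustration, not a Resilience4j class.

```java
// Hypothetical count-based sliding window: a ring buffer holding the outcomes
// of the most recent `capacity` calls (true = the call failed).
class CountBasedWindow {
    private final boolean[] failed; // ring buffer of outcomes
    private int next = 0;           // slot the next outcome will be written to
    private int size = 0;           // calls currently in the window (<= capacity)
    private int failures = 0;       // failed calls currently in the window

    CountBasedWindow(int capacity) {
        this.failed = new boolean[capacity];
    }

    void record(boolean callFailed) {
        if (size == failed.length) {
            // window is full: the oldest outcome drops out before the new one goes in
            if (failed[next]) {
                failures--;
            }
        } else {
            size++;
        }
        failed[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    int size() {
        return size;
    }

    int failures() {
        return failures;
    }
}
```

With a capacity of 10, recording the 11th outcome overwrites the 1st and recording the 12th overwrites the 2nd, which is the behavior described above.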

Similarly, for a time based circuit breaker:

If the sample size is 10 seconds, we consider all the requests made during the first 10 seconds.

In the next second, we ignore the requests made during the first second and consider all the requests made during the latest 10 seconds.
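The time based variant can be sketched by keeping the timestamp of every call and dropping calls that fall out of the window. TimeBasedWindow is a hypothetical class; Resilience4j's documentation describes its own time based window as aggregated per-second buckets rather than individual timestamps, so treat this as a simplified model. It uses a Java record, so it needs Java 16 or later (Spring Boot 3 already requires Java 17).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical time-based sliding window: only calls made within the last
// windowMillis milliseconds (relative to "now") are counted.
class TimeBasedWindow {
    private record Call(long timestampMillis, boolean failed) {}

    private final long windowMillis;
    private final Deque<Call> calls = new ArrayDeque<>(); // oldest call first
    private int failures = 0;

    TimeBasedWindow(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    void record(long nowMillis, boolean failed) {
        evict(nowMillis);
        calls.addLast(new Call(nowMillis, failed));
        if (failed) {
            failures++;
        }
    }

    /** Drops calls that are windowMillis or more older than nowMillis. */
    private void evict(long nowMillis) {
        while (!calls.isEmpty() && nowMillis - calls.peekFirst().timestampMillis() >= windowMillis) {
            if (calls.removeFirst().failed()) {
                failures--;
            }
        }
    }

    int size(long nowMillis) {
        evict(nowMillis);
        return calls.size();
    }

    int failures(long nowMillis) {
        evict(nowMillis);
        return failures;
    }
}
```

With a 10-second window, a failure recorded at 0 ms still counts at 9,999 ms but is gone at 10,000 ms, just like the requests from the first second being ignored once the window moves on.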

Minimum Number of Calls:

You can also require a minimum number of calls before the failure rate is evaluated at all.

For example, if your sampling size is 10, you can configure a minimum number of calls (let’s say 5).

Until 5 calls have been recorded, the circuit stays CLOSED no matter how many of them fail; the failure rate is first evaluated once the 5th call has been recorded.
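Here is a minimal sketch of a minimum-number-of-calls gate, assuming the failure rate is expressed as a percentage of the recorded calls. The helper FailureRate.of is hypothetical and evaluates the rate only once at least minimumNumberOfCalls calls have been recorded. Returning -1 below that point mirrors the -1.0% values in the actuator output later in this article, which Resilience4j reports for circuit breakers that have not recorded enough calls yet.

```java
// Hypothetical helper illustrating the minimum-number-of-calls rule (not a Resilience4j API).
final class FailureRate {
    private FailureRate() {
    }

    /**
     * Returns the failure rate in percent, or -1 if fewer than minimumNumberOfCalls
     * calls have been recorded (too little data to judge the target service yet).
     */
    static float of(int recordedCalls, int failedCalls, int minimumNumberOfCalls) {
        if (recordedCalls == 0 || recordedCalls < minimumNumberOfCalls) {
            return -1f;
        }
        return failedCalls * 100f / recordedCalls;
    }
}
```

With minimumNumberOfCalls = 5, four failed calls give -1 (the circuit cannot open yet), while five failed calls give 100.0.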

Now let’s see how to implement this in Spring Boot.

Implementation:

To implement this in Spring Boot we are going to use a library named resilience4j.

Resilience4j is a library that helps you build fault tolerant applications through several fault tolerance patterns like Circuit Breaker, Retry, Bulkhead, ThreadPool Bulkhead, Rate Limiter, etc.

To implement a circuit breaker, we will follow these steps:

  1. Add the required dependencies
  2. Configure circuit breaker properties like:
    • the failure rate threshold at which the circuit breaker should OPEN,
    • how many requests it should evaluate before deciding to open the circuit,
    • how long it should wait in OPEN state before trying again,
    • what type of circuit breaker to use (time based or count based). All of these can be set in application.properties or application.yml (the examples below use application.yml).
  3. Add the @CircuitBreaker annotation to each method you want to protect. Resilience4j applies the circuit breaker to these methods.
  4. Provide a fallback method for error scenarios. Fallback methods handle errors gracefully.

Code:

Dependencies:

Add the below dependencies:

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>io.github.resilience4j</groupId>
            <artifactId>resilience4j-spring-boot3</artifactId>
            <version>2.0.2</version>
        </dependency>

        <!-- Required so that the @CircuitBreaker annotation is applied through Spring AOP -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-aop</artifactId>
        </dependency>

        <!-- Exposes the circuit breaker state through actuator endpoints -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
    </dependencies>

Use Case 1: Time based Circuit breaker for network failure scenario

Let’s create three REST APIs to test three different scenarios. The first one uses a time based circuit breaker for a 404 error.

Let’s call an endpoint on localhost that does not exist.

We will get a 404 error.

Instead of calling it over and over, we will use a time based circuit breaker to make the call fault tolerant.

The application will stop calling the endpoint once the failure threshold is reached and try again after some time.

Here is the API call:

    @GetMapping("/test1")
    @CircuitBreaker(name = "test1service", fallbackMethod = "fallback")
    public String test() {
        // /testing is not mapped on localhost:8080, so this call fails with a 404 error
        return this.restTemplate.getForObject("http://localhost:8080/testing", String.class);
    }

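Notice the @CircuitBreaker annotation.

Annotate every method that should be protected by a circuit breaker with @CircuitBreaker.

You also provide a fallback method for exception scenarios. It must return the same type as the annotated method and take the same parameters plus one extra exception parameter at the end. Since test() takes no parameters, the fallback takes only the Throwable: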

    private String fallback(Throwable e) {

        System.out.println("Exception happened : " + e.getMessage());
        return "Handled the exception through fallback method";
    }
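Conceptually, the annotation wraps the call so that any exception, including the rejection raised while the circuit is OPEN, ends up in the fallback method. The plain Java sketch below shows only that control flow. FallbackDemo and CallRejectedException are hypothetical names; in the real application Resilience4j does this through Spring AOP and throws its own CallNotPermittedException when the circuit is OPEN.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of the control flow behind @CircuitBreaker(fallbackMethod = ...).
// Not Resilience4j code: the real library does this through Spring AOP.
final class FallbackDemo {
    /** Hypothetical stand-in for the exception thrown when the circuit is OPEN. */
    static final class CallRejectedException extends RuntimeException {
        CallRejectedException(String message) {
            super(message);
        }
    }

    private FallbackDemo() {
    }

    static <T> T callWithFallback(boolean circuitOpen, Supplier<T> call, Function<Throwable, T> fallback) {
        try {
            if (circuitOpen) {
                // OPEN circuit: the target service is not called at all
                throw new CallRejectedException("circuit is OPEN, call not permitted");
            }
            return call.get(); // e.g. the RestTemplate call to the target service
        } catch (RuntimeException e) {
            // any failure, including the rejection above, is handed to the fallback
            return fallback.apply(e);
        }
    }
}
```

This is why the same fallback method prints the 404 message while the circuit is CLOSED and a different message once the circuit is OPEN: it receives whatever exception stopped the call.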

Here is the configuration in application.yml:

resilience4j.circuitbreaker:
  instances:
    test1service:
      slidingWindowType: TIME_BASED
      slidingWindowSize: 10
      minimumNumberOfCalls: 20
      failureRateThreshold: 50
      waitDurationInOpenState: 20s
      permittedNumberOfCallsInHalfOpenState: 4

Let’s look into these properties.

This configuration is for the instance “test1service”.

We have configured the same name in @CircuitBreaker annotation for the REST API shown earlier.

slidingWindowType: TIME_BASED

As earlier mentioned there are two types of circuit breaker and this one is time based.

slidingWindowSize: 10

This means the sliding window covers the last 10 seconds (since this is time based).

minimumNumberOfCalls: 20

So the failure rate is evaluated only after at least 20 calls have been recorded within the 10-second window (slidingWindowSize).

Until then, the circuit stays CLOSED.

So if only 5 calls were made within 10 seconds, the failure rate is not evaluated and the circuit does not open, even if all 5 fail.

failureRateThreshold: 50

This value is in percentage.

This means that if 50 percent or more of the calls made within 10 seconds (sliding window size) fail, the circuit breaker opens and blocks further requests.

waitDurationInOpenState: 20s

Once the circuit becomes OPEN and no further requests are allowed, it stays in the OPEN state for 20 seconds.

It then moves to the HALF_OPEN state, where it allows a specific number of requests through to check whether the target system responds.

permittedNumberOfCallsInHalfOpenState: 4

This means that after waiting 20 seconds in the OPEN state, the circuit moves to the HALF OPEN state and permits 4 calls to check whether the target system responds now.

If 50 percent or more of these calls fail (the failure threshold), it goes back to the OPEN state.

Otherwise, it goes to the CLOSED state and requests flow through.

In short,

Once at least 20 calls have been recorded within the 10-second window, the failure rate over that window is evaluated.

If half or more of the calls fail, the application stops sending requests to the target service.

This lasts for 20 seconds.

Then the application tries 4 requests (HALF OPEN state).

If 2 or more of them fail, the circuit goes back to OPEN and no requests are sent to the target service for another 20 seconds.

Otherwise, the circuit moves to CLOSED and requests flow to the target service again.
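The two decisions summarized above can be written as a small Java sketch using the test1service values. Test1ServiceRules is a hypothetical helper, not part of Resilience4j; it uses >= because Resilience4j's documentation states that the circuit opens when the failure rate is equal to or greater than failureRateThreshold, in both the CLOSED and the HALF_OPEN state.

```java
// Hypothetical restatement of the test1service rules (not a Resilience4j API).
final class Test1ServiceRules {
    static final int MINIMUM_NUMBER_OF_CALLS = 20;
    static final float FAILURE_RATE_THRESHOLD = 50f; // percent
    static final int PERMITTED_CALLS_IN_HALF_OPEN = 4;

    private Test1ServiceRules() {
    }

    /** CLOSED -> OPEN? Evaluated over the calls recorded in the 10-second window. */
    static boolean shouldOpen(int callsInWindow, int failedInWindow) {
        if (callsInWindow < MINIMUM_NUMBER_OF_CALLS) {
            return false; // not enough calls yet: stay CLOSED
        }
        return failedInWindow * 100f / callsInWindow >= FAILURE_RATE_THRESHOLD;
    }

    /** HALF_OPEN -> OPEN? Evaluated once all permitted trial calls have completed. */
    static boolean shouldReopen(int failedTrialCalls) {
        return failedTrialCalls * 100f / PERMITTED_CALLS_IN_HALF_OPEN >= FAILURE_RATE_THRESHOLD;
    }
}
```

For example, 10 failures out of 20 calls (exactly 50 percent) open the circuit, while 9 out of 20 do not; in the HALF OPEN state, 2 failed trial calls out of 4 are enough to reopen it.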

We will see a demo of this shortly.

Use Case 2: Time based Circuit Breaker for slow responses:

Circuit Breaker can also be applied for slow responses.

If your target service responds too slowly, and the share of slow responses reaches a specific threshold, you can stop sending requests to it.

Here is an API to test that scenario:

    @GetMapping("/test2")
    @CircuitBreaker(name = "test2service", fallbackMethod = "fallback")
    public String test2() {
        return this.restTemplate.getForObject("http://localhost:8081/testing2", String.class);
    }

This is very similar to the previous example, except that we are calling the /testing2 endpoint of a separate service running on port 8081, which returns a string response:

    @GetMapping("/testing2")
    public String test() throws InterruptedException {
        // Simulate a slow service: respond only after 3 seconds
        Thread.sleep(3000);
        return "success";
    }

As shown above, it waits for 3 seconds before responding.

Now let’s look at the circuit breaker configuration:

    test2service:
      slidingWindowSize: 10
      permittedNumberOfCallsInHalfOpenState: 4
      slidingWindowType: TIME_BASED
      minimumNumberOfCalls: 2
      waitDurationInOpenState: 20s
      failureRateThreshold: 50
      slowCallRateThreshold: 100
      slowCallDurationThreshold: 2000

I have lowered the minimum number of calls to 2 in this case.

So the rates are evaluated as soon as 2 calls have been recorded.

The rest of the properties are the same, except for the two new ones below.

slowCallRateThreshold: 100

This value is a percentage. A value of 100 means the circuit opens only if all of the recorded calls in the window are slow.

slowCallDurationThreshold: 2000

This is a duration in milliseconds.

Any call that takes longer than this is counted as slow. The call is not cut off; it still completes and returns its result and is only recorded as slow.

So if a response takes more than 2 seconds, it is considered slow.

Notice that the target API which we are invoking has a sleep time of 3 seconds.

So it is slow and circuit breaker should get triggered.
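The slow call rule can be sketched the same way. SlowCallCheck is a hypothetical helper, not a Resilience4j API: it classifies call durations against slowCallDurationThreshold and compares the share of slow calls with slowCallRateThreshold, treating the threshold as reached when the rate is equal to or greater than it, like the failure rate. It only decides; it never interrupts a call.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical sketch of the slow call rule (not a Resilience4j API).
final class SlowCallCheck {
    private SlowCallCheck() {
    }

    /** A call is slow if it took longer than the threshold. */
    static boolean isSlow(Duration callDuration, Duration slowCallDurationThreshold) {
        return callDuration.compareTo(slowCallDurationThreshold) > 0;
    }

    /** Opens when the percentage of slow calls reaches slowCallRateThreshold. */
    static boolean shouldOpen(List<Duration> durations, Duration slowCallDurationThreshold,
                              int minimumNumberOfCalls, float slowCallRateThreshold) {
        if (durations.isEmpty() || durations.size() < minimumNumberOfCalls) {
            return false; // not enough calls recorded yet
        }
        long slow = durations.stream()
                .filter(d -> isSlow(d, slowCallDurationThreshold))
                .count();
        return slow * 100f / durations.size() >= slowCallRateThreshold;
    }
}
```

With the test2service values (2000 ms, a minimum of 2 calls, 100 percent), two 3-second calls open the circuit, but one 3-second call and one 1-second call do not.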

We will see that in the demo.

Use Case 3: Count based Circuit Breaker:

In this case, instead of sampling requests for a specific time, we will sample a specific number of requests.

Let’s use the below API to test this:

    @GetMapping("/test3")
    @CircuitBreaker(name = "test3service", fallbackMethod = "fallback")
    public String test3() {
        return this.restTemplate.getForObject("http://localhost:8080/testing", String.class);
    }

The endpoint /testing does not exist on localhost.

So it returns a 404 error, as in the first use case.

This time though we will use the below properties:

    test3service:
      slidingWindowType: COUNT_BASED
      slidingWindowSize: 10
      minimumNumberOfCalls: 5
      failureRateThreshold: 50
      waitDurationInOpenState: 20s
      permittedNumberOfCallsInHalfOpenState: 4

The property names are the same as in the time based circuit breaker, except that the sliding window type is COUNT_BASED.

So in the above example,

The system samples the last 10 requests (slidingWindowSize).

The failure rate is evaluated only once at least 5 requests have been recorded (minimumNumberOfCalls).

If 50 percent or more of the requests fail (failureRateThreshold), the circuit moves to OPEN state.

It remains in OPEN state for 20 seconds (waitDurationInOpenState).

And then it moves to HALF OPEN state.

It tries 4 requests in this state (permittedNumberOfCallsInHalfOpenState).

If 50 percent or more of these requests fail, it goes back to OPEN state and no requests are forwarded to the target service /testing.

Else it goes to CLOSED state and requests are forwarded to /testing.
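To check the expectation for test3service, here is a hypothetical simulation of the count based rules. CountBasedSimulation is not a Resilience4j class; it replays a list of call outcomes through a 10-call window with a minimum of 5 calls and a 50 percent threshold (using >=, as Resilience4j's documentation describes), and reports after which call the circuit would open.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of a COUNT_BASED circuit breaker in the CLOSED state.
final class CountBasedSimulation {
    private CountBasedSimulation() {
    }

    /** Returns the 1-based call after which the circuit opens, or -1 if it stays CLOSED. */
    static int openAfterCall(List<Boolean> failedOutcomes, int windowSize,
                             int minimumNumberOfCalls, float failureRateThreshold) {
        Deque<Boolean> window = new ArrayDeque<>(); // outcomes of the most recent calls
        int failures = 0;
        for (int i = 0; i < failedOutcomes.size(); i++) {
            boolean failed = failedOutcomes.get(i);
            if (window.size() == windowSize && window.removeFirst()) {
                failures--; // the oldest call dropped out of the window
            }
            window.addLast(failed);
            if (failed) {
                failures++;
            }
            if (window.size() >= minimumNumberOfCalls
                    && failures * 100f / window.size() >= failureRateThreshold) {
                return i + 1;
            }
        }
        return -1;
    }
}
```

If every call fails, as with the missing /testing endpoint, the circuit opens right after the 5th call, because that is the first point at which the minimum number of calls is reached.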

DEMO:

Let’s test the 3 scenarios.

Scenario 1: Time based circuit breaker for network error:

I will make 20 requests within 10 seconds to the REST API /test1.

As you can see in the above screenshot,

For the first 20 requests, I got the 404 exception message printed (from the fallback method).

Then the circuit became OPEN and no further requests were sent to the target service. (This time a CallNotPermittedException is thrown internally and passed to the fallback method.)

After 20 seconds, I hit the service again:

This time 4 requests were made to the target service and the 404 exception was thrown again.

Since all 4 requests failed (a 100 percent failure rate, above the 50 percent threshold), the circuit moved back to OPEN state, as seen in the last 3 request logs above.

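We have included the actuator dependency in our application.

So you can open http://localhost:8080/actuator/circuitbreakers to see the circuit breaker status as well. Spring Boot exposes only the health endpoint over HTTP by default, so circuitbreakers has to be listed in the management.endpoints.web.exposure.include property for this URL to work.

Notice the test1service entry in the JSON below.

It is in OPEN state and the last 3 calls were not permitted: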

{
  "circuitBreakers": {
    "test1service": {
      "failureRate": "100.0%",
      "slowCallRate": "0.0%",
      "failureRateThreshold": "50.0%",
      "slowCallRateThreshold": "100.0%",
      "bufferedCalls": 4,
      "failedCalls": 4,
      "slowCalls": 0,
      "slowFailedCalls": 0,
      "notPermittedCalls": 3,
      "state": "OPEN"
    },
    "test2service": {
      "failureRate": "-1.0%",
      "slowCallRate": "-1.0%",
      "failureRateThreshold": "50.0%",
      "slowCallRateThreshold": "100.0%",
      "bufferedCalls": 0,
      "failedCalls": 0,
      "slowCalls": 0,
      "slowFailedCalls": 0,
      "notPermittedCalls": 0,
      "state": "CLOSED"
    },
    "test3service": {
      "failureRate": "-1.0%",
      "slowCallRate": "-1.0%",
      "failureRateThreshold": "50.0%",
      "slowCallRateThreshold": "100.0%",
      "bufferedCalls": 0,
      "failedCalls": 0,
      "slowCalls": 0,
      "slowFailedCalls": 0,
      "notPermittedCalls": 0,
      "state": "CLOSED"
    }
  }
}

SCENARIO 2: Time based circuit breaker for slower responses:

Let me hit the API /test2, which invokes the slower API /testing2.

Remember the below configuration:

    test2service:
      slidingWindowSize: 10
      permittedNumberOfCallsInHalfOpenState: 4
      slidingWindowType: TIME_BASED
      minimumNumberOfCalls: 2
      waitDurationInOpenState: 20s
      failureRateThreshold: 50
      slowCallRateThreshold: 100
      slowCallDurationThreshold: 2000

So the minimum number of calls required for sampling is only 2.

These 2 requests will get slow responses (3 seconds), slower than allowed (2 seconds).

So the circuit breaker should OPEN after the 2 requests.

I started the slower API and then made 2 requests within 10 seconds to /test2 API.

Both of them returned a success response:

But after that, the circuit became OPEN and an exception was thrown. The fallback method was then invoked, as seen from the logs in the console:

This kept getting printed for 20 seconds.

And then the circuit moved to HALF OPEN state.

Then I again got 2 successful responses, but they were slow like before, so the circuit moved to OPEN state again.

And the cycle continued.

You can check the actuator endpoint as before for the latest circuit state.

SCENARIO 3: Count based Circuit Breaker

Let me hit the third REST API, /test3, which is set up for testing the count based circuit breaker.

Remember the below configuration:

    test3service:
      slidingWindowType: COUNT_BASED
      slidingWindowSize: 10
      minimumNumberOfCalls: 5
      failureRateThreshold: 50
      waitDurationInOpenState: 20s
      permittedNumberOfCallsInHalfOpenState: 4

So after 5 failed calls, the circuit should OPEN.

And after 20 seconds , the circuit should go to HALF OPEN state and try 4 requests.

And then it should become OPEN again:

As you can see in the logs, the first 5 requests fail.

Then the circuit becomes OPEN and no calls are permitted.

After 20 seconds, the circuit moves to HALF OPEN state and tries 4 requests.

Since all of them fail, it goes back to OPEN and stops calling the target service.

The github link for the above code can be found here:

https://github.com/vijaysrj/circuitbreaker

Advantages:

The major advantages of Circuit Breaker design pattern are:

  • Fault tolerance: Your application becomes robust and handles failures gracefully.
  • Performance: Calls to a failing service are short-circuited instead of waiting on it, so the application stays responsive, users get a better experience, and the failing service gets time to recover.

Conclusion:

We saw how to use Circuit Breaker design pattern in Spring Boot.

This is one of the different fault tolerance patterns.

This is also part of the series of microservice design patterns.

Check the other patterns here:
