[gRPC] Understanding gRPC Request Retries

This article provides a brief overview of gRPC's request retry capabilities and how to implement them in Go.

Today we look at another practical feature of gRPC: request retries. Real-world applications often need to retry failed requests, which usually means pulling in an extra library or writing the retry loop by hand. gRPC has built-in retry support that removes the need for either.

According to the official gRPC documentation, we can set policies to control the retry behavior of failed RPCs. gRPC offers two strategies for retries:

  1. Retry Strategy: The failed RPC requests are retried.
  2. Hedging Strategy: Multiple identical RPC requests are sent in parallel without waiting for a response.

A given RPC method can be configured with only one of these two strategies, not both. The retry strategy takes the following parameters:

  • maxAttempts: A required field, indicating the maximum number of RPC attempts, including the original request.
  • initialBackoff, maxBackoff, backoffMultiplier: Required fields that determine the delay before the next retry attempt. The delay before the n-th retry is random(0, min(initialBackoff * backoffMultiplier^(n-1), maxBackoff)).
  • retryableStatusCodes: A required field listing status codes. When the server returns a failure status that is on this list, the request is retried; a failure status that is not on the list is returned to the caller without a retry.
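
The backoff formula above can be sketched in a few lines of Go. This is only an illustration of the arithmetic, not gRPC's internal implementation; the backoffConfig type and the function names are made up for this example.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// backoffConfig mirrors the three backoff fields of a retry policy.
// The type and its field names are illustrative, not part of any gRPC API.
type backoffConfig struct {
	initial    time.Duration
	max        time.Duration
	multiplier float64
}

// backoffCeiling returns min(initial * multiplier^(n-1), max)
// for the n-th retry, where n starts at 1.
func backoffCeiling(c backoffConfig, n int) time.Duration {
	ceiling := float64(c.initial) * math.Pow(c.multiplier, float64(n-1))
	if ceiling > float64(c.max) {
		return c.max
	}
	return time.Duration(ceiling)
}

// retryDelay picks a uniformly random delay in [0, ceiling).
func retryDelay(c backoffConfig, n int) time.Duration {
	ceiling := backoffCeiling(c, n)
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling)))
}

func main() {
	cfg := backoffConfig{initial: 100 * time.Millisecond, max: time.Second, multiplier: 2}
	for n := 1; n <= 5; n++ {
		fmt.Printf("retry %d: ceiling=%v delay=%v\n", n, backoffCeiling(cfg, n), retryDelay(cfg, n))
	}
}
```

With these sample values the ceiling doubles from 100ms until it is capped at 1s on the fifth retry, while the actual delay is drawn at random below the ceiling so that many clients do not retry in lockstep.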

The hedging strategy allows for the proactive sending of multiple copies of a single request without waiting for a response. Note that this strategy may lead to multiple executions on the backend, so it is recommended to enable it only for requests that can be safely executed multiple times without adverse effects. The hedging strategy parameters are as follows:

  • maxAttempts: A required field, the maximum number of attempts.
  • hedgingDelay: An optional field. The first request is sent immediately; if no successful response arrives within hedgingDelay, another copy is sent, and this repeats until maxAttempts is reached or a request succeeds.
  • nonFatalStatusCodes: An optional field listing status codes that do not stop hedging. When a response carries one of these codes, the remaining hedged requests are still sent.
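
As a rough illustration, the hedging parameters above could be written into a service config like the one below. The service name and the values are placeholders, and the small structs exist only to check that the JSON is well formed; they are not gRPC types. Hedging is not implemented by every gRPC library, so check whether your client version honors hedgingPolicy before relying on it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
)

// hedgingServiceConfig uses hedgingPolicy in place of retryPolicy.
// "example.Echo" and the values below are placeholders for illustration.
const hedgingServiceConfig = `{
	"methodConfig": [{
		"name": [{"service": "example.Echo", "method": "UnaryEcho"}],
		"hedgingPolicy": {
			"maxAttempts": 3,
			"hedgingDelay": "0.5s",
			"nonFatalStatusCodes": ["UNAVAILABLE"]
		}
	}]
}`

// The types below are local helpers for decoding the JSON, not gRPC types.
type hedgingPolicy struct {
	MaxAttempts         int      `json:"maxAttempts"`
	HedgingDelay        string   `json:"hedgingDelay"`
	NonFatalStatusCodes []string `json:"nonFatalStatusCodes"`
}

type methodConfig struct {
	HedgingPolicy *hedgingPolicy `json:"hedgingPolicy"`
}

type serviceConfig struct {
	MethodConfig []methodConfig `json:"methodConfig"`
}

// parseHedgingPolicy extracts the hedging policy of the first method config.
func parseHedgingPolicy(raw string) (*hedgingPolicy, error) {
	var sc serviceConfig
	if err := json.Unmarshal([]byte(raw), &sc); err != nil {
		return nil, err
	}
	if len(sc.MethodConfig) == 0 || sc.MethodConfig[0].HedgingPolicy == nil {
		return nil, errors.New("no hedgingPolicy found")
	}
	return sc.MethodConfig[0].HedgingPolicy, nil
}

func main() {
	hp, err := parseHedgingPolicy(hedgingServiceConfig)
	if err != nil {
		log.Fatalf("invalid service config: %v", err)
	}
	fmt.Printf("%+v\n", *hp)
}
```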

Next, we demonstrate how to configure and use retries with the Go implementation of gRPC.

Server-Side Implementation

First, we create a server that fails requests with UNAVAILABLE and lets only every third request through, so a fresh call succeeds on its third attempt.

package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"net"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	pb "github.com/overstarry/grpc-example/proto/echo"
)

var port = flag.Int("port", 9000, "port number")

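// failingServer fails every request except each reqModulo-th one.
type failingServer struct {
	// Assumes the generated code provides UnimplementedEchoServer, which
	// supplies the Echo methods this example does not implement.
	pb.UnimplementedEchoServer
	sync.Mutex
	reqCounter uint
	reqModulo  uint
}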

func (s *failingServer) maybeFailRequest() error {
	s.Lock()
	defer s.Unlock()
	s.reqCounter++
	if s.reqModulo > 0 && s.reqCounter%s.reqModulo == 0 {
		return nil
	}
	return status.Errorf(codes.Unavailable, "maybeFailRequest: failing it")
}

func (s *failingServer) UnaryEcho(ctx context.Context, req *pb.EchoRequest) (*pb.EchoResponse, error) {
	if err := s.maybeFailRequest(); err != nil {
		log.Println("request failed count:", s.reqCounter)
		return nil, err
	}
	log.Println("request succeeded count:", s.reqCounter)
	return &pb.EchoResponse{Message: req.Message}, nil
}

func main() {
	flag.Parse()
	address := fmt.Sprintf(":%v", *port)
	lis, err := net.Listen("tcp", address)
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	fmt.Println("listen on address", address)
	s := grpc.NewServer()
	failingService := &failingServer{reqCounter: 0, reqModulo: 3}
	pb.RegisterEchoServer(s, failingService)
	if err := s.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}

Client-Side Configuration

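On the client side, we enable retries by passing a service config to WithDefaultServiceConfig. The policy below allows up to four attempts per call and retries only when the server returns UNAVAILABLE.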

package main

import (
	"context"
	"flag"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pb "github.com/overstarry/grpc-example/proto/echo"
)

var addr = flag.String("addr", "127.0.0.1:9000", "the address to connect to")

// retryPolicy is the service config with retry settings. It is declared
// with var because := is not allowed at package level.
var retryPolicy = `{
	"methodConfig": [
		{
			"name": [
				{
					"service": "grpc.examples.echo.Echo",
					"method": "UnaryEcho"
				}
			],
			"waitForReady": true,
			"retryPolicy": {
				"maxAttempts": 4,
				"initialBackoff": ".01s",
				"maxBackoff": ".01s",
				"backoffMultiplier": 1.0,
				"retryableStatusCodes": ["UNAVAILABLE"]
			}
		}
	]
}`

func retryDial() (*grpc.ClientConn, error) {
	return grpc.Dial(*addr, grpc.WithTransportCredentials(insecure.NewCredentials()), grpc.WithDefaultServiceConfig(retryPolicy))
}

func main() {
	flag.Parse()
	conn, err := retryDial()
	if err != nil {
		log.Fatalf("did not connect: %v", err)
	}
	defer func() {
		if err := conn.Close(); err != nil {
			log.Printf("failed to close connection: %s", err)
		}
	}()

	c := pb.NewEchoClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()
	reply, err := c.UnaryEcho(ctx, &pb.EchoRequest{Message: "Try and Succeed"})
	if err != nil {
		log.Fatalf("UnaryEcho error: %v", err)
	}
	log.Printf("UnaryEcho reply: %v", reply)
}
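
To see why four attempts are enough for this demo, the interplay between the client's attempts and the server's request counter can be simulated without gRPC. The sketch below is a simplified model: counterServer and callWithRetry are hypothetical names, backoff delays are ignored, and only the UNAVAILABLE case is modeled.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnavailable = errors.New("unavailable")

// counterServer mimics the demo server: only every modulo-th request succeeds.
type counterServer struct {
	count  uint
	modulo uint
}

// handle counts the request and fails it unless the count is a multiple of modulo.
func (s *counterServer) handle() error {
	s.count++
	if s.modulo > 0 && s.count%s.modulo == 0 {
		return nil
	}
	return errUnavailable
}

// callWithRetry makes up to maxAttempts attempts, the original request
// included, and reports how many attempts were used.
func callWithRetry(s *counterServer, maxAttempts int) (int, error) {
	if maxAttempts < 1 {
		return 0, errors.New("maxAttempts must be at least 1")
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = s.handle(); err == nil {
			return attempt, nil
		}
	}
	return maxAttempts, err
}

func main() {
	// Four attempts against a fresh server: the third one succeeds.
	attempts, err := callWithRetry(&counterServer{modulo: 3}, 4)
	fmt.Println(attempts, err) // prints "3 <nil>"

	// Two attempts are not enough: both fail.
	attempts, err = callWithRetry(&counterServer{modulo: 3}, 2)
	fmt.Println(attempts, err) // prints "2 unavailable"
}
```

In the real setup the same thing shows up in the server log: two "request failed" lines followed by one "request succeeded" line for a single client call.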

Summary

This article gave a brief overview of gRPC's request retry capabilities and showed how to configure them in Go. The related code can be found in the grpc-example repository on GitHub. For more detailed information about gRPC request retries, refer to the official gRPC documentation.

With these resources, you can gain a deeper understanding of gRPC's request retry mechanism and effectively utilize it in your projects to improve system robustness and reliability.