2021-3-15

GCP Spans in Cloud Run

I've been working on a Cloud Run based API for a little while now. Cloud Run is nice because it has a very generous free tier and costs nothing when it's not being used. One thing that Cloud Run provides by default is Cloud Trace. Unfortunately, the default traces aren't very useful:

The default trace just shows the total time for a request, without any detail. I recently learned about distributed tracing and knew that much better observability was possible; I just had to figure out how.

The main things my request handler does are verifying JWTs with Firebase Auth and connecting to Datastore to do a Put operation. Since both of those use Google's GCP libraries, they already have access to Spans, and I just have to link them to the parent Span created by Cloud Run.

How do you do that?

It turns out GCP automatically adds a header, X-Cloud-Trace-Context, that carries the information needed to link the Spans you generate to the trace Cloud Run creates by default. The Google libraries also provide a helper function that reads this header, so you can create a span with it as the parent (adapted from Stack Overflow):

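// Assumes r is the incoming *http.Request and ctx is its context.
httpFormat := &propagation.HTTPFormat{}
spanContext := ctx // stays the plain request context if there is no trace header
if sc, ok := httpFormat.SpanContextFromRequest(r); ok {
    // Start a span whose parent is the span Cloud Run created for this request.
    var span *trace.Span
    spanContext, span = trace.StartSpanWithRemoteParent(ctx, "Echo Server", sc,
        trace.WithSampler(trace.AlwaysSample()),
        trace.WithSpanKind(trace.SpanKindServer),
    )
    defer span.End()
}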
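As an aside, for anyone curious what the library is reading: the X-Cloud-Trace-Context header is documented as TRACE_ID/SPAN_ID;o=OPTIONS, where TRACE_ID is a 32-character hex string, SPAN_ID is a decimal number, and o=1 means the request is being traced. Here is a minimal stdlib-only sketch of parsing it. The type and function names are my own for illustration, and this is not how the propagation package is implemented; in real code you should use propagation.HTTPFormat as shown above.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// cloudTraceContext holds the fields carried by X-Cloud-Trace-Context.
// Hypothetical type, for illustration only.
type cloudTraceContext struct {
	TraceID string // 32-character hexadecimal trace ID
	SpanID  uint64 // decimal ID of the parent span
	Sampled bool   // true when the options field is exactly "o=1"
}

// parseCloudTraceContext parses "TRACE_ID/SPAN_ID;o=OPTIONS".
// The ";o=..." suffix is treated as optional.
func parseCloudTraceContext(h string) (cloudTraceContext, bool) {
	var tc cloudTraceContext
	traceID, rest, found := strings.Cut(h, "/")
	if !found || len(traceID) != 32 {
		return tc, false
	}
	if _, err := hex.DecodeString(traceID); err != nil {
		return tc, false
	}
	spanPart, opts, _ := strings.Cut(rest, ";")
	spanID, err := strconv.ParseUint(spanPart, 10, 64)
	if err != nil {
		return tc, false
	}
	tc.TraceID = traceID
	tc.SpanID = spanID
	tc.Sampled = opts == "o=1"
	return tc, true
}

func main() {
	tc, ok := parseCloudTraceContext("105445aa7843bc8bf206b12000100000/1;o=1")
	fmt.Println(tc.TraceID, tc.SpanID, tc.Sampled, ok)
}
```

The trace ID is what ties your spans to the trace Cloud Run already started, and the span ID is what makes Cloud Run's span the parent.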

With this new spanContext, I can pass it through the various GCP libraries. When they make a new Span to record their operations, they will see that the context already contains a Span, so the new Span becomes a child of the span made in the code above. I want to make sure I do this trick on every request, so I added some middleware to my Echo server:

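// spanContext is Echo middleware that starts a server span for every request,
// linked to the Cloud Run trace when the X-Cloud-Trace-Context header is present.
func spanContext(next echo.HandlerFunc) echo.HandlerFunc {
	return func(c echo.Context) error {
		httpFormat := &propagation.HTTPFormat{}
		sc, ok := httpFormat.SpanContextFromRequest(c.Request())
		if !ok {
			// sc is the zero value here, so the span below is not linked to a Cloud Run trace.
			log.Print("no span found")
		}

		spanContext, span := trace.StartSpanWithRemoteParent(c.Request().Context(), "Echo Server", sc,
			trace.WithSpanKind(trace.SpanKindServer))
		defer span.End()

		// Handlers retrieve this context with c.Get("spanContext").
		c.Set("spanContext", spanContext)
		return next(c)
	}
}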

With spanContext added to all my requests, I can simply grab the spanContext from the echo.Context, pass it through to all the relevant GCP libraries, and also use it as the basis for any of my own spans. Here's what my simple example looks like in the trace view:

A lot better than before! I feel like this can definitely come in handy for diagnosing issues in the future.

P.S. Make sure to import "contrib.go.opencensus.io/exporter/stackdriver/propagation" for propagation.HTTPFormat. The trace package used above is "go.opencensus.io/trace".

P.P.S. Looks like this has already helped out. I was creating a new Firebase Auth client for each request (which would then fetch the public keys every time). I was also creating a new Datastore client for each request, which I don't need to do either. Thanks to Cloud Trace I was able to pinpoint these issues and fix them. Updated trace below:


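For what it's worth, the shape of the fix is roughly the following: build each client once and reuse it across requests. This is a sketch using only the standard library. fakeClient and newFakeClient are placeholders standing in for the real Firebase Auth and Datastore clients, not actual library types, and sync.Once is just one way to do it (creating the clients in main before starting the server works too).

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient stands in for an expensive client such as a Datastore or
// Firebase Auth client. Hypothetical, not a real library type.
type fakeClient struct{ id int }

var (
	created    int // how many clients have been constructed so far
	clientOnce sync.Once
	shared     *fakeClient
)

// newFakeClient simulates the costly per-client setup.
func newFakeClient() *fakeClient {
	created++
	return &fakeClient{id: created}
}

// sharedClient returns one process-wide client, building it on first use.
func sharedClient() *fakeClient {
	clientOnce.Do(func() { shared = newFakeClient() })
	return shared
}

// handleRequest simulates a request handler that needs the client.
func handleRequest() int {
	return sharedClient().id
}

func main() {
	for i := 0; i < 3; i++ {
		handleRequest()
	}
	fmt.Println("clients created:", created)
}
```

Before the fix, the equivalent of newFakeClient ran inside every handler call, which is what showed up as extra time in each trace.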
Any errata or comments can be made by sending me a pull request.
