Visibility is a key requirement when working with containers and microservices, especially in complex environments where it’s hard to keep track of everything that’s going on. Distributed tracing is a method of monitoring applications so that a chain of events can be followed across different processes, hosts or proxies. Traces are usually presented graphically, which lets a reviewer pinpoint bottlenecks, latency issues or anomalies.
Unlike monitoring, which is primarily diagnostic, or logging, which focuses on specific instances, tracing is about getting to the finer details and virtually following the footsteps of a single user through the entire app stack. It works by following every request along the path it takes to completion; that path is referred to as a transaction or a trace. A trace is made up of spans, each of which represents the actual work performed at one step in the transaction. Every trace is uniquely identified, and every span is also given its own unique ID, as well as the ability to create child spans.
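The relationship between traces, spans and child spans described above can be sketched in a few lines of Python. This is only an illustrative model: the `Span` class, the `new_trace` helper and the ID lengths are assumptions made for the example, not the API of any of the tools covered below.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical model, not a real library's API)."""
    name: str
    trace_id: str                      # shared by every span in the same trace
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    parent_id: Optional[str] = None    # None marks the root span
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def child(self, name: str) -> Span:
        # A child keeps the trace ID and points back at its parent span.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> None:
        # Record when the work represented by this span completed.
        self.end = time.time()


def new_trace(name: str) -> Span:
    """Start a new trace by creating its root span with a fresh trace ID."""
    return Span(name=name, trace_id=secrets.token_hex(16))


root = new_trace("GET /checkout")
db = root.child("SELECT orders")
db.finish()
root.finish()
```

Here `db` and `root` share one trace ID, while `db.parent_id` points at `root.span_id`. That parent link is what lets a tracing UI rebuild the tree of work behind a single request.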
While every function should ideally be straightforward to trace, tracing can be a rather expensive endeavor because of the huge amounts of data involved. This is why we’re going to take a quick look at the top four open-source tracing tools and how they help us gain visibility into the complex world of containers and microservices.
1. OpenTelemetry
OpenTelemetry was born from two powerful observability standards whose only drawback was the fact that there were two of them. The CNCF’s OpenTracing and Google’s OpenCensus have been leading the way in terms of tracing and gathering API metrics. While the two went different routes in terms of architecture, they offer pretty similar functionality.
The main difference between the two is that OpenCensus also includes language implementations and a wire protocol, and goes beyond the “traditional” scope of tracing by including additional metrics. OpenTelemetry is the joint effort of both projects to merge these two prevalent open-source standards into one single standard for observability in the enterprise. OpenTelemetry is now a CNCF Sandbox project.
OpenTelemetry implementations are currently in pre-release status; however, a presentation on November 21st at KubeCon did cover a lot of what users can expect. OpenTelemetry is composed of five primary components: client libraries in multiple languages, integrations for other libraries and frameworks, exporters for APMs, a standalone collector, and a set of specifications, data formats and semantic conventions. OpenTelemetry will also include multiple integrations and extensions like W3C-HTTP and gRPC.
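To make the idea of an HTTP integration more concrete, here is a minimal sketch of how trace context can travel between services in a header. It assumes that “W3C-HTTP” refers to the W3C Trace Context `traceparent` header, whose version 00 layout is `version-traceid-parentid-flags` in lowercase hex. The function names are hypothetical and are not part of any OpenTelemetry library.

```python
import re
import secrets

# W3C Trace Context "traceparent": version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version 00 traceparent value for an outgoing request."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str):
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = bool(int(flags, 16) & 0x01)
    return trace_id, parent_id, sampled


# A service receives a request, continues the same trace, and calls downstream.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
trace_id, parent_id, sampled = parse_traceparent(incoming)
outgoing = format_traceparent(trace_id, secrets.token_hex(8), sampled)
```

The outgoing header keeps the trace ID but carries a new span ID, so the downstream service can record its own spans as children of the caller’s span.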
2. Jaeger
Primarily built to run in Kubernetes, Jaeger was originally developed by Uber and was later donated to the CNCF. Jaeger implements the OpenTracing standard to help organizations monitor microservice-based distributed systems with features like root cause analysis and distributed transaction monitoring. Jaeger supports five languages as of now: C#, Go, Node.js, Python and Java. It also ships with a number of repositories that offer instrumentation for different frameworks like Django, Dropwizard and the Go standard library.
Jaeger is composed of five major components: client libraries, agent, collector, query and ingester. Unlike other distributed tracing systems, Jaeger is designed to be set up inside Kubernetes, with the agent typically running as a DaemonSet, while storage that can be queried needs to be set up separately. Cassandra and Elasticsearch are popular choices to store data collected by Jaeger. This ability to use multiple storage backends helps Jaeger scale and avoid a single point of failure. The success of Kubernetes has made Jaeger quite a popular choice for tracing.
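The flow between these components, and the idea of a swappable storage backend, can be illustrated with a small in-memory sketch. Every class and method name here is hypothetical; real Jaeger components are separate processes that talk over the network, and `InMemoryStore` simply stands in for a backend such as Cassandra or Elasticsearch.

```python
from collections import defaultdict
from typing import Dict, List, Protocol


class SpanStore(Protocol):
    """Hypothetical storage interface that any backend could implement."""
    def write(self, span: dict) -> None: ...
    def read_trace(self, trace_id: str) -> List[dict]: ...


class InMemoryStore:
    """Stand-in backend that keeps spans in a dictionary keyed by trace ID."""
    def __init__(self) -> None:
        self._by_trace: Dict[str, List[dict]] = defaultdict(list)

    def write(self, span: dict) -> None:
        self._by_trace[span["trace_id"]].append(span)

    def read_trace(self, trace_id: str) -> List[dict]:
        return list(self._by_trace.get(trace_id, []))


class Collector:
    """Receives batches of spans and persists them to the configured store."""
    def __init__(self, store: SpanStore) -> None:
        self._store = store

    def submit(self, batch: List[dict]) -> None:
        for span in batch:
            self._store.write(span)


class Agent:
    """Buffers spans from local applications and forwards them in batches."""
    def __init__(self, collector: Collector, batch_size: int = 2) -> None:
        self._collector = collector
        self._batch_size = batch_size
        self._buffer: List[dict] = []

    def report(self, span: dict) -> None:
        self._buffer.append(span)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._collector.submit(self._buffer)
            self._buffer = []


class Query:
    """Reads a trace back from storage, ordered by span start time."""
    def __init__(self, store: SpanStore) -> None:
        self._store = store

    def trace(self, trace_id: str) -> List[dict]:
        return sorted(self._store.read_trace(trace_id), key=lambda s: s["start"])


store = InMemoryStore()
agent = Agent(Collector(store), batch_size=2)
agent.report({"trace_id": "t1", "span_id": "a", "start": 2})
agent.report({"trace_id": "t1", "span_id": "b", "start": 1})
agent.report({"trace_id": "t1", "span_id": "c", "start": 3})
agent.flush()  # push the partially filled last batch
ordered = [s["span_id"] for s in Query(store).trace("t1")]
```

Because `Collector` and `Query` depend only on the `SpanStore` interface, swapping the backend does not touch the rest of the pipeline, which is the property the paragraph above attributes to Jaeger’s storage design.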
3. Zipkin
Unlike Jaeger, which is made up of five primary components, Zipkin is one single process, making deployment a lot simpler. Originally developed by Twitter, Zipkin is implemented in Java, has an OpenTracing-compatible API, and supports almost every programming language and development platform available today. Also unlike Jaeger, which is more Kubernetes-centric in its approach, Zipkin relies heavily on Docker. Zipkin provides Docker images and Java programs, and even has a Docker project that can be deployed with a single command.
While having all components self-contained in a single process does make deployment simpler, additional configuration is required for applications to report back to Zipkin. This includes choosing a collector transport such as HTTP, Kafka or Scribe, as well as a storage backend such as MySQL, Cassandra or Elasticsearch. Zipkin also has a lot of support from its open-source community, OpenZipkin, where new APIs, formats and libraries are frequently published.
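As a rough illustration of what reporting over the HTTP collector looks like, the sketch below builds a batch of spans in Zipkin’s v2 JSON format and prepares a POST request for it. The URL assumes a Zipkin server on localhost using its default port 9411; the service and span names are made up, and the helper functions are hypothetical rather than part of any Zipkin client library. The request is only constructed here, not sent.

```python
import json
import secrets
import time
import urllib.request

# Assumption: a Zipkin server listening on its default port on localhost.
ZIPKIN_URL = "http://localhost:9411/api/v2/spans"


def build_span(service, name, duration_us, trace_id=None, parent_id=None) -> dict:
    """Build one span in Zipkin's v2 JSON shape (timestamps in microseconds)."""
    span = {
        "traceId": trace_id or secrets.token_hex(16),
        "id": secrets.token_hex(8),
        "name": name,
        "timestamp": int(time.time() * 1_000_000),  # epoch microseconds
        "duration": duration_us,                    # microseconds
        "localEndpoint": {"serviceName": service},
    }
    if parent_id:
        span["parentId"] = parent_id
    return span


def build_request(spans: list) -> urllib.request.Request:
    """Wrap a list of spans in a JSON POST request for the HTTP collector."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        ZIPKIN_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


root = build_span("checkout", "get /checkout", 1500)
child = build_span("checkout", "select orders", 400,
                   trace_id=root["traceId"], parent_id=root["id"])
request = build_request([root, child])
# urllib.request.urlopen(request)  # uncomment to send the batch to a running server
```

In practice an instrumentation library would produce and send these spans for you; the point of the sketch is only to show how little the wire format needs: a trace ID, a span ID, an optional parent, timing, and the reporting service.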
4. AppDash
Built to provide a simple way to trace and troubleshoot issues on large websites, AppDash is inspired by Dapper and Zipkin and supports the OpenTracing standard. AppDash is made up of a Go library that records performance data and a web-based UI. Following the same architectural concepts as Google’s Dapper, AppDash uses spans to compose a tree of all operations that occur, and events, defined as Go types, to categorize them. It then uses a Recorder to send events to a Collector, which can either be an external collector reached over a transport like HTTP or Kafka, or a local persistent collector.
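The Recorder-to-Collector flow and the span tree can be sketched as follows. This is a Python illustration of the concept only: AppDash itself is a Go library, and the `SpanID`, `Recorder`, `LocalCollector` and `render_tree` names below are hypothetical and do not mirror its actual API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpanID:
    """Identifies a span and its position in the tree (hypothetical model)."""
    trace: str
    span: str
    parent: Optional[str] = None


class LocalCollector:
    """In-memory collector that stores typed events keyed by span."""
    def __init__(self) -> None:
        self.events: Dict[SpanID, List[dict]] = defaultdict(list)

    def collect(self, span_id: SpanID, event: dict) -> None:
        self.events[span_id].append(event)


class Recorder:
    """Bound to one span; forwards that span's events to a collector."""
    def __init__(self, span_id: SpanID, collector: LocalCollector) -> None:
        self._span_id = span_id
        self._collector = collector

    def event(self, kind: str, **fields) -> None:
        self._collector.collect(self._span_id, {"type": kind, **fields})

    def child(self, span: str) -> "Recorder":
        child_id = SpanID(self._span_id.trace, span, parent=self._span_id.span)
        return Recorder(child_id, self._collector)


def render_tree(collector: LocalCollector, parent=None, depth=0) -> List[str]:
    """Walk the collected spans from the root down, indenting children."""
    lines = []
    for sid in list(collector.events):  # dicts keep insertion order
        if sid.parent == parent:
            kinds = ",".join(e["type"] for e in collector.events[sid])
            lines.append("  " * depth + f"{sid.span} [{kinds}]")
            lines.extend(render_tree(collector, sid.span, depth + 1))
    return lines


collector = LocalCollector()
root = Recorder(SpanID("t1", "request"), collector)
root.event("HTTPServer", route="/search")
sql = root.child("query")
sql.event("SQL", query="SELECT 1")
sql.event("Log", msg="done")
tree = render_tree(collector)
```

Each event carries a type, which plays the role that typed events play in the description above: the UI can treat an SQL event differently from an HTTP event while still placing both in the same span tree.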
AppDash is open source and written in Go, with Python and Ruby implementations also available. AppDash is also used to monitor Sourcegraph, a self-contained code search and navigation tool.
There are Levels to This Game
Monitoring microservices is hard work, and while popular service mesh tools like Istio do help us dial things up a notch, they still leave a lot to be desired in terms of observability and security. Gaining real-time insight into a service mesh’s communications is impossible without using some of the tracing tools we discussed above. Additionally, open-source secrets management and security solutions like CyberArk Conjur help secure containers and address the security issues with service mesh technology.
Join the Conversation on the CyberArk Commons
If you’re interested in this and other open-source content, join the conversation on the CyberArk Commons Community. Secretless Broker, Conjur and other open-source projects are a part of the CyberArk Commons Community, an open community dedicated to developers, engineers, cybersecurity researchers, and other technically minded people. To discuss Kubernetes, Secretless Broker, Conjur and CyberArk Threat Research, join me on the CyberArk Commons discussion forum.
Twain is a Fixate IO Contributor and began his career at Google, where, among other things, he was involved in technical support for the AdWords team. His work involved reviewing stack traces, resolving issues affecting both customers and the Support team, and handling escalations. Later, he built branded social media applications and automation scripts to help startups better manage their marketing operations. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications.