Metrics and Dashboards

Connect services publish a range of metrics that are available in the Grafana dashboards included with Connect and Foundation. For more information about the Grafana dashboards provided with Foundation, see the Monitoring section of the Foundation Base documentation.

You can access the Foundation Grafana instance at https://admin.YOUR_DOMAIN/monitoring/grafana.

Having these dashboards available is helpful when working with Connect support to investigate performance issues or service instability.

Connect Dashboard

The Connect dashboard displays both business and technical metrics. It provides an overview of message processing activity, organized by flow. These metrics help you identify potential integration issues, such as a high number of failed messages, and understand the system load for the selected time period.

connect message stats
Connect message processing statistics, color-coded by flow

Inflight, Buffered, and Stashed Messages

Three panels display inflight, buffered, and stashed messages. These panels are key tools for monitoring integration traffic and can help identify potential flow misconfigurations.

Connect persists all incoming messages. A limited number of messages are processed concurrently (inflight), while additional messages are buffered in memory. When the buffer reaches its limit, further messages are kept only in persistence and are no longer held in memory. These are referred to as stashed messages. Before a stashed message can be processed, it must be read back from persistence once buffer space becomes available.
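To make the three states concrete, the following toy model traces how a message moves from persistence into the inflight, buffered, or stashed state and back. It is a hypothetical sketch, not Connect's implementation: the class name, the limits, and the assumption that arrival order is preserved (new messages never overtake buffered or stashed ones) are all illustrative.

```python
from collections import deque


class FlowStages:
    """Toy model of inflight, buffered and stashed messages.

    Hypothetical sketch: names and limits are illustrative assumptions,
    not Connect's internal implementation.
    """

    def __init__(self, max_inflight, max_buffered):
        self.max_inflight = max_inflight
        self.max_buffered = max_buffered
        self.persistence = {}     # stands in for the persistence provider
        self.inflight = set()     # messages being processed concurrently
        self.buffered = deque()   # (id, payload) pairs held in memory
        self.stashed = deque()    # IDs only; payloads stay in persistence

    def receive(self, msg_id, payload):
        self.persistence[msg_id] = payload  # every incoming message is persisted
        # Assumption: keep arrival order, so never skip ahead of queued messages.
        if len(self.inflight) < self.max_inflight and not self.buffered:
            self.inflight.add(msg_id)
        elif len(self.buffered) < self.max_buffered and not self.stashed:
            self.buffered.append((msg_id, payload))
        else:
            self.stashed.append(msg_id)  # buffer full: not held in memory

    def complete(self, msg_id):
        self.inflight.remove(msg_id)
        self._refill()

    def _refill(self):
        moved = True
        while moved:
            moved = False
            if self.buffered and len(self.inflight) < self.max_inflight:
                # A processing slot is free: take the oldest buffered message.
                msg_id, _payload = self.buffered.popleft()
                self.inflight.add(msg_id)
                moved = True
            elif self.stashed and len(self.buffered) < self.max_buffered:
                # Buffer space is free: read a stashed message back from persistence.
                msg_id = self.stashed.popleft()
                self.buffered.append((msg_id, self.persistence[msg_id]))
                moved = True

    def counts(self):
        return len(self.inflight), len(self.buffered), len(self.stashed)


stages = FlowStages(max_inflight=2, max_buffered=3)
for i in range(7):
    stages.receive(f"m{i}", b"payload")
print(stages.counts())  # (2, 3, 2)
stages.complete("m0")
print(stages.counts())  # (2, 3, 1)
```

Completing one message frees a processing slot, which pulls the oldest buffered message inflight and lets one stashed message be read back into the buffer.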

A consistently high inflight message count is not always a problem, but it may indicate inefficient configuration. For large messages in particular, a high number of inflight messages can consume significant memory and should be monitored closely.
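Because inflight and buffered messages are held in memory while stashed messages are not, a back-of-the-envelope estimate of the payload memory is (inflight + buffered) × average message size. The sketch below is a rough estimate only, assuming a known average message size; the function name is hypothetical, and JVM object overhead and any copies made during processing are ignored, so real usage is likely higher.

```python
MIB = 1024 * 1024


def estimated_payload_memory_bytes(inflight, buffered, avg_message_bytes):
    """Rough estimate of message payload bytes held in memory.

    Hypothetical helper. Stashed messages are excluded because they are
    not held in memory. Object overhead and processing copies are ignored.
    """
    return (inflight + buffered) * avg_message_bytes


# Example: 50 inflight and 500 buffered messages of about 2 MiB each.
print(estimated_payload_memory_bytes(50, 500, 2 * MIB) // MIB)  # 1100
```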

If the buffer remains full for extended periods, it could indicate a configuration issue. Buffered messages are kept in memory to handle short bursts of load without reading from persistence. Under sustained high load, the buffer fills up and additional messages are moved to the stash. In this situation, the buffer continues to consume memory, but it must be refilled periodically from the stash. This effectively removes the performance gain of keeping messages in memory.

If you observe consistently high numbers of both buffered and stashed messages, review your buffer configuration. In some cases, reducing the buffer size may improve overall performance.
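The distinction between a short burst and sustained load can be expressed as a simple check over sampled panel values. The sketch below assumes you have time-aligned samples of the buffered and stashed counts and know the configured buffer limit; the function name and the `min_share` threshold of 80 % are hypothetical tuning choices, not values recommended by Connect.

```python
from typing import Sequence


def review_buffer_config(buffered: Sequence[float], stashed: Sequence[float],
                         buffer_limit: float, min_share: float = 0.8) -> bool:
    """Flag a window where the buffer stays full while messages are stashed.

    Hypothetical heuristic: returns True when, in at least `min_share` of the
    time-aligned samples, the buffer is at its limit and the stash is in use.
    """
    pairs = list(zip(buffered, stashed))
    if not pairs:
        return False
    both = sum(1 for b, s in pairs if b >= buffer_limit and s > 0)
    return both / len(pairs) >= min_share


# Sustained load: buffer pinned at its limit while the stash grows.
print(review_buffer_config([100, 100, 100, 100, 100], [20, 40, 120, 300, 250], buffer_limit=100))  # True
# Short burst: the buffer absorbs it and the stash is barely used.
print(review_buffer_config([10, 100, 30, 0, 0], [0, 5, 0, 0, 0], buffer_limit=100))  # False
```

A positive result is only a prompt to review the buffer configuration, not proof that the buffer is misconfigured.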

connect inflight messages
Overview of inflight, buffered and stashed messages during a period with high incoming traffic

Processing and Persistence Time

The panels displaying processing and persistence times are useful for identifying performance issues. When Connect is under heavy load, the persistence provider can sometimes become the main bottleneck. Monitoring average persistence time can help determine whether this is the case.

A high processing time per message may indicate a problematic processor, for example, a suboptimal map implementation.
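Timer metrics are commonly exported as two cumulative series, a running total of durations and an event count, and an average over a time window is the change in the total divided by the change in the count (in PromQL, the familiar `rate(..._sum[5m]) / rate(..._count[5m])` pattern). The sketch below applies that arithmetic to two snapshots and compares persistence with processing time. The `TimerSnapshot` shape, the example numbers, and the "larger average wins" comparison are hypothetical simplifications; counter resets, for example after a pod restart, are not handled.

```python
from dataclasses import dataclass


@dataclass
class TimerSnapshot:
    """Cumulative totals of a timer at one point in time (hypothetical shape)."""
    total_seconds: float  # running sum of all recorded durations
    count: int            # number of recorded events


def average_seconds(start: TimerSnapshot, end: TimerSnapshot) -> float:
    """Average duration per event between two snapshots of a cumulative timer."""
    events = end.count - start.count
    if events <= 0:
        return 0.0  # no new events in the window (or a counter reset)
    return (end.total_seconds - start.total_seconds) / events


def likely_bottleneck(persist_avg: float, process_avg: float) -> str:
    """Very rough heuristic: the stage with the higher average per message."""
    return "persistence" if persist_avg > process_avg else "processing"


persist = average_seconds(TimerSnapshot(120.0, 1000), TimerSnapshot(180.0, 1200))
process = average_seconds(TimerSnapshot(40.0, 1000), TimerSnapshot(50.0, 1200))
print(persist, process)  # 0.3 0.05
print(likely_bottleneck(persist, process))  # persistence
```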

connect persistence time
Message persistence time

Flow Server General

This section provides basic JVM metrics along with side-channel metrics. Use these metrics to identify unhealthy memory usage patterns and to gain insight into which messages are side-channeled and how those side channels perform.

Mailbox Metrics

Mailbox metrics are intended primarily for debugging flow-server internals and are typically used when troubleshooting together with the Connect engineering team.

JVM Metrics

All microservices deployed as part of Connect expose standard JVM metrics, including memory usage, CPU utilization, and garbage collection statistics.

These metrics are available in the Spring Boot 2.3 Statistics dashboard included with Foundation. To view statistics for a specific Connect microservice, select the job that corresponds to that service (for example, connect-flowserver). Note that you can view only one instance or pod at a time.

While this dashboard provides detailed insight into JVM memory and garbage collection activity, it does not display the usage patterns or business metrics that influence them. These JVM metrics are most helpful for investigating performance or stability issues when correlated with metrics from the Connect dashboard.

JVM metrics for a Connect flow-server instance
First part of the Spring Boot dashboard, showing metrics for a Connect flow-server instance

Kubernetes Dashboards

You can access several Kubernetes dashboards in the GridOS Grafana installation.

The Kubernetes dashboards provide cluster-wide, infrastructure-focused metrics that support resource management. You can view resource usage across all containerized workloads and use the dashboards to assess how workloads use their resources, whether they stay within their allocated resource limits, and how resources are utilized at the Kubernetes node level.

These dashboards do not explain why certain metrics are high, for example, why memory consumption is elevated or what drives CPU usage. For deeper insight into Connect service performance, use the JVM metrics.

CPU metrics for the connect-flowserver deployment
First part of the "Kubernetes - Compute Resource - Workload" dashboard, showing CPU metrics for the connect-flowserver deployment