Monitor Connect services

Services

Connect consists of several interdependent services that work together to provide the full functionality of the platform.

  • Flow Server - Manages and runs integration flows as the runtime engine.

  • Connect UI and Connect UI Configuration Server - Enable you to manage and monitor deployed integration flows.

  • Insights - Provides integration traces and metrics.

  • Identity and Identity Reconcilers - Manage flow access and enforce flow access authentication.

  • Resource Registry - Stores and serves integration resources such as secrets, schemas, and dynamic tables.

These Connect services are backed by other services:

  • PostgreSQL - Primary persistence provider for all Connect services.

  • MinIO - Side-channel for handling large messages.

  • OpenBao - Manages, stores, and distributes sensitive data for the resource registry.

  • Elasticsearch - Indexes logs and provides integration message trace data to insights.

  • VictoriaMetrics - Provides integration flow metrics.

For Connect to function properly, all services must be running and in a healthy state.

Connect architecture overview

Service Health and Resource Usage

Use the Connect Console and Grafana Dashboards to monitor the health of Connect services. If any service is degraded or unavailable, it can impact the overall health of Connect and its ability to process messages. As a result, the first step in troubleshooting is to verify the health of all services.

All Connect microservices are JVM-based, so memory usage is one of the most important resources to monitor. Memory-related issues often appear as pod restarts caused by Out of Memory (OOM) kills. If you notice a high restart count for any Connect service, investigate the root cause.
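As a sketch, you can spot restart-prone pods and confirm an OOM kill with kubectl; the namespace and pod names below are placeholders to adapt to your installation.

    # List Connect pods and their restart counts
    kubectl get pods -n <connect-namespace>

    # Check the last termination reason of a restarting pod; a reason of OOMKilled confirms an Out of Memory kill
    kubectl describe pod <pod-name> -n <connect-namespace> | grep -A 5 'Last State'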

Integration traffic is the main driver for resource usage across Connect services. Different aspects of the traffic will impact different services and resources. As traffic patterns change, resource usage and performance will also vary across services. High CPU usage can be expected when the system processes a large volume of messages. However, excessive memory usage is generally a sign of a problem and should be investigated. It is important to understand the type and volume of traffic the Connect cluster is expected to handle, and to configure flows accordingly by applying flow throttling parameters. You can monitor integration traffic over time by using the Connect Dashboard for inflight, buffered, and stashed messages.

In some scenarios, the volume of integration traffic may require increasing allocated memory, but this needs to be verified by analyzing the traffic. If a flow is misconfigured and consuming excessive memory, increasing memory allocation is unlikely to resolve the problem and may at best only mask the underlying issue.
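For example, before increasing a service's memory allocation you can compare its live usage against its configured requests and limits; kubectl top requires the Kubernetes metrics server, and the pod and namespace names below are placeholders.

    # Show current CPU and memory usage per pod (requires metrics-server)
    kubectl top pods -n <connect-namespace>

    # Show the configured resource requests and limits of a specific pod
    kubectl get pod <flow-server-pod> -n <connect-namespace> -o jsonpath='{.spec.containers[*].resources}'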

Of all Connect services, the Flow Server's resource usage is the most affected by integration traffic. Traffic also impacts the Flow Server's supporting services:

  • A high number of messages being processed directly affects PostgreSQL and Elasticsearch resources.

  • Large messages that exceed the side-channel threshold impact MinIO resources because they are written to the side-channel.

If any other Connect services are running out of memory or consistently using high CPU, contact Connect support.

GridOS Connect Container Debugging

Starting with Connect version 1.24.0, all Java-based services use Chainguard "distroless" base images, which means the containers have no shell to drop into when one is needed, for instance during a root cause analysis. If a shell is needed to debug a live service container, it is recommended to use kubectl debug, as shown in the sketch below.
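A minimal sketch of attaching an ephemeral debug container with a shell to a running pod follows; the pod name, target container name, namespace, and busybox image are placeholders to adapt to your deployment.

    # Attach an ephemeral container with a shell to the running pod
    kubectl debug -it <flow-server-pod> -n <connect-namespace> --image=busybox --target=<flow-server-container> -- sh

Where the container runtime supports it, the --target flag shares the target container's process namespace with the debug container, so you can inspect the service's processes from the debug shell.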

The Java-based Connect images are:

  • Flow Server

  • Identity

  • Insights

  • Resource Registry

The frontend services rely on NodeJS. In the near future, the frontend NodeJS services will also use Chainguard as their base container image.