Monitor Connect services

Services

Connect consists of several interdependent services that work together to provide the full functionality of the platform.

  • Insights - Provides integration traces and metrics.

  • Identity + Identity Reconcilers - Manage access to flows and enforce authentication for that access.

  • Resource Registry - Stores and serves integration resources such as secrets, schemas, and dynamic tables.

These Connect services are backed by other services:

  • PostgreSQL - Primary persistence provider for all Connect services, including the Flow Server.

  • MinIO - Side-channel for handling large messages.

  • OpenBAO - Manages, stores, and distributes sensitive data for the Resource Registry.

  • Elastic - Indexes logs and provides integration message trace data to Insights.

  • VictoriaMetrics - Provides integration flow metrics.

For Connect to function properly, all services must be running and in a healthy state.

Connect architecture overview

Service health and resource usage

Use the Connect console and Grafana dashboards to monitor the health of Connect services. A degraded or unavailable service can affect the overall health of Connect and its ability to process messages, so the first step in troubleshooting is to verify the health of all services.

All Connect microservices are JVM-based, so memory usage is one of the most important resources to monitor. Memory-related issues often appear as pod restarts caused by Out of Memory (OOM) kills. If you notice a high restart count for any Connect service, investigate the root cause.
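
As an illustration of the restart check described above, the following minimal sketch flags containers with a high restart count and notes whether the last termination was an OOM kill. It assumes Connect runs on Kubernetes and that `kubectl` is available. The function names and the default threshold of 3 restarts are hypothetical choices, not Connect settings. The fields it reads (`restartCount`, `lastState.terminated.reason`) come from the standard Kubernetes pod status.

```python
import json
import subprocess


def find_restarting_containers(pod_list, restart_threshold=3):
    """Return containers whose restart count is at or above the threshold.

    pod_list is the parsed output of `kubectl get pods -o json`.
    restart_threshold is a hypothetical cutoff; tune it for your cluster.
    """
    findings = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            restarts = status.get("restartCount", 0)
            if restarts < restart_threshold:
                continue
            # lastState.terminated is present only if the container
            # has terminated at least once.
            terminated = status.get("lastState", {}).get("terminated") or {}
            findings.append({
                "pod": pod_name,
                "container": status["name"],
                "restarts": restarts,
                "oom_killed": terminated.get("reason") == "OOMKilled",
            })
    return findings


def load_pods(namespace):
    """Fetch pod data for a namespace by calling kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `find_restarting_containers(load_pods("connect"))` would list the affected containers, where the namespace name `connect` is an assumption about the deployment. A finding with `oom_killed` set to true points to memory pressure rather than an application crash.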

Integration traffic is the main driver of resource usage across Connect services. Different aspects of the traffic affect different services and resources, so resource usage and performance vary as traffic patterns change. High CPU usage can be expected when the system processes a large volume of messages, but excessive memory usage is generally a sign of a problem and should be investigated. Understand the type and volume of traffic the Connect cluster is expected to handle, and configure flows accordingly by applying flow throttling parameters. To monitor integration traffic over time, use the Connect Dashboard to track inflight, buffered, and stashed messages.

In some scenarios, the volume of integration traffic may require more allocated memory, but verify this first by analyzing the traffic. If a flow is misconfigured and consuming excessive memory, increasing the memory allocation is unlikely to resolve the problem and at best masks the underlying issue.

Among Connect services, integration traffic primarily affects the resource usage of the Flow Server. It also affects the Flow Server's supporting services:

  • A high number of messages being processed directly affects PostgreSQL and Elastic resources.

  • Large messages that exceed the side-channel threshold impact MinIO resources because they are written to the side-channel.
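
The relationship between traffic characteristics and backing services can be sketched as a small triage helper. This is an illustrative sketch only: the names `TrafficSample` and `affected_backing_services`, the `high_volume_count` cutoff, and the `side_channel_threshold_bytes` parameter are hypothetical and are not Connect configuration names. Actual values depend on the deployment.

```python
from dataclasses import dataclass


@dataclass
class TrafficSample:
    """Observed traffic over some interval (hypothetical structure)."""
    message_count: int
    max_message_bytes: int


def affected_backing_services(sample, high_volume_count,
                              side_channel_threshold_bytes):
    """Return the backing services most likely to see increased load.

    Both thresholds are assumptions supplied by the caller.
    """
    affected = set()
    # A high number of processed messages loads PostgreSQL and Elastic.
    if sample.message_count >= high_volume_count:
        affected.update({"PostgreSQL", "Elastic"})
    # Messages above the side-channel threshold are written to MinIO.
    if sample.max_message_bytes > side_channel_threshold_bytes:
        affected.add("MinIO")
    return sorted(affected)
```

A helper like this can narrow down which dashboards to check first when Flow Server traffic changes.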

If any other Connect services are running out of memory or consistently using high CPU, contact Connect support.