Sizing and Scaling Guidelines

This page provides a practical starting point for sizing and scaling GridOS clusters running Foundation and Connect. Use these values as baseline guidance, then validate and tune with workload-specific performance tests.

Scaling Model

For best performance, use low-latency storage designed for IOPS-intensive and throughput-intensive workloads, such as Amazon EBS Provisioned IOPS SSD volumes.

The two main sizing drivers are message throughput and message size, and the general guideline is as follows:

Use horizontal scaling (more pods/nodes) for high-concurrency, high-throughput traffic.
Use vertical scaling (more CPU/memory per pod/node) for large payloads and memory-heavy processing.

This page is split into two parts:

General sizing guideline for small, medium, and large baseline environments.
Scenario-specific sizing guideline for workload tuning.

Configuration Ownership: Foundation vs Connect Installation

Configure During	Components
Foundation installation	Kubernetes cluster/node sizing and Foundation-managed services such as Elasticsearch.
Connect installation	Connect services and Connect third-party charts, including `connect-postgresql`, `connect-openbao`, and `connect-victoria-metrics`.

1) General Sizing Guideline (Small / Medium / Large)

The following table outlines recommended Kubernetes cluster baselines based on anticipated hourly message volumes in Connect. At least four nodes are recommended, three of which serve as control-plane nodes, with worker capacity available across the cluster. For production environments, prefer dedicated control-plane nodes separated from workload nodes. Co-locating workloads on control-plane nodes can work, but it is generally a cost-driven exception rather than the preferred high-availability layout.

Cluster Size	Hourly Message Volume (~2 KB avg)	Node Count	vCPUs per Node	Memory (GiB) per Node
Small	Up to 200,000	4	8	32
Medium	Up to 1,000,000	4	16	64
Large	Up to 5,000,000	4	32	128
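The table can be read as a simple lookup from expected hourly volume to a baseline. The sketch below is illustrative only: the function name and the returned dictionary keys are hypothetical, and the thresholds assume the ~2 KB average message size stated in the table.

```python
# Upper bounds (messages per hour, ~2 KB average) copied from the table above.
# Tuple layout: (size name, max hourly messages, vCPUs per node, memory GiB per node)
BASELINES = [
    ("Small", 200_000, 8, 32),
    ("Medium", 1_000_000, 16, 64),
    ("Large", 5_000_000, 32, 128),
]

NODE_COUNT = 4  # every baseline in the table uses four nodes


def pick_cluster_size(hourly_messages: int) -> dict:
    """Return the smallest baseline whose volume bound covers the input."""
    if hourly_messages < 0:
        raise ValueError("hourly_messages must not be negative")
    for name, max_hourly, vcpus, memory_gib in BASELINES:
        if hourly_messages <= max_hourly:
            return {
                "size": name,
                "nodes": NODE_COUNT,
                "vcpus_per_node": vcpus,
                "memory_gib_per_node": memory_gib,
            }
    # Volumes beyond the Large row are outside this page's baselines.
    raise ValueError("volume exceeds the Large baseline; size from load tests")
```

For example, `pick_cluster_size(750_000)` selects the Medium row. Treat the result as a starting point for load testing, not as a final sizing.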

As a sizing rule of thumb, set memory requests equal to memory limits for JVM-based services to keep heap behavior predictable. For non-JVM services, start with lower memory requests than limits and tune them based on observed usage. For CPU, set requests to expected steady-state demand and limits to expected peak demand.
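The rule of thumb above can be sketched as a small helper that produces a Kubernetes-style `resources` mapping. The helper name and its parameters are hypothetical; only the `requests`/`limits` layout and the `m` and `Mi` quantity suffixes are standard Kubernetes conventions.

```python
from typing import Optional


def build_resources(cpu_steady: float, cpu_peak: float, memory_limit_gib: float,
                    jvm: bool, memory_request_gib: Optional[float] = None) -> dict:
    """Build a resources mapping following the rule of thumb above.

    CPU request = steady-state demand, CPU limit = peak demand.
    JVM services: memory request equals memory limit.
    Non-JVM services: the caller supplies a lower memory request.
    """
    if cpu_steady > cpu_peak:
        raise ValueError("steady-state CPU must not exceed peak CPU")
    if jvm:
        memory_request = memory_limit_gib
    else:
        if memory_request_gib is None:
            raise ValueError("non-JVM services need an explicit memory request")
        if memory_request_gib > memory_limit_gib:
            raise ValueError("memory request must not exceed the limit")
        memory_request = memory_request_gib

    def cpu(v: float) -> str:
        return f"{round(v * 1000)}m"   # millicores

    def mem(v: float) -> str:
        return f"{round(v * 1024)}Mi"  # mebibytes

    return {
        "requests": {"cpu": cpu(cpu_steady), "memory": mem(memory_request)},
        "limits": {"cpu": cpu(cpu_peak), "memory": mem(memory_limit_gib)},
    }
```

Calling `build_resources(1, 2, 4, jvm=True)` yields equal memory request and limit (`4096Mi`), while a non-JVM call such as `build_resources(0.5, 1, 2, jvm=False, memory_request_gib=1)` keeps the request below the limit.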

Elasticsearch Baseline (Foundation Installation)

The following values apply to Elasticsearch dataHot nodes used by Connect workloads. CPU request values are practical starting points (not product-mandated values) and should be tuned from observed indexing/search load.

Cluster Size	Replicas	CPU Request (vCPUs)	CPU Limit (vCPUs)	Memory Request (GiB)	Memory Limit (GiB)	Storage (GiB) per Node
Small	3	2	2	6	6	50
Medium	3	4	4	12	12	200
Large	3	8	8	24	24	600

MinIO Baseline (Foundation Installation)

MinIO is deployed and managed by Foundation (MinIO Operator + MinIO Tenant). Connect uses MinIO as side-channel storage for large messages and as a backup target for PostgreSQL and OpenBao.

Size MinIO Tenant storage based on expected large message volume and backup retention requirements. Configure MinIO sizing during Foundation installation.
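As a rough illustration of the two inputs named above, the following sketch adds retained large-message data to retained backup data. The formula, the function name, all parameter names, and the default 20% headroom are assumptions made for this example; they are not product-defined values, and erasure-coding or replication overhead in the MinIO Tenant is not modeled.

```python
def estimate_minio_storage_gib(large_messages_per_day: int,
                               avg_large_message_mib: float,
                               message_retention_days: int,
                               backup_size_gib: float,
                               backups_retained: int,
                               headroom: float = 1.2) -> float:
    """Estimate usable MinIO capacity in GiB (illustrative assumption only)."""
    # Large-message payloads kept for the retention window, converted MiB -> GiB.
    messages_gib = (large_messages_per_day * avg_large_message_mib
                    * message_retention_days) / 1024
    # PostgreSQL and OpenBao backups kept at the same time.
    backups_gib = backup_size_gib * backups_retained
    return (messages_gib + backups_gib) * headroom
```

With 1,024 large messages per day at 1 MiB each kept for 10 days, plus five retained 2 GiB backups and no headroom (`headroom=1.0`), the estimate is 20 GiB. Replace every input with measured values from your environment.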

Connect Services Baseline (Connect Installation)

For Connect installation sizing, prioritize tuning Flow Server and PostgreSQL. In most environments, the shipped chart defaults for other Connect services and third-party services are sufficient and can remain unchanged unless monitoring shows sustained resource pressure.

1. PostgreSQL Baseline

PostgreSQL is deployed in HA mode (replica: 3) and can be tuned vertically by adjusting CPU and memory requests and limits.

Cluster Size	Replicas	CPU Request (vCPUs)	CPU Limit (vCPUs)	Memory Request (GiB)	Memory Limit (GiB)	Storage per Server (GiB)
Small	3	1	2	2	4	5
Medium	3	2	4	4	8	20
Large	3	4	8	8	16	50

2. Flow Server Baseline

Flow Server is the only Connect service that supports horizontal scaling. All other services (for example, identity services, insights, resource registry, deployment operator, and frontend services) should remain at replicas: 1 and be tuned vertically.

Cluster Size	`replicas`	CPU Request (vCPUs)	CPU Limit (vCPUs)	Memory Request (GiB)	Memory Limit (GiB)	`mc.flow-server.maxNumberOfShards`
Small	3	1	1	2	2	30
Medium	5	2	2	4	4	50
Large	7	3	3	8	8	70

Set maxNumberOfShards before scaling, and do not decrease it on a running cluster. Decreasing it changes the assignment of existing shards and can lead to data inconsistencies or flow execution errors.

When scaling flow server horizontally, keep the replica count odd to reduce the risk of split-brain scenarios during leader election. For example, if you would otherwise scale to 6 replicas, consider scaling to 7 instead.
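The replica and shard rules above can be combined in a small planning helper. This is a sketch with hypothetical names: it rounds an even replica count up to the next odd number, derives the shard count from the ratio visible in the baseline table (30/3, 50/5, 70/7, that is, ten shards per replica), and never returns a value below the current `maxNumberOfShards`.

```python
SHARDS_PER_REPLICA = 10  # ratio taken from the baseline table above


def plan_flow_server(desired_replicas: int, current_max_shards: int = 0) -> tuple:
    """Return (replicas, max_number_of_shards) for a scaling step."""
    if desired_replicas < 1:
        raise ValueError("desired_replicas must be at least 1")
    # Keep the replica count odd: round an even count up by one.
    replicas = desired_replicas if desired_replicas % 2 == 1 else desired_replicas + 1
    # The shard count must not decrease on a running cluster.
    max_shards = max(SHARDS_PER_REPLICA * replicas, current_max_shards)
    return replicas, max_shards
```

For example, `plan_flow_server(6)` returns `(7, 70)`, and `plan_flow_server(3, current_max_shards=50)` returns `(3, 50)` because the existing shard count is kept when scaling down.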

2) Scenario-Specific Sizing Guideline

Use these scenarios to adjust the general baseline for your workload. Apply the listed changes in priority order.

Scenario	Foundation Installation Changes	Connect Installation Changes
High-Throughput and Small Messages	Scale worker-node CPU first; verify Elasticsearch indexing CPU and storage IOPS.	Scale flow server horizontally first; keep `maxNumberOfShards = 10 x replicas`; tune flow server CPU before memory.
Large-Payload Integrations	Scale node memory first; verify Elasticsearch memory/storage throughput and MinIO Tenant storage capacity.	Scale flow server vertically (memory first, then CPU); verify MinIO storage/throughput because payloads above the `largeMessages.byteThreshold` (default `100 KB`) are side-channeled to MinIO; increase PostgreSQL and OpenBao memory only if observed pressure requires it.
Mixed Workloads	Balance node CPU and memory; monitor Elasticsearch indexing latency and disk saturation.	Combine moderate flow server horizontal scaling with moderate vertical scaling across flow server, PostgreSQL, and Insights.
Burst / Peak Traffic	Size nodes and Elasticsearch for peak windows, not average traffic; validate storage burst behavior.	Maintain flow server replica headroom; verify PostgreSQL CPU/memory and WAL (write-ahead log) storage growth during peak periods.
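To judge whether a workload falls into the large-payload scenario, it can help to estimate what share of messages would cross the `largeMessages.byteThreshold` default of 100 KB. The sketch below uses hypothetical function names and assumes that 1 KB means 1,024 bytes and that only payloads strictly above the threshold are side-channeled; confirm both against your installation.

```python
DEFAULT_THRESHOLD_BYTES = 100 * 1024  # assumption: "100 KB" read as 100 * 1024 bytes


def is_side_channeled(payload_bytes: int,
                      threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> bool:
    """True if a payload of this size would be stored in MinIO."""
    return payload_bytes > threshold_bytes


def side_channel_share(payload_sizes, threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> float:
    """Fraction of sampled payloads that exceed the threshold (0.0 if empty)."""
    sizes = list(payload_sizes)
    if not sizes:
        return 0.0
    hits = sum(1 for size in sizes if is_side_channeled(size, threshold_bytes))
    return hits / len(sizes)
```

Feeding a sample of observed payload sizes into `side_channel_share` gives a rough indication of how much traffic depends on MinIO storage and throughput.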

Implementation Notes

Configure Kubernetes capacity and Foundation-managed services during Foundation installation.
Configure Connect services and Connect-managed third-party charts during Connect installation.
Define CPU and memory requests for all workloads; define limits according to your runtime policy and test results.
Validate under representative load before production rollout.
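Before a load test, a quick arithmetic check can confirm that the summed requests fit the planned cluster. The sketch below uses the Small baseline requests from the tables on this page. The class, the workload labels, and the 70% usable fraction are assumptions for illustration; the check also ignores other Connect and Foundation services, system overhead, and per-node scheduling constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str               # label only, not a chart or resource name
    replicas: int
    cpu_request: float      # vCPUs per replica
    memory_request: float   # GiB per replica


# Small-baseline requests copied from the tables on this page.
SMALL_WORKLOADS = [
    Workload("elasticsearch-data-hot", 3, 2, 6),
    Workload("postgresql", 3, 1, 2),
    Workload("flow-server", 3, 1, 2),
]


def total_requests(workloads):
    """Sum CPU (vCPUs) and memory (GiB) requests across all replicas."""
    cpu = sum(w.replicas * w.cpu_request for w in workloads)
    memory = sum(w.replicas * w.memory_request for w in workloads)
    return cpu, memory


def fits(workloads, nodes, vcpus_per_node, memory_gib_per_node,
         usable_fraction=0.7):
    """Check summed requests against a fraction of raw cluster capacity."""
    cpu, memory = total_requests(workloads)
    return (cpu <= nodes * vcpus_per_node * usable_fraction
            and memory <= nodes * memory_gib_per_node * usable_fraction)
```

For the Small baseline, `total_requests(SMALL_WORKLOADS)` gives 12 vCPUs and 30 GiB, which fits a cluster of four nodes with 8 vCPUs and 32 GiB each under the assumed 70% usable fraction.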