Troubleshooting
This section helps you identify, diagnose, and resolve common issues in GridOS Connect.
1. Access
- If you have authentication problems when accessing services, or if your session has timed out, sign in to Zitadel using the credentials configured in the IdP, and then force-refresh the service web page.
- If you have trouble accessing any services, delete all cookies and local storage/cache, or use your web browser's incognito mode.
2. Load Balancers and Reverse Proxies
Connect requires no additional configuration for load balancers. The load balancer works as long as it points to port 443 on the VMs, preserves the Host header in HTTP requests (for application load balancers), and all required DNS names (zitadel, service, console, api) point to the load balancer.
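As an illustration, a reverse proxy meeting these requirements might look like the following nginx sketch. All names, addresses, and certificate paths here are placeholders, not values from this product; substitute your own DNS names, VM address, and TLS material.

```nginx
# Sketch only: placeholder values, not a tested or official configuration.
server {
    listen 443 ssl;
    # All required DNS names resolve to this proxy.
    server_name zitadel.example.com service.example.com console.example.com api.example.com;

    ssl_certificate     /etc/nginx/tls/tls.crt;
    ssl_certificate_key /etc/nginx/tls/tls.key;

    location / {
        proxy_pass https://10.0.0.10:443;   # the VM, port 443
        proxy_set_header Host $host;        # preserve the host in HTTP requests
    }
}
```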
3. Flow-traces/Logs
This issue will produce the following error message in the GridOS Connect Console Overview:
co.elastic.clients.elasticsearch._types.ElasticsearchException: [es/search] failed: [index_not_found_exception] no such index [flowserver-logs]
This is a known issue with the fluent-bit Operator: after new custom resources (clusterInput, clusterFilter, and clusterOutput) are added, fluent-bit is unable to fetch flowserver logs.
If flowserver is running as expected and flows have been deployed, but flow traces are not visible in the "Flow traces" tab of the GridOS Connect console, it is safe to assume that fluent-bit is not able to fetch the flowserver logs.
Workaround:
- Restart the fluent-bit pods:
  kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
- Restart the flowserver pods:
  kubectl delete po -n foundation-env-default -l app.kubernetes.io/name=flowserver
4. Prevent Log Duplication of Connect Flow Server
The default Foundation fluent-bit clusterOutput configuration leads to log duplication across Elasticsearch data streams. Each log event is stored multiple times, which increases storage usage and might impact query performance.
Specifically, for each log event indexed as a document in Elasticsearch from Connect flow-server or other Connect services, the following happens:
- Two copies are stored in the logs data stream if the log event originates from any Connect service, including flow-server. This happens because the default Foundation fluent-bit configuration uses a catch-all clusterInput that collects logs from all containers, including Connect services, and sends them to the logs data stream in Elasticsearch.
- One copy is stored in the flowserver-logs data stream if the log event originates from the Connect flow-server.
- One copy is stored in the connect-services data stream if the log event originates from a Connect service other than flow-server.
As a result, each Connect-related log event can be stored up to three times.
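The copy counting above can be sketched as a toy model. This is only an illustration of the routing described in this section, not real fluent-bit configuration or API; the container-name convention and the assumption that the second logs copy arises from the Connect-specific input's tag also matching the catch-all output are simplifications for the sketch.

```python
# Toy model of the log duplication described above; not real fluent-bit config.
# Assumptions (hypothetical): Connect container names contain "connect-", and
# the second "logs" copy comes from the Connect-specific input whose tag is
# also matched by the default catch-all elasticsearch clusterOutput.

def stored_streams(container, input_excludes_connect=False, output_excludes_connect=False):
    """Return the data streams that each receive one copy of a log event."""
    is_connect = "connect-" in container
    streams = []
    # Copy via the catch-all Foundation clusterInput -> default "logs" output.
    if not (input_excludes_connect and is_connect):
        streams.append("logs")
    # Copy via the Connect-specific input, also matched by the catch-all output.
    if is_connect and not output_excludes_connect:
        streams.append("logs")
    # Copy via the Connect-specific clusterOutput.
    if is_connect:
        streams.append("flowserver-logs" if "flowserver" in container else "connect-services")
    return streams

# Default configuration: up to three copies for a Connect flow-server event.
assert stored_streams("connect-flowserver") == ["logs", "logs", "flowserver-logs"]
# With both fixes from this section applied: exactly one copy remains.
assert stored_streams("connect-flowserver", True, True) == ["flowserver-logs"]
# Non-Connect containers are unaffected.
assert stored_streams("nginx") == ["logs"]
```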
Solution:
Since Connect flow-server logs only need to be retained in the flowserver-logs data stream, and other Connect services logs in the connect-services data stream, exclude these logs from the logs data stream. This reduces redundancy and optimizes storage.
The Foundation clusterInput exclusion prevents the catch-all Foundation log collector from reading Connect container logs, while the catch-all Elasticsearch clusterOutput update ensures Connect logs are not written to the default logs data stream.
Apply both changes below:
- Exclude Connect logs from the default Foundation clusterInput:
  Ensure that the following Foundation Helm overlay values have been used to redeploy the Foundation monitoring-apps Helm release in your environment. This configuration adds the "*connect-*" pattern to the extraExcludedContainers field of the catch-all Foundation clusterInput, which prevents log collection into the default logs data stream for all Connect containers, including flow-server.

  ...
  fluent-bit:
    enabled: true
    fluentBit:
      input:
        extraExcludedContainers: "*connect-*"
  ...
- Update the catch-all Elasticsearch clusterOutput:
  The following steps assume that the clusteroutput.fluentbit.fluent.io/elasticsearch resource has not been modified from the default Foundation configuration.
  - Fetch the elasticsearch clusterOutput configuration:
    kubectl get clusteroutputs elasticsearch -o yaml
  - Modify the elasticsearch clusterOutput configuration:
    kubectl get clusteroutput elasticsearch -o yaml \
      | yq e '
          del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration") |
          del(.metadata.generation) |
          del(.metadata.resourceVersion) |
          del(.metadata.uid) |
          del(.metadata.creationTimestamp)
        ' - \
      | yq e '
          .spec |= (
            select(has("match")) |
            .matchRegex = "^(?!flowserver.*)(?!connect.*).*" |
            del(.match)
          )
        ' -
  - Compare the outputs:
    Verify that the outputs from the two previous steps match the blocks below. The match field should be replaced with matchRegex.

    Original:
    apiVersion: fluentbit.fluent.io/v1alpha2
    kind: ClusterOutput
    metadata:
      annotations:
        meta.helm.sh/release-name: monitoring-apps
        meta.helm.sh/release-namespace: foundation-cluster-monitoring
      labels:
        app.kubernetes.io/managed-by: Helm
        fluentbit.fluent.io/enabled: "true"
      name: elasticsearch
    spec:
      es:
        generateID: true
        host: monitoring-apps-es-client
        index: logs
        logstashFormat: false
        port: 9200
        replaceDots: true
        suppressTypeName: "On"
        timeKey: '@timestamp'
        traceError: false
        traceOutput: false
        type: _doc
      match: '*'
      retry_limit: "False"

    Modified:
    apiVersion: fluentbit.fluent.io/v1alpha2
    kind: ClusterOutput
    metadata:
      annotations:
        meta.helm.sh/release-name: monitoring-apps
        meta.helm.sh/release-namespace: foundation-cluster-monitoring
      labels:
        app.kubernetes.io/managed-by: Helm
        fluentbit.fluent.io/enabled: "true"
      name: elasticsearch
    spec:
      es:
        generateID: true
        host: monitoring-apps-es-client
        index: logs
        logstashFormat: false
        port: 9200
        replaceDots: true
        suppressTypeName: "On"
        timeKey: '@timestamp'
        traceError: false
        traceOutput: false
        type: _doc
      matchRegex: ^(?!flowserver.*)(?!connect.*).*
      retry_limit: "False"

    If your output is consistent with the above, use the following commands to exclude Connect-related tags in the elasticsearch clusterOutput.
  - Create a YAML file for the modified clusterOutput resource:
    kubectl get clusteroutput elasticsearch -o yaml \
      | yq e '
          del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration") |
          del(.metadata.generation) |
          del(.metadata.resourceVersion) |
          del(.metadata.uid) |
          del(.metadata.creationTimestamp)
        ' - \
      | yq e '
          .spec |= (
            select(has("match")) |
            .matchRegex = "^(?!flowserver.*)(?!connect.*).*" |
            del(.match)
          )
        ' - \
      > modified-clusteroutput-es.yaml
  - Delete the old clusterOutput resource:
    kubectl delete clusteroutput elasticsearch
  - Apply the modified clusterOutput resource:
    kubectl apply -f modified-clusteroutput-es.yaml
  - Restart the fluent-bit pods:
    kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
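As a quick sanity check of the negative-lookahead pattern used above, the following Python sketch shows which tags the matchRegex accepts. Python's re module shares the lookahead semantics relied on here, though fluent-bit's regex engine may differ in other respects; the sample tag names are illustrative, not the actual tags produced by your fluent-bit setup.

```python
import re

# The matchRegex from the modified clusterOutput: accept any tag that does NOT
# start with "flowserver" or "connect".
pattern = re.compile(r"^(?!flowserver.*)(?!connect.*).*")

# Illustrative tag names (hypothetical):
assert pattern.match("kube.some-other-service") is not None   # still routed to "logs"
assert pattern.match("flowserver.pod-abc") is None            # excluded
assert pattern.match("connect-services.pod-def") is None      # excluded
```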
5. Connect-PostgreSQL Fails to Install
If you are installing Connect on a Foundation version lower than 25R01, the connect-postgresql Helm chart fails with the following error:
Error: UPGRADE FAILED: cannot patch "connect-postgresql" with kind postgresql: postgresql.acid.zalan.do "connect-postgresql" is invalid: spec.postgresql.version: Unsupported value: "17": supported values: "10", "11", "12", "13", "14", "15"
Resolution Options:
- Upgrade Foundation to 25R01+:
  Upgrade your environment to Foundation version 25R01 or higher, and then run apply-helmfile to redeploy the GridOS connect-postgresql chart with the latest supported version.
- Downgrade the connect-postgresql chart:
  Edit helmfile.yaml.gotmpl in the root directory of the Helm Deployment Template. Ensure that the connect-postgresql chart version is set to 0.5.0, and run apply-helmfile to redeploy the GridOS connect-postgresql chart.

  ...
  - name: connect-postgresql
    <<: *defaults
    chart: connect-helm/connect-postgresql
    version: 0.5.0
    values:
      - ./connect-postgresql/values.yaml
      # - ./offline-image-overrides.yaml
  ...
6. connect-identityreconciler Pod Fails to Start Due to Renamed APISIX Secret
Connect release 1.20.0 includes updates to support changes in APISIX secrets introduced in Foundation version 25r09.
Symptom:
The connect-identityreconciler pod may fail to start and log one of the following errors:
- When deploying Connect from scratch:
  MountVolume.SetUp failed for volume "secrets": secret "connect-identityreconciler-apisix-api-token" not found
- When upgrading Connect:
  Getting upstream 'connect-flowserver-service' failed with status: 401 Unauthorized
Root Cause:
In Foundation version 25r09, an APISIX secret was renamed. Connect relies on this secret, so if Connect and Foundation are on mismatched versions, the secret name does not align. This can cause the pod to fail at startup or result in service requests being rejected.
Compatibility Overview: The following matrix shows the required actions based on the versions of Foundation and Connect:
| Foundation version | Connect version | Action |
|---|---|---|
| 25r09 or later | 1.20.0 or later | No action required; the setup is fully compatible. |
| Older than 25r09 | 1.20.0 or later | Apply the override described in Solution. |
| 25r09 or later | Older than 1.20.0 | Apply the override described in Solution. |
Solution:
In connect/values.yaml, add the appropriate override under the identityreconciler block: one variant applies to Foundation versions older than 25r09, and another to Foundation version 25r09 and later.