Troubleshooting
This section helps you identify, diagnose, and resolve common issues in GridOS Connect.
1. Access
- If you have authentication problems when accessing services, or if your session has timed out, sign in to Zitadel using the credentials configured in the IdP, and then force-refresh the service web page.
- If you have trouble accessing any service, delete all cookies and clear local storage/cache. Alternatively, use your web browser's incognito mode.
2. Load Balancers and Reverse Proxies
Connect requires no additional load balancer configuration. The load balancer works as long as it points to port 443 on the VMs, preserves the Host header in HTTP requests (for application load balancers), and all required DNS names (zitadel, service, console, api) are added and resolve to the load balancer.
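As an illustration only, a reverse proxy meeting these requirements could look like the following nginx sketch. The server names, certificate paths, and upstream address are placeholders, not part of the product configuration:

```nginx
# Hypothetical nginx reverse proxy in front of Connect; all names and paths are examples.
server {
    listen 443 ssl;
    # One server name per required DNS entry pointing at the load balancer.
    server_name zitadel.example.com service.example.com console.example.com api.example.com;

    ssl_certificate     /etc/nginx/tls/example.crt;
    ssl_certificate_key /etc/nginx/tls/example.key;

    location / {
        # Forward to port 443 on the Connect VM.
        proxy_pass https://connect-vm.example.com:443;
        # Preserve the original Host header, as Connect requires.
        proxy_set_header Host $host;
        proxy_ssl_server_name on;
    }
}
```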
3. Flow-traces/Logs
This issue produces the following error message in the GridOS Connect Console Overview:
```
co.elastic.clients.elasticsearch._types.ElasticsearchException: [es/search] failed: [index_not_found_exception] no such index [flowserver-logs]
```
This is a known issue with Fluentbit Operator: after new custom resources (ClusterInput, ClusterFilter, and ClusterOutput) are added, Fluent Bit is not able to fetch flowserver logs.
If flowserver is running as expected and flows have been deployed, but flow traces are not visible in the "Flow traces" tab of the GridOS Connect console, it is safe to assume that Fluent Bit is not able to fetch the flowserver logs.
Workaround:
- Restart the Fluent Bit pods:

  ```shell
  kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
  ```

- Restart the flowserver pods:

  ```shell
  kubectl delete po -n foundation-env-default -l app.kubernetes.io/name=flowserver
  ```
4. Prevent Log Duplication of Connect Flow Server
The default Foundation Fluent-bit ClusterOutput configuration leads to log duplication across Elasticsearch indices. Each log event is stored multiple times, which increases storage usage and might impact query performance.
Specifically:
- Each log event is indexed as a document in Elasticsearch.
- Two copies are stored in the log index.
- One copy is stored in the flowserver-log index.
As a result, each log event is stored three times in total.
Solution:
Since Connect flowserver logs only need to be retained in the flowserver-log index, they can be removed from the log index to reduce redundancy and optimize storage.
Update the Fluent Bit configuration
The steps below assume the clusteroutput.fluentbit.fluent.io/elasticsearch resource has not been modified from the default Foundation configuration.
- Fetch the elasticsearch ClusterOutput configuration:

  ```shell
  kubectl get clusteroutputs elasticsearch -o yaml
  ```

- Modify the elasticsearch ClusterOutput configuration:

  ```shell
  kubectl get clusteroutput elasticsearch -o yaml | yq e 'del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration") | del(.metadata.generation) | del(.metadata.resourceVersion) | del(.metadata.uid) | del(.metadata.creationTimestamp)' - | yq e '.spec |= (select(has("match")) | .matchRegex = "^(?!flowserver.*).*" | del(.match))' -
  ```

- Compare the outputs of the two commands above and verify that they match the original/modified comparison below. The match field should be replaced with matchRegex.
Original:

```yaml
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  annotations:
    meta.helm.sh/release-name: monitoring-apps
    meta.helm.sh/release-namespace: foundation-cluster-monitoring
  labels:
    app.kubernetes.io/managed-by: Helm
    fluentbit.fluent.io/enabled: "true"
  name: elasticsearch
spec:
  es:
    generateID: true
    host: monitoring-apps-es-client
    index: logs
    logstashFormat: false
    port: 9200
    replaceDots: true
    suppressTypeName: "On"
    timeKey: '@timestamp'
    traceError: false
    traceOutput: false
    type: _doc
  match: '*'
  retry_limit: "False"
```

Modified:

```yaml
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  annotations:
    meta.helm.sh/release-name: monitoring-apps
    meta.helm.sh/release-namespace: foundation-cluster-monitoring
  labels:
    app.kubernetes.io/managed-by: Helm
    fluentbit.fluent.io/enabled: "true"
  name: elasticsearch
spec:
  es:
    generateID: true
    host: monitoring-apps-es-client
    index: logs
    logstashFormat: false
    port: 9200
    replaceDots: true
    suppressTypeName: "On"
    timeKey: '@timestamp'
    traceError: false
    traceOutput: false
    type: _doc
  matchRegex: ^(?!flowserver.*).*
  retry_limit: "False"
```

If your output is consistent with the above, use the following commands to exclude flowserver logs in the elasticsearch ClusterOutput.
- Create a YAML file for the modified ClusterOutput resource:

  ```shell
  kubectl get clusteroutput elasticsearch -o yaml | yq e 'del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration") | del(.metadata.generation) | del(.metadata.resourceVersion) | del(.metadata.uid) | del(.metadata.creationTimestamp)' - | yq e '.spec |= (select(has("match")) | .matchRegex = "^(?!flowserver.*).*" | del(.match))' > modified-clusteroutput-es.yaml
  ```

- Delete the old ClusterOutput resource:

  ```shell
  kubectl delete clusteroutput elasticsearch
  ```

- Apply the modified ClusterOutput resource:

  ```shell
  kubectl apply -f modified-clusteroutput-es.yaml
  ```

- Restart the Fluent Bit pods:

  ```shell
  kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
  ```
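Before applying the change, the negative lookahead in the matchRegex value can be sanity-checked locally. The sketch below assumes GNU grep built with PCRE support (`-P`); the tag names are made-up examples, not real Fluent Bit tags. It shows that flowserver-prefixed tags are excluded while all other tags still match:

```shell
regex='^(?!flowserver.*).*'

# A non-flowserver tag still matches the pattern (grep -c prints the match count).
echo 'kube.var.log.containers.myapp' | grep -cP "$regex"              # prints 1

# A flowserver tag is rejected by the negative lookahead at the start of the line.
echo 'flowserver.var.log.containers.x' | grep -cP "$regex" || true    # prints 0
```

Because the pattern is anchored with `^`, the lookahead is evaluated only at the start of each tag, so any tag beginning with `flowserver` is dropped from this output while everything else is shipped unchanged.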
5. Connect-PostgreSQL Fails to Install
If you are installing Connect on a Foundation version lower than 25R01, the connect-postgresql Helm chart fails with the following error:
```
Error: UPGRADE FAILED: cannot patch "connect-postgresql" with kind postgresql: postgresql.acid.zalan.do "connect-postgresql" is invalid: spec.postgresql.version: Unsupported value: "17": supported values: "10", "11", "12", "13", "14", "15"
```
Resolution Options:
- Upgrade Foundation to 25R01+
  Upgrade your environment to Foundation version 25R01 or higher, and then run `apply-helmfile` to redeploy GridOS connect-postgresql with the latest supported version.
- Downgrade the connect-postgresql chart
  Edit helmfile.yaml.gotmpl in the root directory of the Helm Deployment Template, ensure that the connect-postgresql chart version is set to 0.5.0, and run `apply-helmfile` to redeploy GridOS connect-postgresql.
```yaml
...
- name: connect-postgresql
  <<: *defaults
  chart: connect-helm/connect-postgresql
  version: 0.5.0
  values:
    - ./connect-postgresql/values.yaml
    # - ./offline-image-overrides.yaml
...
```
6. connect-identityreconciler Pod Fails to Start Due to Renamed APISIX Secret
Connect release 1.20.0 includes updates to support changes in APISIX secrets introduced in Foundation version 25r09.
Symptom:
The connect-identityreconciler pod may fail to start and log one of the following errors:
- When deploying Connect from scratch:

  `MountVolume.SetUp failed for volume "secrets": secret "connect-identityreconciler-apisix-api-token" not found`

- When upgrading Connect:

  `Getting upstream 'connect-flowserver-service' failed with status: 401 Unauthorized`
Root Cause:
In Foundation version 25r09, an APISIX secret was renamed. Connect relies on this secret, so if Connect and Foundation are on mismatched versions, the secret name does not align. This can cause the pod to fail at startup or result in service requests being rejected.
Compatibility Overview:
The following matrix shows the required action based on the versions of Foundation and Connect:

| Foundation version | Connect version | Action |
|---|---|---|
| 25r09 or later | 1.20.0 or later | No action required; the setup is fully compatible. |
| Older than 25r09 | 1.20.0 or later | Apply the override described in Solution. |
| 25r09 or later | Older than 1.20.0 | Apply the override described in Solution. |
Solution:
In connect/values.yaml, add the following override under the identityreconciler block:
| Foundation version older than 25r09 | Foundation version 25r09 or later |
|---|---|
|  |  |