Troubleshooting

This section helps you identify, diagnose, and resolve common issues in GridOS Connect.

1. Access

  • If you have authentication problems when accessing services, or if your session has timed out, sign in to Zitadel with the credentials configured in your IdP, then force-refresh the service web page.

  • If you have trouble accessing any service, clear all cookies and the browser's local storage and cache. Alternatively, use your browser's incognito mode.

2. Load Balancers and Reverse Proxies

Connect requires no additional load balancer configuration. The load balancer works as long as it forwards to port 443 on the VMs, preserves the Host header in HTTP requests (for application load balancers), and all required DNS names (zitadel, service, console, api) point to the load balancer.
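As a quick sanity check of the DNS requirement above, the following sketch lists the four names and reports whether each one resolves. It rests on assumptions: the base domain connect.example.com is hypothetical, and the <name>.<domain> naming scheme is only one possible layout, so adjust both to your environment.

```shell
#!/usr/bin/env bash
# Hypothetical base domain; replace with the domain used in your environment.
BASE_DOMAIN="${BASE_DOMAIN:-connect.example.com}"

# Print the DNS names Connect expects, one per line.
# Assumption: each name is a subdomain of the given base domain.
connect_dns_names() {
  local domain="$1" name
  for name in zitadel service console api; do
    printf '%s.%s\n' "$name" "$domain"
  done
}

# Report whether each expected name resolves from this machine (uses getent).
check_connect_dns() {
  local host
  for host in $(connect_dns_names "$1"); do
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "ok       $host"
    else
      echo "missing  $host"
    fi
  done
}
```

Running check_connect_dns "$BASE_DOMAIN" prints one line per name. A "missing" entry only means the name did not resolve from the machine running the check; it does not verify that the record points to the load balancer.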

3. Flow-traces/Logs

When Fluent Bit cannot fetch flowserver logs, the GridOS Connect Console Overview shows the following error message:

co.elastic.clients.elasticsearch._types.ElasticsearchException: [es/search] failed: [index_not_found_exception] no such index [flowserver-logs]

This is a known issue with the Fluent Bit Operator: after new custom resources (ClusterInput, ClusterFilter, and ClusterOutput) are added, Fluent Bit is not able to fetch flowserver logs.

If flowserver is running as expected and flows have been deployed, but flow traces are not visible in the "Flow traces" tab of the GridOS Connect Console, Fluent Bit is most likely unable to fetch the flowserver logs.

Workaround:

  • Restart fluentbit pods

    kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
  • Restart flowserver pods

    kubectl delete po -n foundation-env-default -l app.kubernetes.io/name=flowserver

4. Prevent Log Duplication of Connect Flow Server

The default Foundation Fluent-bit ClusterOutput configuration leads to log duplication across Elasticsearch indices. Each log event is stored multiple times, which increases storage usage and might impact query performance.

Specifically:

  • Each log event is indexed as a document in Elasticsearch.

  • Two copies are stored in the logs index.

  • One copy is stored in the flowserver-logs index.

As a result, each log event is stored three times in total.

Solution: Because Connect flowserver logs only need to be retained in the flowserver-logs index, they can be excluded from the logs index to reduce redundancy and save storage.

Update fluent-bit configuration

The following steps assume that the clusteroutput.fluentbit.fluent.io/elasticsearch resource has not been modified from the default Foundation configuration.

  1. Fetch the elasticsearch ClusterOutput configuration

    kubectl get clusteroutputs elasticsearch -o yaml
  2. Generate the modified elasticsearch ClusterOutput configuration (this only prints the result; nothing in the cluster is changed yet)

    kubectl get clusteroutput elasticsearch -o yaml | yq e 'del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration") | del(.metadata.generation) | del(.metadata.resourceVersion) | del(.metadata.uid) | del(.metadata.creationTimestamp)' - | yq e '.spec |= (select(has("match")) | .matchRegex = "^(?!flowserver.*).*" | del(.match))'
  3. Compare the outputs

    Verify that the outputs from steps 1 and 2 match the Original and Modified configurations below. In the modified output, the match field is replaced with matchRegex.

    Original:

    apiVersion: fluentbit.fluent.io/v1alpha2
    kind: ClusterOutput
    metadata:
      annotations:
        meta.helm.sh/release-name: monitoring-apps
        meta.helm.sh/release-namespace: foundation-cluster-monitoring
      labels:
        app.kubernetes.io/managed-by: Helm
        fluentbit.fluent.io/enabled: "true"
      name: elasticsearch
    spec:
      es:
        generateID: true
        host: monitoring-apps-es-client
        index: logs
        logstashFormat: false
        port: 9200
        replaceDots: true
        suppressTypeName: "On"
        timeKey: '@timestamp'
        traceError: false
        traceOutput: false
        type: _doc
      match: '*'
      retry_limit: "False"

    Modified:

    apiVersion: fluentbit.fluent.io/v1alpha2
    kind: ClusterOutput
    metadata:
      annotations:
        meta.helm.sh/release-name: monitoring-apps
        meta.helm.sh/release-namespace: foundation-cluster-monitoring
      labels:
        app.kubernetes.io/managed-by: Helm
        fluentbit.fluent.io/enabled: "true"
      name: elasticsearch
    spec:
      es:
        generateID: true
        host: monitoring-apps-es-client
        index: logs
        logstashFormat: false
        port: 9200
        replaceDots: true
        suppressTypeName: "On"
        timeKey: '@timestamp'
        traceError: false
        traceOutput: false
        type: _doc
      matchRegex: ^(?!flowserver.*).*
      retry_limit: "False"

    If your output matches the expected output above, use the following commands to exclude flowserver logs from the "elasticsearch" ClusterOutput.

  4. Create a YAML file for the modified ClusterOutput resource

    kubectl get clusteroutput elasticsearch -o yaml | yq e 'del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration") | del(.metadata.generation) | del(.metadata.resourceVersion) | del(.metadata.uid) | del(.metadata.creationTimestamp)' - | yq e '.spec |= (select(has("match")) | .matchRegex = "^(?!flowserver.*).*" | del(.match))' > modified-clusteroutput-es.yaml
  5. Delete the old ClusterOutput resource

    kubectl delete clusteroutput elasticsearch
  6. Apply the modified ClusterOutput resource

    kubectl apply -f modified-clusteroutput-es.yaml
  7. Restart fluent-bit pods

    kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
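
To see which record tags the matchRegex pattern from the steps above lets through, the pattern can be tried against sample tags locally. This is only a sketch: the sample tags are hypothetical, and GNU grep's PCRE mode (-P) is used as a stand-in for Fluent Bit's own regex engine, which may differ in edge cases.

```shell
#!/usr/bin/env bash
# Pattern copied from the modified ClusterOutput (matchRegex field).
MATCH_REGEX='^(?!flowserver.*).*'

# Read tags from stdin (one per line) and print only those the pattern matches,
# i.e. the tags the "elasticsearch" ClusterOutput would still handle.
tags_kept_by_output() {
  grep -P -- "$MATCH_REGEX"
}

# Hypothetical sample tags, for illustration only.
printf '%s\n' 'kube.var.log.containers.app' 'flowserver.var.log' 'service.nginx' \
  | tags_kept_by_output
```

With GNU grep, this prints the first and third sample tags and drops flowserver.var.log. Note that the negative lookahead is anchored at the start of the tag, so only tags that begin with flowserver are excluded; a tag that merely contains flowserver later in the string still matches.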

5. Connect-PostgreSQL Fails to Install

If you are installing Connect on a Foundation version earlier than 25R01, the connect-postgresql Helm chart fails with the following error:

Error: UPGRADE FAILED: cannot patch "connect-postgresql" with kind postgresql: postgresql.acid.zalan.do "connect-postgresql" is invalid: spec.postgresql.version: Unsupported value: "17": supported values: "10", "11", "12", "13", "14", "15"

Resolution Options:

  • Upgrade Foundation to 25R01+

Upgrade your environment to Foundation version 25R01 or higher and then run apply-helmfile to redeploy the GridOS connect-postgresql with the latest supported version.

  • Downgrade connect-postgresql chart

Edit helmfile.yaml.gotmpl in the root directory of the Helm Deployment Template. Ensure that the connect-postgresql chart version is set to 0.5.0, then run apply-helmfile to redeploy the GridOS connect-postgresql.

...
- name: connect-postgresql
  <<: *defaults
  chart: connect-helm/connect-postgresql
  version: 0.5.0
  values:
  - ./connect-postgresql/values.yaml
  # - ./offline-image-overrides.yaml
...

6. connect-identityreconciler Pod Fails to Start Due to Renamed APISIX Secret

Connect release 1.20.0 includes updates to support changes in APISIX secrets introduced in Foundation version 25r09.

Symptom: The connect-identityreconciler pod may fail to start and log one of the following errors:

  • When deploying Connect from scratch:

    `MountVolume.SetUp failed for volume "secrets": secret "connect-identityreconciler-apisix-api-token" not found`
  • When upgrading Connect:

    `Getting upstream 'connect-flowserver-service' failed with status: 401 Unauthorized`

Root Cause: In Foundation version 25r09, an APISIX secret was renamed. Connect relies on this secret, so if Connect and Foundation are on mismatched versions, the secret name does not align. This can cause the pod to fail at startup or result in service requests being rejected.

Compatibility Overview: The following matrix shows the required actions based on the versions of Foundation and Connect:

Foundation version    Connect version      Action
25r09 or later        1.20.0 or later      No action required; the setup is fully compatible.
Older than 25r09      1.20.0 or later      Apply the override described in Solution.
25r09 or later        Older than 1.20.0    Apply the override described in Solution.

Solution:

In connect/values.yaml, add the following override under the identityreconciler block:

Foundation version older than 25r09 and Connect 1.20.0 or later

# Override for Foundation older than 25r09 and Connect 1.20.0+
secretManager:
  secrets:
    apisix-api-token:
      name: apisix-control-plane-api-token

Foundation version 25r09 or later and Connect version older than 1.20.0

# Override for Foundation 25r09+ and Connect version older than 1.20.0
secretManager:
  secrets:
    apisix-api-token:
      name: apisix-control-plane-token
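
The compatibility matrix can also be expressed as a small helper that prints which secret name to set for a given pair of versions. This is an illustrative sketch, not part of Connect: the function names are hypothetical, version comparison relies on GNU sort -V, and the combination of an older Foundation with an older Connect is reported as "unlisted" because the matrix does not cover it.

```shell
#!/usr/bin/env bash
# Succeed when version $1 is the same as or newer than version $2.
# Input is lowercased so that 25R09 and 25r09 compare as equal.
version_ge() {
  local a b
  a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n1)" = "$b" ]
}

# Print the apisix-api-token secret name to use as an override,
# "none" when no override is needed, or "unlisted" when the matrix is silent.
# Usage: apisix_secret_override <foundation-version> <connect-version>
apisix_secret_override() {
  local foundation="$1" connect="$2"
  local new_foundation=no new_connect=no
  version_ge "$foundation" 25r09 && new_foundation=yes
  version_ge "$connect" 1.20.0 && new_connect=yes

  case "$new_foundation/$new_connect" in
    yes/yes) echo "none" ;;
    no/yes)  echo "apisix-control-plane-api-token" ;;
    yes/no)  echo "apisix-control-plane-token" ;;
    no/no)   echo "unlisted" ;;  # not covered by the matrix above
  esac
}

# Example: Foundation older than 25r09 with Connect 1.20.0.
apisix_secret_override 25r06 1.20.0
```

The example call prints apisix-control-plane-api-token, which is the value to place under secretManager.secrets.apisix-api-token.name in the identityreconciler block.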