Install GridOS Connect

This guide describes how to install GridOS Connect on Foundation environments. It is intended for customer deployments of GridOS.

1. Deployment

1.1. Prerequisites

Before deploying GridOS Connect, make sure you have set up all required tools, cluster configurations, and access credentials. You must also review resource scaling requirements to ensure the deployment is sized correctly, which helps prevent installation errors and ensures a smooth deployment process.

1.1.2. Foundation

  1. GridOS Connect has been deployed and validated on {foundation-docs-base-url}/index.html[Foundation] version {foundation-docs-version} with non-pdi-k8s-version: {non-pdi-k8s-version}. To ensure compatibility and prevent deployment issues, we recommend using Foundation version {foundation-docs-version}. Deploying GridOS Connect on older Foundation versions may result in unexpected errors. For more details on potential issues, see Troubleshooting.

    Connect is also continuously tested with the latest Foundation changes in a combined test environment. At the time of release there are no known issues running Connect on the upcoming {upcoming-foundation-version} version of Foundation.

    For more information about Connect Foundation version compatibility, see Connect Foundation Version Requirements.

  2. Your kubectl current-context must be set to your Foundation environment.
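As a quick sanity check, you can print the context kubectl currently targets before starting the installation. This is a sketch; the context name you switch to depends on your environment.

```shell
# Print the active kube context so you can confirm it targets Foundation.
show_current_context() {
  if command -v kubectl >/dev/null 2>&1; then
    kubectl config current-context 2>/dev/null || echo "no current context set"
  else
    echo "kubectl not found on PATH"
  fi
}
show_current_context
# To switch contexts (context name is an example):
#   kubectl config use-context <your-foundation-context>
```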

1.1.3. Artifactory Access

You will need access to GridOS Connect Artifactory repositories hosted at GE Digital Grid Artifactory.

1.1.4. Hardware Sizing Guidelines

To install Connect, request an appropriately sized cluster from Foundation, then proceed with the installation. For guidance on sizing GridOS clusters with Foundation and Connect, see Hardware Sizing Guidelines.

1.2. Configure Foundation

At the time of writing, a base Foundation installation requires additional configuration to work with GridOS Connect.

  1. GridOS Connect relies on services provided by Foundation, including Zitadel for identity management and Apisix for API gateway functionality. Some of these services may need to be explicitly enabled or reconfigured. The following configuration file lists the required settings: Foundation Helm overlay values.

  2. Add these settings to your Foundation Helm values. Depending on your Foundation deployment, you can place them in a file such as local-overrides.yaml.

  3. The {foundation-docs-base-url}/foundation-base-user-guide/the-data-loader.html[Foundation data loader] should specify auth configuration. See Auth Configuration for more details.

  4. Optional. If you want to deploy a {foundation-docs-base-url}/foundation-base-installation-guide/multisite/multi-site-overview.html[multi-site cluster], follow the {foundation-docs-base-url}/foundation-base-installation-guide/multisite/multi-site-deployment.html[Foundation documentation] to enable multi-site support during the Foundation deployment stage.

If you have problems or uncertainties regarding how to apply the Foundation configuration, contact Foundation support.

1.2.1. Enable CPU Throttling for Kubernetes

When running GridOS Connect, Kubernetes CPU throttling must be enabled so that Connect services have predictable performance and behavior. This can be done as part of the Foundation installation or applied to an existing cluster.
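Before applying any changes, you can check whether the flag is already active. This is a sketch and assumes an RKE2-style node where the flag shows up in the kubelet process arguments.

```shell
# Quick check on a node: does the running kubelet have CFS quota enabled?
check_cfs_quota() {
  if ps -ef 2>/dev/null | grep -v grep | grep -q 'cpu-cfs-quota=true'; then
    echo "cpu-cfs-quota appears to be enabled"
  else
    echo "cpu-cfs-quota=true not found in process arguments"
  fi
}
check_cfs_quota
```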

Apply to Existing Cluster

SSH to the virtual machine that hosts each Kubernetes node.

# Edit this file on all nodes
vi /etc/rancher/rke2/config.yaml

# Set cpu-cfs-quota to true (it is set to false by default)
kubelet-arg:
  - "cpu-cfs-quota=true"

# Restart rke2-server
sudo systemctl restart rke2-server

# Restart rke2-worker node if applicable
sudo systemctl restart rke2-agent
Apply during Foundation Installation using PDI

See example {foundation-docs-base-url}/foundation-base-installation-guide/setup-infrastructure-layer.html#_minimal_manifest[manifest] in the Foundation docs.

Set the property cpu_limits_enforcement to true under the k8s param_group with id k8s_all_group_vars, see example snippet below.

...
- id: k8s_all_group_vars
  params:
    cpu_limits_enforcement: true
...
Apply during Foundation Installation without using PDI

See {foundation-docs-base-url}/foundation-base-installation-guide/setup-infrastructure-layer-k8s.html#_parameters[Foundation installation documentation] for more information. You need to change the value of cpu_limits_enforcement to true.

1.2.2. Configure Local DNS

You will need to configure your Domain Name System (DNS) by mapping domains to your externally accessible IP address, referred to below as worker-ip.

  1. Find your worker-ip by running the command kubectl get nodes -o wide and selecting a node without the control-plane role. In a POSIX[2] shell you can do:

    kubectl get nodes -o wide | grep -v control-plane | tail -n 1 | awk '{ print $6 }'
  2. You will need to configure your DNS for the following domains:

    {worker-ip} admin.YOUR_FOUNDATION_DOMAIN
    {worker-ip} api.YOUR_FOUNDATION_DOMAIN
    {worker-ip} console.YOUR_FOUNDATION_DOMAIN
    {worker-ip} service.YOUR_FOUNDATION_DOMAIN
    {worker-ip} zitadel.YOUR_FOUNDATION_DOMAIN

    If you are using a Connect Agent or mTLS service account, you must add a DNS entry matching the following pattern:

    {worker-ip} {org-id}.mtls.YOUR_FOUNDATION_DOMAIN

    Example:

    10.227.49.xxx gridos.mtls.env-connect-mvp-ingress.local
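If you manage the mappings via /etc/hosts on the client machine, the entries above can be generated with a short loop. The IP and domain below are placeholders; substitute your own values.

```shell
# Generate /etc/hosts-style entries for the Connect domains.
# WORKER_IP and FOUNDATION_DOMAIN are placeholder values.
WORKER_IP="10.227.49.10"
FOUNDATION_DOMAIN="YOUR_FOUNDATION_DOMAIN"
for SUB in admin api console service zitadel; do
  echo "$WORKER_IP $SUB.$FOUNDATION_DOMAIN"
done
# Append the output to /etc/hosts (requires root), e.g.:
#   ... | sudo tee -a /etc/hosts
```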

1.2.3. TLS

Ensure that your Foundation environment has correctly configured TLS. If a private Certificate Authority (CA) is used for this environment, make sure you have configured the chain of trust on your system.
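A simple way to verify the chain of trust from a client machine is to make an HTTPS request and see whether certificate validation succeeds. This is a sketch; the domain argument is a placeholder, and with a private CA you may need to pass the CA bundle to curl via --cacert instead.

```shell
# Check whether an endpoint's certificate chain is trusted by this system.
check_tls() {
  if command -v curl >/dev/null 2>&1; then
    if curl -sSf --max-time 10 "https://$1" -o /dev/null 2>/dev/null; then
      echo "trusted"
    else
      echo "untrusted or unreachable"
    fi
  else
    echo "curl not found on PATH"
  fi
}
# Example: check_tls console.YOUR_FOUNDATION_DOMAIN
```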

1.2.4. Auth Configuration

Authorization and authentication (auth) in Foundation is managed by an Identity Provider (IDP). This is described in the Foundation documentation sections:

  • {foundation-docs-base-url}/foundation-base-user-guide/the-identity-provider.html[IDP]

  • {foundation-docs-base-url}/foundation-base-user-guide/the-data-loader.html[Data Loader]

  • {foundation-docs-base-url}/foundation-base-user-guide/the-admin-console.html[Admin console]

In order for a user to be granted access to GridOS Connect, you will need the following:

  1. A user account in your IDP - this is the user you will use to log in to the Connect Console or deploy flows

  2. A Role Manager Permission created specifically for the GridOS Connect role you need

  3. A Role Manager Role

  4. A Role Manager Usergroup

  5. An AD/LDAP user group - where your IDP account is a member of this group

  6. A Role Manager Mapping between the Role Manager Usergroup and the AD/LDAP user group

All of these, except the permission, must be created by following the Foundation documentation linked above.

The permission is created as described in the following section.

GridOS Connect Role as a Role Manager Permission

Familiarize yourself with the available GridOS Connect roles:

  • Admin: A user with full integration permissions (read and write).

  • Agent: A user required by the GridOS Connect Agent to communicate with the GridOS Connect main cluster.

  • Monitor: A read-only user that can view flow traces and flow details but is not allowed to make edits.

For each Connect role you want to utilize, you will need to create a new permission in the Role Manager.

In order for this Role Manager Permission to take effect, you will need to ensure the following:

  • The Role Manager Permission must be mapped to a Role Manager Role.

  • The Role Manager Role must be mapped to a Role Manager Usergroup.

  • The Role Manager Usergroup must be mapped to a group in your IDP.

  • The IDP managed user account you want to utilize must be a member of this IDP managed group.

The Connect Role Manager Permission is defined with a string value with a specific format:

connect.<ORG_ID>.<GRIDOS_CONNECT_ROLE>
  1. The prefix connect. is required.

  2. <ORG_ID> is a string value that makes sense to the specific tenant organization. For example, if you want to add permissions to GridOS Connect for Acme Corp, the org id could be acme.

  3. <GRIDOS_CONNECT_ROLE> is a lowercase string value matching one of the above mentioned GridOS Connect roles.

The <ORG_ID> is referenced as owner-id in the Deployer section.
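The permission format above can be checked with a small validation helper. This is a sketch; the allowed org-id character set used in the pattern is an assumption, while the three role names (admin, agent, monitor) come from the roles listed earlier.

```shell
# Validate a Connect permission string of the form
# connect.<ORG_ID>.<GRIDOS_CONNECT_ROLE>.
valid_connect_permission() {
  echo "$1" | grep -Eq '^connect\.[a-z0-9_-]+\.(admin|agent|monitor)$'
}
valid_connect_permission "connect.acme.admin" && echo "valid"
valid_connect_permission "connect.acme.Admin" || echo "invalid: role must be lowercase"
```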

1.2.5. Create a UserGroup for the Connect Identity Reconcilers

The Connect Identity Reconciler applications require a UserGroup mapping for connect-identity-reconciler to exist, and it needs to be associated with the following permissions:

  • roleManager.userGroups.read.readAll

  • roleManager.permissions.read.readAll

  • roleManager.roles.read.readAll

You achieve this by doing the following in the security admin console:

  1. Add a Role called Connect Identity Reconciler and map the permissions listed above to it.

  2. Add a Usergroup called Connect Identity Reconciler and set the Mapped GroupName field to connect-identity-reconciler.

  3. Add the Connect Identity Reconciler Role to the Mapped Roles of the new Usergroup.

1.3. Deploy Connect

1.3.1. Prepare for Deployment

The Connect installation includes the main chart, connect, and three dependency charts: connect-postgresql, connect-victoria-metrics, and connect-openbao.

These charts, along with the required value override and auxiliary deployment files, are packaged into a ZIP artifact called the Helm Deployment Template.

  1. Download the Helm Deployment Template to your local machine.

  2. Unpack the ZIP file.

  3. Review the allocated resources. The Helm Deployment Template includes a values.yaml file for each chart. Review and override the default resource allocations as needed for your use case.

    The service resource values are set conservatively. Since the Connect team cannot anticipate customer-specific requirements, these resource allocations must be reviewed carefully. Misconfiguration can lead to broken clusters, poor performance, or excessive compute costs.

    To review and override the chart values, see Chart Value Override Recommendations.

  4. Set the kube-context and namespace for Connect installation.

    # Unix
    KUBE_CONTEXT=<KUBE_CONTEXT>
    CONNECT_NS=<CONNECT_NAMESPACE>
    # Windows - PowerShell
    $KUBE_CONTEXT=<KUBE_CONTEXT>
    $CONNECT_NS=<CONNECT_NAMESPACE>
Ensure that you set (override) the value of the property global.clusterExternalUrl to the externally available service domain in the form of an HTTPS URL (e.g., https://YOUR_FOUNDATION_DOMAIN).
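A minimal values override for this property could look like the following; the domain is a placeholder.

```yaml
# Connect values override; replace the domain with your Foundation domain
global:
  clusterExternalUrl: https://YOUR_FOUNDATION_DOMAIN
```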

1.3.2. Enable the Connect Deployment Operator

This step is optional. You can choose to deploy the Connect Deployment Operator now or skip it and deploy it later, depending on your requirements.

The Connect Deployment Operator manages Connect deployables as Kubernetes objects.

Connect Deployment Operator Configuration

The operator is an opt-in feature and must be enabled in a Connect Helm value override:

deploymentoperator:
  enabled: true

The operator requires OAuth 2.0 authentication with Connect Admin privileges to manage deployments. It retrieves the required credentials from a Zitadel Service User Secret in the same Kubernetes namespace where Connect is deployed.

By default, the operator expects a Zitadel service user secret named connect-deployment-operator-creds. This service user must be added to the Connect deployment and is not created automatically by the Connect chart.

You can configure the operator to use a different Zitadel service user secret by overriding the secret name. This allows you to reuse an existing Zitadel service user secret that has Connect Admin privileges.

deploymentoperator:
  enabled: true
  secrets:
    existing:
      connect-deployment-operator-creds:
        name: 'your-existing-zitadel-service-user-secret'

1.3.3. Perform the Deployment

Make sure you have completed all prerequisites before starting the deployment. Then continue with one of the following deployment options: Deploy Air-gapped, Deploy with Helm, or Deploy with Helmfile.

Deploy Air-gapped

Download the Connect Cache Container, which contains the Docker images required for installation, to the machine where you unpacked the Helm Deployment Template files.

The Helm Deployment Template unpacked in the Prepare for Deployment step contains a docs/README.adoc file with detailed instructions on how to perform an air-gapped installation of Connect.

Deploy with Helm

To ensure correct installation, a file is provided in the Helm Deployment Template that specifies the required arguments for the helm upgrade command.

Install the dependency charts (connect-postgresql, connect-victoria-metrics, and connect-openbao) first, followed by the connect chart.
When installing a chart, the release name must match the chart name.

The following steps should be performed when installing each chart:

  1. Navigate to the unpacked ZIP folder.

  2. Set the release name.

    # Unix
    RELEASE_NAME="connect"
    # Windows - PowerShell
    $RELEASE_NAME="connect"
  3. Set the chart archive reference with the value provided in connect-charts.csv.

    # Unix
    CHART_ARCHIVE_PATH=$(grep "$RELEASE_NAME," connect-charts.csv | cut -d, -f3)
    # Windows - PowerShell
    $CHART_ARCHIVE_PATH=$(Get-Content connect-charts.csv | Select-String -Pattern "$RELEASE_NAME," | ForEach-Object { ($_ -split ',')[2].Trim() })
  4. Install the Helm chart.

    helm upgrade -i \
    --kube-context $KUBE_CONTEXT \
    -n $CONNECT_NS \
    $RELEASE_NAME \
    $CHART_ARCHIVE_PATH \
    --values "$RELEASE_NAME/values.yaml" \
    --set "global.clusterExternalUrl=https://YOUR_FOUNDATION_DOMAIN" \
    --wait-for-jobs \
    --wait \
    --timeout=15m
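The per-chart steps above can also be scripted as a loop over the release names in dependency order. This is a sketch; it assumes connect-charts.csv lists the release name in column 1 and the chart archive path in column 3, as used in the steps above.

```shell
# Install the dependency charts first, then the main connect chart.
install_connect_charts() {
  for RELEASE_NAME in connect-postgresql connect-victoria-metrics connect-openbao connect; do
    # Resolve the chart archive path from connect-charts.csv (column 3)
    CHART_ARCHIVE_PATH=$(grep "^$RELEASE_NAME," connect-charts.csv | cut -d, -f3)
    helm upgrade -i \
      --kube-context "$KUBE_CONTEXT" \
      -n "$CONNECT_NS" \
      "$RELEASE_NAME" \
      "$CHART_ARCHIVE_PATH" \
      --values "$RELEASE_NAME/values.yaml" \
      --set "global.clusterExternalUrl=https://YOUR_FOUNDATION_DOMAIN" \
      --wait-for-jobs --wait --timeout=15m
  done
}
# Usage: install_connect_charts
```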
Deploy with Helmfile
  1. Helmfile diffs should be run in dry-run mode, which can be enabled with the environment variable HELM_DIFF_USE_UPGRADE_DRY_RUN=true.

  2. Run helmfile apply:

    Unix:

    CLUSTER_EXTERNAL_DNS=YOUR_FOUNDATION_DOMAIN; helmfile --kube-context $KUBE_CONTEXT -n $CONNECT_NS apply --set "global.clusterExternalUrl=https://$CLUSTER_EXTERNAL_DNS"

    Windows - PowerShell:

    $CLUSTER_EXTERNAL_DNS="YOUR_FOUNDATION_DOMAIN"
    helmfile --kube-context $KUBE_CONTEXT -n $CONNECT_NS apply --set "global.clusterExternalUrl=https://$CLUSTER_EXTERNAL_DNS"

1.4. Restart Fluentbit Pods

  • Restart Fluentbit pods

    kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
  • Restart flowserver pods

    kubectl delete po -n foundation-env-default -l app.kubernetes.io/name=flowserver

For more information, see Flow-traces/Logs workaround.

For more information on how to create a machine user, see Create a machine user with the client credentials grant.

2. Chart Value Override Recommendations

After you extract the Helm Deployment Template, you can view the default chart values for each chart by running the following Helm command:

helm show values $CHART_ARCHIVE_PATH
See Deploy with Helm for details on how to resolve CHART_ARCHIVE_PATH.

Alternatively, you can reference the following for the default chart configurations:

It is recommended that you maintain custom value overrides in separate files stored in version control.

Defining value override files separately makes it easier to apply them while following the standard install or upgrade instructions.

For example, to increase the flow-server memory limit to 6GB for the connect chart:

Connect applications in the connect chart are JVM-based. Memory limits and requests must be set to the same value to ensure predictable performance and avoid JVM heap sizing issues.

The javaMaxRamPercentage setting controls what percentage of the container’s total memory the JVM uses for heap. For example, setting it to 70 means the JVM uses 70% of container memory for heap, while the remaining 30% is reserved for non-heap memory (such as metaspace, thread stacks, and native memory).
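As a back-of-envelope check of this split: with a 6 GiB container limit and javaMaxRamPercentage set to 70, the JVM heap gets roughly 4.2 GiB, leaving about 1.8 GiB for non-heap memory.

```shell
# Rough heap-size arithmetic for javaMaxRamPercentage=70 with a 6 Gi limit
CONTAINER_MEM_MIB=$((6 * 1024))             # 6 Gi expressed in MiB
HEAP_MIB=$((CONTAINER_MEM_MIB * 70 / 100))  # 70% of the container memory
NON_HEAP_MIB=$((CONTAINER_MEM_MIB - HEAP_MIB))
echo "heap: ${HEAP_MIB} MiB, non-heap headroom: ${NON_HEAP_MIB} MiB"
```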

  1. Create a separate values file values.resource.yaml.

    # contents of values.resource.yaml
    flowserver:
      resources:
        requests:
          cpu: 1.0
          memory: 6Gi
        limits:
          memory: 6Gi
      javaMaxRamPercentage: 70
  2. Install or upgrade existing installation.

    1. Using Helm:

      helm upgrade -i \
      --kube-context $KUBE_CONTEXT \
      -n $CONNECT_NS \
      $RELEASE_NAME \
      $CHART_ARCHIVE_PATH \
      --values "$RELEASE_NAME/values.yaml" \
      --values values.resource.yaml \
      --wait-for-jobs \
      --wait \
      --timeout=15m
      The second --values flag adds the custom overrides. When applying custom value overrides using a values file (-f/--values) or a single property override (--set), the last (rightmost) argument specified takes precedence.
    2. Using Helmfile:

      Add the custom values file to the release entry in helmfile.yaml.gotmpl.

      ...
      - name: connect
        chart: ./Charts/connect-xxx.tgz
        version: x.x.x
        values:
          - ./connect/values.yaml
          - values.resource.yaml  #<-- additional overrides
      ...

      When specifying value override files in the releases[].values element of a Helmfile, the files are applied in order. The last file specified takes precedence.

      The -f/--values and --set flags can also be passed to the helmfile apply command. They are applied to each release item, which can be useful for setting global values. For non-global value overrides, it is recommended to define them in the helmfile.yaml.gotmpl file.

2.1. Resource Scaling

Assuming you still have the Helm Deployment Template bundle extracted on your local machine:

  1. Review the resource allocation in the values.yaml files provided for each chart.

  2. Define any overrides as described in the Chart Value Override Recommendations section.

  3. Run helmfile apply.

2.2. Connect Service Log Retention

Elasticsearch stores Connect service logs. These logs provide Flow traces, an archive of integration executions. Log retention consumes resources, and for some use cases it may be beneficial to control when logs are purged. This can be configured using the jobs.elasticsearch.index.delete.minAge property in the connect/values.yaml file located in the Helm Deployment Template.

For example, to decrease the retention time to 90 days, provide the following values override snippet:

jobs:
  # ...
  elasticsearch:
    index:
      delete:
        minAge: 90d
  # ...

Place the snippet directly in connect/values.yaml (from the Helm Deployment Template) or in a separate file, and apply it as described in Chart value override recommendations.

Make sure you have read the Elasticsearch documentation on lifecycle policy updates.

3. Known Limitations

This section identifies known limitations and constraints when installing and operating GridOS Connect on Foundation.

3.1. CPU Throttling Requirement

Kubernetes CPU throttling must be enabled for GridOS Connect to function correctly. Without CPU throttling enabled, Connect services may exhibit unpredictable performance and behavior. See Enable CPU Throttling for Kubernetes for configuration instructions.

3.2. Flow Server Upgrade Limitation

When upgrading from Connect version 1.18.0 or earlier to Connect version 1.19.0 or later, the flow-server deployment must be restarted. Rolling upgrades are not supported for this version transition.

Workaround: After upgrading Connect, restart the flow-server deployment:

  1. Scale down the flow-server deployment to zero replicas:

    kubectl -n foundation-env-default scale deploy connect-flowserver --replicas 0
  2. Wait for all flow-server pods to terminate.

  3. Scale up the flow-server deployment:

    kubectl -n foundation-env-default scale deploy connect-flowserver --replicas 3
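Step 2 above ("wait for all flow-server pods to terminate") can be scripted by polling until no matching pods remain. This is a sketch; the label selector is an assumption based on the flowserver restart command earlier in this guide.

```shell
# Poll until no flowserver pods remain, then report completion.
wait_for_flowserver_down() {
  while kubectl -n foundation-env-default get po \
      -l app.kubernetes.io/name=flowserver --no-headers 2>/dev/null | grep -q .; do
    sleep 5
  done
  echo "flow-server pods terminated"
}
# Usage: wait_for_flowserver_down
```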

3.3. Fluentbit Log Collection Issue

After deploying Connect, Fluentbit may not automatically collect flowserver logs due to a known issue with the Fluentbit Operator. This manifests as missing flow traces in the Connect Console and an index_not_found_exception error for the flowserver-logs index.

Workaround: Restart both Fluentbit and flowserver pods after installation. See Flow-traces/Logs workaround for detailed instructions.

3.4. Log Duplication in Elasticsearch

The default Foundation Fluent-bit configuration causes Connect flowserver logs to be duplicated across Elasticsearch indices. Each log event is stored three times (twice in the log index and once in the flowserver-log index), increasing storage usage.

See Prevent Log Duplication of Connect Flow Server for configuration steps to eliminate redundant log storage.

3.5. Service Account Creation and Flow Access Granting Errors

The connect-identity and connect-resourceregistry services use OpenBao for secrets management. With Connect versions prior to 1.24.0, users may encounter errors when creating service accounts or granting flow access. This occurs because the connect-identity and connect-resourceregistry services are configured to use the connect-openbao:8200 Kubernetes service, which load balances requests across all OpenBao pods, including follower nodes. Follower nodes reject write operations.

Workaround: Add the following override in your connect/values.yaml to route requests to the OpenBao leader pod:

...
identity:
  config:
    application.yml:
      openbao:
        address: http://connect-openbao-active:8200
...
resourceregistry:
  config:
    application.yml:
      openbao:
        address: http://connect-openbao-active:8200
...

4. Next Steps

You have completed the installation of GridOS Connect on Foundation. If you want to deploy a new integration flow on Connect, review the following options:

5. Additional Resources


1. Required only when installing using Helmfile.
2. macOS or Linux