Install GridOS Connect
This is an installation guide for GridOS Connect deployed on Foundation environments. This documentation is intended for customer deployments of GridOS.
1. Deployment
1.1. Prerequisites
Before deploying GridOS Connect, make sure you have set up all required tools, cluster configurations, and access credentials. You must also review resource scaling requirements to ensure the deployment is sized correctly, which helps prevent installation errors and ensures a smooth deployment process.
1.1.2. Foundation
-
GridOS Connect has been deployed and validated on {foundation-docs-base-url}/index.html[Foundation] version {foundation-docs-version} with non-pdi-k8s-version: {non-pdi-k8s-version}. To ensure compatibility and prevent deployment issues, we recommend using Foundation version {foundation-docs-version}. Deploying GridOS Connect on older Foundation versions may result in unexpected errors. For more details on potential issues, see Troubleshooting.
Connect is also continuously tested with the latest Foundation changes in a combined test environment. At the time of release there are no known issues running Connect on the upcoming {upcoming-foundation-version} version of Foundation.
For more information about Connect Foundation version compatibility, see Connect Foundation Version Requirements.
-
Your kubectl current-context must be set to your Foundation environment.
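As a pre-flight check, the context requirement above could be verified with a small helper. This is a minimal sketch: the function name `check_kube_context` and the `KUBECTL` override variable are hypothetical and not part of GridOS Connect; only `kubectl config current-context` is a standard kubectl command.

```shell
# Hypothetical pre-flight helper: compare the active kubectl context
# with the context you expect for your Foundation environment.
check_kube_context() {
  local expected="$1" current
  # KUBECTL is an assumed override so the helper can be exercised without a cluster.
  current=$(${KUBECTL:-kubectl} config current-context) || return 1
  if [ "$current" != "$expected" ]; then
    echo "current context is '$current', expected '$expected'" >&2
    return 1
  fi
}
```

For example, `check_kube_context <KUBE_CONTEXT> && echo "context OK"` before starting the installation.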
1.1.3. Artifactory Access
You will need access to GridOS Connect Artifactory repositories hosted at GE Digital Grid Artifactory.
1.1.4. Hardware Sizing Guidelines
To install Connect, request an appropriately sized cluster from Foundation, then proceed with the installation. For guidance on sizing GridOS clusters with Foundation and Connect, see Hardware Sizing Guidelines.
1.2. Configure Foundation
At the time of writing, a base Foundation installation requires additional configuration to work with GridOS Connect.
-
GridOS Connect relies on services provided by Foundation, including Zitadel for identity management and Apisix for API gateway functionality. Some of these services may need to be explicitly enabled or reconfigured. The following configuration file lists the required settings: Foundation Helm overlay values.
-
Add these settings to your Foundation Helm values. Depending on your Foundation deployment, you can place them in a file such as local-overrides.yaml.
-
The {foundation-docs-base-url}/foundation-base-user-guide/the-data-loader.html[Foundation data loader] should specify auth configuration. See Auth Configuration for more details.
-
Optional. If you want to deploy a {foundation-docs-base-url}/foundation-base-installation-guide/multisite/multi-site-overview.html[multi-site cluster], follow the {foundation-docs-base-url}/foundation-base-installation-guide/multisite/multi-site-deployment.html[Foundation documentation] to enable multi-site support during the Foundation deployment stage.
| If you have problems or uncertainties regarding how to apply the Foundation configuration, contact Foundation support. |
1.2.1. Enable CPU Throttling for Kubernetes
When running GridOS Connect we require Kubernetes CPU throttling to be enabled in order for our services to have predictable performance and behavior. This can be done as part of the Foundation installation, or be applied to an existing cluster.
Apply to Existing Cluster
SSH to the virtual machine that hosts the Kubernetes node.
# Edit this file on all nodes
vi /etc/rancher/rke2/config.yaml
# Change cpu-cfs-quota to true, it was set to false!
kubelet-arg:
- "cpu-cfs-quota=true"
# Restart rke2-server
sudo systemctl restart rke2-server
# Restart rke2-worker node if applicable
sudo systemctl restart rke2-agent
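After editing the file on each node, a quick check can confirm the setting is in place. The following is a minimal sketch; the helper name `cfs_quota_enabled` is hypothetical, and it only inspects the config file rather than the running kubelet.

```shell
# Hypothetical helper: succeed if the RKE2 config file sets cpu-cfs-quota=true.
# The default path is the file edited above; pass another path as the first argument.
cfs_quota_enabled() {
  local config_file="${1:-/etc/rancher/rke2/config.yaml}"
  # Match the kubelet argument list item, with or without quotes.
  # Commented-out lines do not match because the line must start with "-".
  grep -Eq '^[[:space:]]*-[[:space:]]*"?cpu-cfs-quota=true"?[[:space:]]*$' "$config_file"
}
```

Usage on a node: `cfs_quota_enabled && echo "CPU throttling enabled" || echo "CPU throttling not enabled"`.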
Apply during Foundation Installation using PDI
See example {foundation-docs-base-url}/foundation-base-installation-guide/setup-infrastructure-layer.html#_minimal_manifest[manifest] in the Foundation docs.
Set the property cpu_limits_enforcement to true under the k8s param_group with id k8s_all_group_vars, see example snippet below.
...
- id: k8s_all_group_vars
  params:
    cpu_limits_enforcement: true
...
Apply during Foundation Installation without using PDI
See {foundation-docs-base-url}/foundation-base-installation-guide/setup-infrastructure-layer-k8s.html#_parameters[Foundation installation documentation] for more information. You need to change the value of cpu_limits_enforcement to true.
1.2.2. Configure Local DNS
You will need to configure your Domain Name Service by mapping domains to your externally accessible IP address, referred to as: worker-ip.
-
Find your “worker-ip” by running the command kubectl get nodes -o wide and selecting the node without the control-plane role. In a POSIX shell you can do:
kubectl get nodes -o wide | grep -v control-plane | tail -n 1 | awk '{ print $6 }'
-
You will need to configure your DNS for the following domains:
{worker-ip} admin.YOUR_FOUNDATION_DOMAIN
{worker-ip} api.YOUR_FOUNDATION_DOMAIN
{worker-ip} console.YOUR_FOUNDATION_DOMAIN
{worker-ip} service.YOUR_FOUNDATION_DOMAIN
{worker-ip} zitadel.YOUR_FOUNDATION_DOMAIN
If you are using a Connect Agent or mTLS service account, you must add a DNS entry matching the following pattern:
{worker-ip} {org-id}.mtls.YOUR_FOUNDATION_DOMAIN
Example:
10.227.49.xxx gridos.mtls.env-connect-mvp-ingress.local
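The DNS entries listed above can be generated rather than typed by hand. This is a minimal sketch that prints hosts-style lines; the function name `print_connect_dns_entries` is hypothetical, and how you load the entries into your DNS depends on your environment.

```shell
# Hypothetical helper: print the DNS entries required for Connect.
# Arguments: <worker-ip> <foundation-domain> [org-id]
print_connect_dns_entries() {
  local worker_ip="$1" domain="$2" org_id="${3:-}"
  local sub
  for sub in admin api console service zitadel; do
    printf '%s %s.%s\n' "$worker_ip" "$sub" "$domain"
  done
  # Only needed when a Connect Agent or mTLS service account is used.
  if [ -n "$org_id" ]; then
    printf '%s %s.mtls.%s\n' "$worker_ip" "$org_id" "$domain"
  fi
}
```

For example, `print_connect_dns_entries "$WORKER_IP" YOUR_FOUNDATION_DOMAIN gridos`, where `WORKER_IP` holds the address found in the previous step.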
1.2.3. TLS
Ensure that your Foundation environment has correctly configured TLS. If a private Certificate Authority (CA) is used for this environment, make sure you have configured the chain of trust on your system.
1.2.4. Auth Configuration
Authorization and authentication (auth) in Foundation is managed by an Identity Provider (IDP). This is described in the Foundation documentation sections:
-
{foundation-docs-base-url}/foundation-base-user-guide/the-identity-provider.html[IDP]
-
{foundation-docs-base-url}/foundation-base-user-guide/the-data-loader.html[Data Loader]
-
{foundation-docs-base-url}/foundation-base-user-guide/the-admin-console.html[Admin console]
For a user to be granted access to GridOS Connect, you will need the following:
-
A user account in your IDP. This is the user you will use for logging in to the Connect Console or deploying flows.
-
A Role Manager Permission created specifically for the GridOS Connect role you need
-
A Role Manager Role
-
A Role Manager Usergroup
-
An AD/LDAP user group - where your IDP account is a member of this group
-
A Role Manager Mapping between the Role Manager Usergroup and the AD/LDAP user group
All of these except the Role Manager Permission must be created by following the Foundation documentation linked above.
To create the permission, follow the next section.
GridOS Connect Role as a Role Manager Permission
Familiarize yourself with the available GridOS Connect roles:
| Role | Description |
|---|---|
| Admin | A user with full integration permissions (read and write). |
| Agent | A user needed for the GridOS Connect Agent to be able to communicate with the GridOS Connect main cluster. |
| Monitor | A "read-only" user that can view flow traces and flow details but is not allowed to make edits. |
For each Connect role you want to utilize, you will need to create a new permission in the Role Manager.
|
In order for this Role Manager Permission to take effect, you will need to ensure the following:
|
The Connect Role Manager Permission is defined with a string value with a specific format:
connect.<ORG_ID>.<GRIDOS_CONNECT_ROLE>
-
The prefix connect. is required.
-
<ORG_ID> is a string value that makes sense to the specific tenant organization. For example, if you want to add permissions to GridOS Connect for Acme Corp, the org ID can be acme.
-
<GRIDOS_CONNECT_ROLE> is a lowercase string value matching one of the GridOS Connect roles listed above.
The <ORG_ID> is referenced as owner-id in the Deployer section.
|
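The permission string format described above can be illustrated with a small helper that builds and validates the value. This is a sketch only; the function name `connect_permission` is hypothetical, and the permission itself must still be created in the Role Manager.

```shell
# Hypothetical helper: build a permission string of the form
# connect.<ORG_ID>.<GRIDOS_CONNECT_ROLE>
connect_permission() {
  local org_id="$1" role
  # The role part must be lowercase.
  role=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$role" in
    admin|agent|monitor) ;;
    *) echo "unknown GridOS Connect role: $2" >&2; return 1 ;;
  esac
  if [ -z "$org_id" ]; then
    echo "org id must not be empty" >&2
    return 1
  fi
  printf 'connect.%s.%s\n' "$org_id" "$role"
}
```

For the Acme Corp example, `connect_permission acme Admin` yields the value to enter as the Role Manager Permission.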
1.2.5. Create a UserGroup for the Connect Identity Reconcilers
The Connect Identity Reconciler applications require a UserGroup mapping for connect-identity-reconciler to exist, and it needs to be associated with the following permissions:
-
roleManager.userGroups.read.readAll
-
roleManager.permissions.read.readAll
-
roleManager.roles.read.readAll
You achieve this by doing the following in the security admin console:
-
Add a Role called Connect Identity Reconciler and map the permissions listed above to it.
-
Add a Usergroup called Connect Identity Reconciler and set the Mapped GroupName field to connect-identity-reconciler.
-
Add the Connect Identity Reconciler Role to the Mapped Roles of the new Usergroup.
1.3. Deploy Connect
1.3.1. Prepare for Deployment
The Connect installation includes the main chart, connect, and three dependency charts: connect-postgresql, connect-victoria-metrics, and connect-openbao.
These charts, along with the required value override and auxiliary deployment files, are packaged into a ZIP artifact called the Helm Deployment Template.
-
Download the Helm Deployment Template to your local machine.
-
Unpack the ZIP file.
-
Review the allocated resources. The Helm Deployment Template includes a values.yaml file for each chart. Review and override the default resource allocations as needed for your use case.
The service resource values are set conservatively. Since the Connect team cannot anticipate customer-specific requirements, these resource allocations must be reviewed carefully. Misconfiguration can lead to broken clusters, poor performance, or excessive compute costs.
To review and override the chart values, see Chart Value Override Recommendations.
-
Set the kube-context and namespace for Connect installation.
# Unix
KUBE_CONTEXT="<KUBE_CONTEXT>"
CONNECT_NS="<CONNECT_NAMESPACE>"
# Windows - PowerShell
$KUBE_CONTEXT="<KUBE_CONTEXT>"
$CONNECT_NS="<CONNECT_NAMESPACE>"
| Ensure that you set (override) the value of the property global.clusterExternalUrl to the externally available service domain in the form of an HTTPS URL (e.g., https://YOUR_FOUNDATION_DOMAIN). |
1.3.2. Enable the Connect Deployment Operator
| This step is optional. You can choose to deploy the Connect Deployment Operator now or skip it and deploy it later, depending on your requirements. |
The Connect Deployment Operator manages Connect deployables as Kubernetes objects.
Connect Deployment Operator Configuration
The operator is an opt-in feature and must be enabled in a Connect Helm value override:
deploymentoperator:
  enabled: true
The operator requires OAuth 2.0 authentication with Connect Admin privileges to manage deployments. It retrieves the required credentials from a Zitadel Service User Secret in the same Kubernetes namespace where Connect is deployed.
By default, the operator expects a Zitadel service user secret named connect-deployment-operator-creds. This service user must be added to the Connect deployment and is not created automatically by the Connect chart.
You can configure the operator to use a different Zitadel service user secret by overriding the secret name. This allows you to reuse an existing Zitadel service user secret that has Connect Admin privileges.
deploymentoperator:
  enabled: true
  secrets:
    existing:
      connect-deployment-operator-creds:
        name: 'your-existing-zitadel-service-user-secret'
1.3.3. Perform the Deployment
Make sure you have completed all prerequisites before starting the deployment. Then continue with one of the following deployment options: Deploy Air-gapped, Deploy with Helm, or Deploy with Helmfile.
Deploy Air-gapped
Download the Connect Cache Container, which contains the Docker images required for installation, to the machine where you unpacked the Helm Deployment Template files.
-
For Connect versions earlier than 1.22.0, download the Connect Cache Container (.tgz file).
-
For Connect versions 1.22.0 and later, download the Connect Cache Container (.tar.gz file).
The Helm Deployment Template unpacked in the Prepare for Deployment step contains a docs/README.adoc file with detailed instructions on how to perform an air-gapped installation of Connect.
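If you script the download, the version rule for the Connect Cache Container file type can be expressed as a small helper. This is a sketch under assumptions: the function name `cache_container_extension` is hypothetical, versions are expected in full three-part form (for example 1.21.3), and `sort -V` requires GNU coreutils.

```shell
# Hypothetical helper: pick the Connect Cache Container file extension
# for a given Connect version (three-part version string assumed).
cache_container_extension() {
  local version="$1" lowest
  # sort -V orders version strings; the first line is the lower of the two.
  lowest=$(printf '%s\n%s\n' "$version" "1.22.0" | sort -V | head -n 1)
  if [ "$lowest" = "1.22.0" ]; then
    echo ".tar.gz"   # Connect 1.22.0 and later
  else
    echo ".tgz"      # Connect versions earlier than 1.22.0
  fi
}
```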
Deploy with Helm
To ensure correct installation, a file is provided in the Helm Deployment Template that specifies the required arguments for the helm upgrade command.
Install the dependency charts (connect-postgresql, connect-victoria-metrics, and connect-openbao) first, followed by the connect chart.
| When installing a chart, the release name must match the chart name. |
The following steps should be performed when installing each chart:
-
Navigate to the unpacked ZIP folder.
-
Set the release name.
# Unix
RELEASE_NAME="connect"
# Windows - PowerShell
$RELEASE_NAME="connect"
-
Set the chart archive reference with the value provided in connect-charts.csv.
# Unix
CHART_ARCHIVE_PATH=$(grep "$RELEASE_NAME," connect-charts.csv | cut -d, -f3)
# Windows - PowerShell
$CHART_ARCHIVE_PATH=$(Get-Content connect-charts.csv | Select-String -Pattern "$RELEASE_NAME," | ForEach-Object { ($_ -split ',')[2].Trim() })
-
Install the Helm chart.
helm upgrade -i \
  --kube-context $KUBE_CONTEXT \
  -n $CONNECT_NS \
  $RELEASE_NAME \
  $CHART_ARCHIVE_PATH \
  --values "$RELEASE_NAME/values.yaml" \
  --set "global.clusterExternalUrl=https://YOUR_FOUNDATION_DOMAIN" \
  --wait-for-jobs \
  --wait \
  --timeout=15m
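Because the same steps are repeated for each chart, they can be wrapped in a loop that installs the dependency charts first and the connect chart last. The following is a sketch, not a supported installer: the function name `install_connect_charts` and the `DRY_RUN` variable are hypothetical, and it assumes `KUBE_CONTEXT` and `CONNECT_NS` are set and that it runs from the unpacked ZIP folder.

```shell
# Hypothetical wrapper: install the dependency charts, then the connect chart.
# Set DRY_RUN=1 to print the helm commands instead of running them.
install_connect_charts() {
  local release chart_path
  for release in connect-postgresql connect-victoria-metrics connect-openbao connect; do
    # Same lookup as in the manual steps: third CSV column is the chart archive reference.
    chart_path=$(grep "$release," connect-charts.csv | cut -d, -f3)
    if [ -z "$chart_path" ]; then
      echo "no chart archive found for $release" >&2
      return 1
    fi
    # The release name must match the chart name.
    ${DRY_RUN:+echo} helm upgrade -i \
      --kube-context "$KUBE_CONTEXT" \
      -n "$CONNECT_NS" \
      "$release" \
      "$chart_path" \
      --values "$release/values.yaml" \
      --set "global.clusterExternalUrl=https://YOUR_FOUNDATION_DOMAIN" \
      --wait-for-jobs \
      --wait \
      --timeout=15m || return 1
  done
}
```

Running `DRY_RUN=1 install_connect_charts` first lets you review the four commands before anything is applied to the cluster.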
Deploy with Helmfile
-
Helmfile diffs should be run in dry-run mode, which can be set with the environment variable HELM_DIFF_USE_UPGRADE_DRY_RUN=true.
-
Run helmfile apply:
Unix:
CLUSTER_EXTERNAL_DNS=YOUR_FOUNDATION_DOMAIN; helmfile --kube-context $KUBE_CONTEXT -n $CONNECT_NS apply --set "global.clusterExternalUrl=https://$CLUSTER_EXTERNAL_DNS"
Windows - PowerShell:
$CLUSTER_EXTERNAL_DNS="YOUR_FOUNDATION_DOMAIN"
helmfile --kube-context $KUBE_CONTEXT -n $CONNECT_NS apply --set "global.clusterExternalUrl=https://$CLUSTER_EXTERNAL_DNS"
1.4. Restart Fluentbit Pods
-
Restart Fluentbit pods:
kubectl delete po -n foundation-cluster-monitoring -l app.kubernetes.io/name=fluent-bit
-
Restart flowserver pods:
kubectl delete po -n foundation-env-default -l app.kubernetes.io/name=flowserver
For more information, see Flow-traces/Logs workaround.
| For more information on how to create a machine user, see Create a machine user with the client credentials grant. |
2. Chart Value Override Recommendations
After you extract the Helm Deployment Template, you can view the default chart values for each chart by running the following Helm command:
helm show values $CHART_ARCHIVE_PATH
| See Deploy with Helm for details on how to resolve CHART_ARCHIVE_PATH. |
Alternatively, you can reference the following for the default chart configurations:
It is recommended that you maintain custom value overrides in separate files stored in version control.
Defining value override files separately makes it easier to apply them while following the standard install or upgrade instructions.
For example, to increase the flow-server memory limit to 6GB for the connect chart:
|
Connect applications in the The |
-
Create a separate values file values.resource.yaml.
# contents of values.resource.yaml
flowserver:
  resources:
    requests:
      cpu: 1.0
      memory: 6Gi
    limits:
      memory: 6Gi
  javaMaxRamPercentage: 70
-
Install or upgrade existing installation.
-
Using Helm:
# The second --values file (values.resource.yaml) holds the additional overrides.
helm upgrade -i \
  --kube-context $KUBE_CONTEXT \
  -n $CONNECT_NS \
  $RELEASE_NAME \
  $CHART_ARCHIVE_PATH \
  --values "$RELEASE_NAME/values.yaml" \
  --values values.resource.yaml \
  --wait-for-jobs \
  --wait \
  --timeout=15m
When applying custom value overrides using a values file (-f/--values) or a single property override (--set), the last (rightmost) argument specified takes precedence.
-
Using Helmfile:
Add the custom values file to the release entry in helmfile.yaml.gotmpl.
...
- name: connect
  chart: ./Charts/connect-xxx.tgz
  version: x.x.x
  values:
    - ./connect/values.yaml
    - values.resource.yaml #<-- additional overrides
...
When specifying value override files in the releases[].values element of a Helmfile, the files are applied in order. The last file specified takes precedence.
The -f/--values and --set flags can also be passed to the helmfile apply command. They are applied to each release item, which can be useful for setting global values. For non-global value overrides, it is recommended to define them in the helmfile.yaml.gotmpl file.
2.1. Resource Scaling
Assuming you still have the Helm Deployment Template bundle extracted on your local machine:
-
Review the resource allocation in the values.yaml files provided for each chart.
-
Define any overrides as described in the Chart Value Override Recommendations section.
-
Run helmfile apply.
2.2. Connect Service Log Retention
Elasticsearch stores Connect service logs. These logs provide Flow traces, an archive of integration executions. Log retention requires resources, and for some use cases it may be beneficial to control when logs are purged. This can be configured using the jobs.elasticsearch.index.delete.minAge property in the connect/values.yaml file located in the Helm Deployment Template.
For example, to decrease the retention time to 90 days, provide the following values override snippet:
jobs:
  # ...
  elasticsearch:
    index:
      delete:
        minAge: 90d
  # ...
Place the snippet directly in connect/values.yaml (from the Helm Deployment Template) or in a separate file, and apply it as described in Chart Value Override Recommendations.
Run helmfile apply.
| Make sure you have read the Elasticsearch documentation on lifecycle policy updates. |
3. Known Limitations
This section identifies known limitations and constraints when installing and operating GridOS Connect on Foundation.
3.1. CPU Throttling Requirement
Kubernetes CPU throttling must be enabled for GridOS Connect to function correctly. Without CPU throttling enabled, Connect services may exhibit unpredictable performance and behavior. See Enable CPU Throttling for Kubernetes for configuration instructions.
3.2. Flow Server Upgrade Limitation
When upgrading from Connect version 1.18.0 or earlier to Connect version 1.19.0 or later, the flow-server deployment must be restarted. Rolling upgrades are not supported for this version transition.
Workaround: After upgrading Connect, restart the flow-server deployment:
-
Scale down the flow-server deployment to zero replicas:
kubectl -n foundation-env-default scale deploy connect-flowserver --replicas 0
-
Wait for all flow-server pods to terminate.
-
Scale up the flow-server deployment:
kubectl -n foundation-env-default scale deploy connect-flowserver --replicas 3
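The restart workaround above can be sketched as a single helper. This is an illustration under assumptions: the function name `restart_flowserver` and the `KUBECTL` override are hypothetical, the pod label is taken from the flowserver pod restart command elsewhere in this guide, and the timeouts are arbitrary values you should adjust.

```shell
# Hypothetical helper for the flow-server restart workaround.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
restart_flowserver() {
  local kubectl="${KUBECTL:-kubectl}" ns="foundation-env-default" replicas="${1:-3}"
  # Scale down to zero replicas.
  $kubectl scale deploy connect-flowserver -n "$ns" --replicas 0 || return 1
  # Wait for the flow-server pods to terminate. kubectl wait reports an error
  # when no matching pods remain, so its exit status is not treated as fatal.
  $kubectl wait --for=delete pod -l app.kubernetes.io/name=flowserver -n "$ns" --timeout=5m || true
  # Scale back up and wait for the rollout to complete.
  $kubectl scale deploy connect-flowserver -n "$ns" --replicas "$replicas" || return 1
  $kubectl rollout status deploy/connect-flowserver -n "$ns" --timeout=10m
}
```

Call `restart_flowserver 3` after the upgrade, replacing 3 with the replica count your deployment normally runs.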
3.3. Fluentbit Log Collection Issue
After deploying Connect, Fluentbit may not automatically collect flowserver logs due to a known issue with the Fluentbit Operator. This manifests as missing flow traces in the Connect Console and an index_not_found_exception error for the flowserver-logs index.
Workaround: Restart both Fluentbit and flowserver pods after installation. See Flow-traces/Logs workaround for detailed instructions.
3.4. Log Duplication in Elasticsearch
The default Foundation Fluent-bit configuration causes Connect flowserver logs to be duplicated across Elasticsearch indices. Each log event is stored three times (twice in the log index and once in the flowserver-log index), increasing storage usage.
See Prevent Log Duplication of Connect Flow Server for configuration steps to eliminate redundant log storage.
3.5. Service Account Creation and Flow Access Granting Errors
The connect-identity and connect-resourceregistry services use OpenBao for secrets management. With Connect versions prior to 1.24.0, users may encounter errors when creating service accounts or granting flow access. This occurs because the connect-identity and connect-resourceregistry services are configured to use the connect-openbao:8200 Kubernetes service, which load balances requests across all OpenBao pods, including follower nodes. Follower nodes reject write operations.
Workaround: Add the following override in your connect/values.yaml to route requests to the OpenBao leader pod:
...
identity:
  config:
    application.yml:
      openbao:
        address: http://connect-openbao-active:8200
...
resourceregistry:
  config:
    application.yml:
      openbao:
        address: http://connect-openbao-active:8200
...
4. Next Steps
You have completed the installation of GridOS Connect on Foundation. If you want to deploy a new integration flow on Connect, review the following options:
-
To deploy GridOS Connect flows, see the SDK Deployer guide to configure your deployment settings. The management-api-root endpoint is https://api.YOUR_FOUNDATION_DOMAIN.
-
To deploy GridOS Connect service accounts, see the Deploy Connect Service Accounts section.
-
To upgrade GridOS Connect, see Upgrade GridOS Connect.
-
To uninstall GridOS Connect, see Uninstall Connect.
5. Additional Resources
-
To manage access to platform services that support GridOS Connect, see Platform Services.
-
To troubleshoot GridOS Connect issues, see Troubleshoot GridOS Connect.
-
For information on Flow Server configuration properties, see Flow server Configuration.