Installing the global Cluster
This document describes how to install the global cluster onto Immutable Infrastructure. The global cluster is the platform control plane and is provisioned through Cluster API. Use this path when the platform control plane must run on an immutable operating system such as Alauda OS.
When to Use This Path
Choose this installation path when all of the following conditions apply:
- You want the global cluster to run on an immutable operating system. Alauda OS is the supported image today.
- Your infrastructure is one of the documented providers: Huawei DCS, VMware vSphere, Huawei Cloud Stack, or Bare Metal.
- You can run a temporary KIND host that has network access to the target IaaS platform.
For traditional operating systems such as Ubuntu or RHEL, use the standard installation path instead.
Common Prerequisites
The following prerequisites apply to every provider:
- A KIND host that meets the minimum hardware and network requirements. See the Overview for sizing guidance.
- The Core Package from the Customer Portal.
- The Alauda Container Platform Kubeadm Provider package.
- The infrastructure provider package for your target platform.
- Network reachability between the KIND host and the target IaaS platform API endpoint.
- IP and hostname planning for the global control plane and worker nodes. See Infrastructure Resources for the resource model used by each provider.
- A stable Kubernetes API endpoint for the global cluster, such as a VIP or load balancer address.
- A platform access address, registry address, and Pod and Service CIDR ranges.
- For x86_64 nodes that use ACP-provided Alauda OS images, the underlying CPUs must support the x86-64-v2 ISA baseline. See OS Support Matrix.
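The x86-64-v2 requirement can be checked on a Linux host before provisioning. The following is a minimal sketch, not part of the product tooling: the function names are hypothetical, and the flag names are the ones Linux reports in /proc/cpuinfo for the x86-64-v2 feature set (pni is the Linux name for SSE3).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given CPU flag list covers the
# x86-64-v2 baseline (CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSSE3, SSE4.1, SSE4.2).
supports_x86_64_v2() {
  flags=" $1 "
  # Names as they appear in the "flags" line of /proc/cpuinfo.
  for required in cx16 lahf_lm popcnt pni ssse3 sse4_1 sse4_2; do
    case "$flags" in
      *" $required "*) ;;
      *) echo "missing CPU flag: $required" >&2; return 1 ;;
    esac
  done
  return 0
}

# Print the flag list of the first CPU on a Linux host (empty elsewhere).
current_cpu_flags() {
  grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2
}
```

On a candidate node, `supports_x86_64_v2 "$(current_cpu_flags)"` returns a non-zero status and names the first missing flag when the CPU is below the baseline. For virtual machines, the result reflects the CPU model exposed by the hypervisor, not the physical host.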
Naming Convention (Required)
This rule applies to every infrastructure provider supported by this install path — Huawei DCS, Huawei Cloud Stack, VMware vSphere, and any provider added in the future. Every manifest you author in Step 4 must follow it. Misnaming these resources has two distinct failure modes, both detailed below; one breaks initial provisioning, the other only surfaces during disaster recovery.
- The CAPI Cluster and the provider's infrastructure cluster resource (for example, DCSCluster for Huawei DCS or HCSCluster for Huawei Cloud Stack; each provider has its own equivalent) must be named exactly global. cpaas-installer looks them up by literal name, and the Huawei Cloud Stack provider only allocates the global ELB listener ports (11443 for the registry and console, 2379 for DR etcd-sync, 443 for web access) when the infra cluster is named global. A different name silently breaks registry pull, DR etcd-sync, and the web console.
- Every other CAPI resource (KubeadmControlPlane, KubeadmConfigTemplate, MachineDeployment) and every other provider infrastructure resource (machine templates, IP/hostname pools, machine config pools, and any other per-provider resource) must use a name with the global- prefix. The DR (failover) mechanism uses this prefix to identify resources owned by the global cluster. A global cluster resource without the global- prefix is invisible to DR and causes the standby cluster's machines to be deleted at failover time. The cluster provisions and runs normally, then loses nodes the first time DR is exercised. This is a hard requirement, not a stylistic convention.
Cluster.spec.controlPlaneRef.name and any other cross-references must match the prefixed names exactly.
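The two naming rules can be expressed as a small pre-flight check. This is a sketch under stated assumptions, not an installer feature: the function name `check_global_name` is hypothetical, and it treats every kind that ends in "Cluster" as either the CAPI Cluster or a provider infrastructure cluster.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the naming convention.
# Assumption: infrastructure cluster kinds end in "Cluster"
# (for example DCSCluster or HCSCluster).
check_global_name() {
  kind=$1
  name=$2
  case "$kind" in
    *Cluster)
      # The CAPI Cluster and the infrastructure cluster must be named exactly "global".
      [ "$name" = "global" ] && return 0
      echo "$kind/$name: must be named exactly 'global'" >&2
      return 1
      ;;
    *)
      # Every other resource needs the "global-" prefix so DR can identify it.
      case "$name" in
        global-?*) return 0 ;;
      esac
      echo "$kind/$name: name must start with 'global-'" >&2
      return 1
      ;;
  esac
}
```

For example, `check_global_name KubeadmControlPlane global-kcp` succeeds, while `check_global_name KubeadmControlPlane kcp` fails with a message. The check covers names only; cross-references such as Cluster.spec.controlPlaneRef.name still need to be compared against the names they point to.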
Before installation, record the supported version set for the delivery package:
Procedure
Step 1 — Prepare Common Variables
Set the common variables on the KIND host.
export HOST_IP="<kind-host-ip>"
export LOCAL_REGISTRY_ADDRESS="127.0.0.1:11443"
export BOOTSTRAP_REGISTRY_ADDRESS="172.18.0.1:11443"
export NODE_REGISTRY_ADDRESS="${HOST_IP}:11443"
export CONTROL_PLANE_VIP="<global-control-plane-vip>"
export PLATFORM_HOST="<platform-access-domain-or-vip>"
export REGISTRY_DOMAIN="<platform-registry-domain-or-vip>:11443"
export CLUSTER_CIDR="100.3.0.0/16"
export SERVICE_CIDR="100.4.0.0/16"
export KUBE_OVN_JOIN_CIDR="<kube-ovn-join-cidr>"
# Use v-prefixed semver that matches the target Alauda OS image.
export K8S_VERSION="<target-kubernetes-version>"
export INGRESS_CLASS_NAME="global-alb2"
export HCS_SECRET_NAME="global-secret"
Use LOCAL_REGISTRY_ADDRESS when pushing packages from the KIND host. Use BOOTSTRAP_REGISTRY_ADDRESS in AppRelease chart repository values because provider Pods read the chart repository from inside the bootstrap KIND network. Use NODE_REGISTRY_ADDRESS in Cluster API registry annotations because provisioned global nodes must pull images through an address reachable from their subnet.
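Because later steps expand these variables inside heredocs, an unset variable or a leftover placeholder ends up silently in a rendered manifest. The following sketch shows one way to catch that early; the function name `check_install_vars` is hypothetical and the check is not part of the Core Package.

```shell
#!/bin/sh
# Hypothetical helper: report variables that are unset, empty, or still hold
# a "<placeholder>" value copied from the template.
check_install_vars() {
  status=0
  for var in "$@"; do
    # Indirect lookup of the variable named in $var (POSIX sh has no ${!var}).
    eval "value=\${$var:-}"
    case "$value" in
      "")
        echo "$var is not set" >&2
        status=1
        ;;
      *"<"*">"*)
        echo "$var still contains a placeholder: $value" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

A typical call after the exports would be `check_install_vars HOST_IP CONTROL_PLANE_VIP PLATFORM_HOST REGISTRY_DOMAIN KUBE_OVN_JOIN_CIDR K8S_VERSION`. Pass only variable names that you control, because the helper uses eval for the lookup.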
Step 2 — Bootstrap the KIND Host
Run the bootstrap script provided by the Core Package. This brings up a temporary management cluster, minialauda, on the KIND host.
mkdir -p /root/cpaas-install
tar -xvf <core-package> -C /root/cpaas-install
cd /root/cpaas-install/installer
sh setup.sh
mkdir -p ~/.kube
cp /var/cpaas/data/alauda.kubeconfig ~/.kube/config
The bootstrap script provisions an embedded registry, the Cluster API control plane, and the installer components that drive the global cluster installation.
Step 3 — Upload and Install Provider Packages
Upload the Kubeadm provider package and the infrastructure provider package to the local registry.
Why cluster.type is Baremetal for every provider
The AppRelease values in the tabs below all set global.cluster.type: Baremetal. This is a chart-internal classifier, not the IaaS provider name. Keep Baremetal for the Huawei DCS, VMware vSphere, Huawei Cloud Stack, and Bare Metal global installations. The value drives how the platform configures node-level components; it does not select the infrastructure provider.
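Because the same classifier value is expected in every AppRelease, a rendered AppRelease file can be checked mechanically before it is applied. This is a minimal sketch: the function name `check_cluster_type` is hypothetical, and it assumes the file contains no "type:" keys other than global.cluster.type, which holds for the AppRelease examples in this step.

```shell
#!/bin/sh
# Hypothetical check: every "type:" value in a rendered AppRelease file is Baremetal.
# Assumption: the file has no other "type:" keys.
check_cluster_type() {
  file=$1
  # grep -c prints 0 (and exits non-zero) when nothing matches; only the count is used.
  total=$(grep -c -E '^[[:space:]]*type:' "$file")
  good=$(grep -c -E '^[[:space:]]*type:[[:space:]]*Baremetal[[:space:]]*$' "$file")
  if [ "$total" -eq 0 ] || [ "$total" -ne "$good" ]; then
    echo "$file: expected every cluster type to be Baremetal ($good of $total match)" >&2
    return 1
  fi
  return 0
}
```

For example, `check_cluster_type /root/yamls/dcs-provider-appreleases.yaml` fails if any AppRelease in the file was edited to use the IaaS provider name instead.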
For Huawei DCS, set the provider package paths and chart versions.
export DCS_PROVIDER_PACK="/root/cluster-api-provider-dcs.amd64.<version>.tgz"
export KUBEADM_PROVIDER_PACK="/root/cluster-api-provider-kubeadm.amd64.<version>.tgz"
export DCS_PROVIDER_VERSION="<dcs-provider-chart-version>"
export KUBEADM_PROVIDER_VERSION="<kubeadm-provider-chart-version>"
Upload the packages.
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${DCS_PROVIDER_PACK}"
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${KUBEADM_PROVIDER_PACK}"
Create and apply the AppRelease resources for the Kubeadm provider and the DCS provider.
mkdir -p /root/yamls
export DCS_PROVIDER_APPRELEASES="/root/yamls/dcs-provider-appreleases.yaml"
cat > "${DCS_PROVIDER_APPRELEASES}" <<EOF
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-kubeadm
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-kubeadm
        releaseName: cluster-api-provider-kubeadm
        targetRevision: ${KUBEADM_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-dcs
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-dcs
        releaseName: cluster-api-provider-dcs
        targetRevision: ${DCS_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
EOF
kubectl apply -f "${DCS_PROVIDER_APPRELEASES}"
until kubectl get crd kubeadmcontrolplanes.controlplane.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q kubeadmcontrolplanes.controlplane.cluster.x-k8s.io; do
sleep 10
done
until kubectl get crd dcsclusters.infrastructure.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q dcsclusters.infrastructure.cluster.x-k8s.io; do
sleep 10
done
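The wait loops above poll forever if a provider never registers its CRD, for example when the chart fails to install. The sketch below wraps the same kubectl query in a function with a deadline. The function name `wait_for_crd` and the default timeout of 600 seconds are assumptions, not values from the product documentation.

```shell
#!/bin/sh
# Hypothetical wrapper around the CRD wait loops: same kubectl query, plus a deadline.
# Usage: wait_for_crd <crd-name> [timeout-seconds] [interval-seconds]
wait_for_crd() {
  crd=$1
  timeout=${2:-600}   # assumed default, adjust to your environment
  interval=${3:-10}
  waited=0
  until kubectl get crd "$crd" --ignore-not-found 2>/dev/null | grep -q "$crd"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for CRD $crd" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "CRD $crd is registered"
}
```

For example, `wait_for_crd dcsclusters.infrastructure.cluster.x-k8s.io 900` gives up after 15 minutes with a non-zero status, which makes a stalled provider installation visible in scripts.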
For VMware vSphere, set the provider package paths and chart versions.
export VSPHERE_PROVIDER_PACK="/root/cluster-api-provider-vsphere.amd64.<version>.tgz"
export KUBEADM_PROVIDER_PACK="/root/cluster-api-provider-kubeadm.amd64.<version>.tgz"
export VSPHERE_PROVIDER_VERSION="<vsphere-provider-chart-version>"
export KUBEADM_PROVIDER_VERSION="<kubeadm-provider-chart-version>"
Upload the packages.
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${VSPHERE_PROVIDER_PACK}"
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${KUBEADM_PROVIDER_PACK}"
Create and apply the AppRelease resources for the Kubeadm provider and the VMware vSphere provider.
mkdir -p /root/yamls
export VSPHERE_PROVIDER_APPRELEASES="/root/yamls/vsphere-provider-appreleases.yaml"
cat > "${VSPHERE_PROVIDER_APPRELEASES}" <<EOF
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-kubeadm
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-kubeadm
        releaseName: cluster-api-provider-kubeadm
        targetRevision: ${KUBEADM_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-vsphere
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-vsphere
        releaseName: cluster-api-provider-vsphere
        targetRevision: ${VSPHERE_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
EOF
kubectl apply -f "${VSPHERE_PROVIDER_APPRELEASES}"
until kubectl get crd kubeadmcontrolplanes.controlplane.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q kubeadmcontrolplanes.controlplane.cluster.x-k8s.io; do
sleep 10
done
until kubectl get crd vsphereclusters.infrastructure.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q vsphereclusters.infrastructure.cluster.x-k8s.io; do
sleep 10
done
For Huawei Cloud Stack, set the provider package paths and chart versions.
export HCS_PROVIDER_PACK="/root/cluster-api-provider-hcs.amd64.<version>.tgz"
export KUBEADM_PROVIDER_PACK="/root/cluster-api-provider-kubeadm.amd64.<version>.tgz"
export HCS_PROVIDER_VERSION="<hcs-provider-chart-version>"
export KUBEADM_PROVIDER_VERSION="<kubeadm-provider-chart-version>"
Upload the packages.
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${HCS_PROVIDER_PACK}"
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${KUBEADM_PROVIDER_PACK}"
Create and apply the AppRelease resources for the Kubeadm provider and the HCS provider.
mkdir -p /root/yamls
export HCS_PROVIDER_APPRELEASES="/root/yamls/hcs-provider-appreleases.yaml"
cat > "${HCS_PROVIDER_APPRELEASES}" <<EOF
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-kubeadm
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-kubeadm
        releaseName: cluster-api-provider-kubeadm
        targetRevision: ${KUBEADM_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-hcs
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-hcs
        releaseName: cluster-api-provider-hcs
        targetRevision: ${HCS_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
EOF
kubectl apply -f "${HCS_PROVIDER_APPRELEASES}"
until kubectl get crd kubeadmcontrolplanes.controlplane.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q kubeadmcontrolplanes.controlplane.cluster.x-k8s.io; do
sleep 10
done
until kubectl get crd hcsclusters.infrastructure.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q hcsclusters.infrastructure.cluster.x-k8s.io; do
sleep 10
done
For Bare Metal, set the provider package paths and chart versions.
export BAREMETAL_PROVIDER_PACK="/root/cluster-api-provider-baremetal.amd64.<version>.tgz"
export KUBEADM_PROVIDER_PACK="/root/cluster-api-provider-kubeadm.amd64.<version>.tgz"
export BAREMETAL_PROVIDER_VERSION="<baremetal-provider-chart-version>"
export KUBEADM_PROVIDER_VERSION="<kubeadm-provider-chart-version>"
Upload the packages.
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${BAREMETAL_PROVIDER_PACK}"
/root/cpaas-install/installer/res/amd64/packtool pack push \
-r "${LOCAL_REGISTRY_ADDRESS}" -c "${KUBEADM_PROVIDER_PACK}"
Create and apply the AppRelease resources for the Kubeadm provider and the Bare Metal provider.
Bare Metal Bootstrap Endpoint
During bootstrap, the global cluster has not handed control to the final VIP yet. Keep the bare-metal registration path on the bootstrap KIND host: global.platformUrl points to the bootstrap host, and elemental.server.url points to https://<kind-host-ip>:12443. Do not set baremetal.cluster.io/system-agent-server-url on the MachineRegistration used for the global machines during this phase. The handoff job changes the global machines to https://<CONTROL_PLANE_VIP>/kubernetes/global after the platform installation completes.
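The two registration endpoints described above differ only by phase. The following sketch makes that mapping explicit; the function name `registration_endpoint` and the phase labels are hypothetical, while the URL shapes are the ones stated in the paragraph above.

```shell
#!/bin/sh
# Hypothetical helper: print the endpoint that the global machines use in each phase.
# Reads HOST_IP and CONTROL_PLANE_VIP from the environment (set in Step 1).
registration_endpoint() {
  phase=$1
  case "$phase" in
    # Before handoff: the bootstrap KIND host serves the registration path.
    bootstrap) echo "https://${HOST_IP}:12443" ;;
    # After the platform installation completes: the handoff job switches to the VIP.
    handoff)   echo "https://${CONTROL_PLANE_VIP}/kubernetes/global" ;;
    *)
      echo "unknown phase: $phase (expected bootstrap or handoff)" >&2
      return 1
      ;;
  esac
}
```

During this step only the bootstrap value is configured by hand, as elemental.server.url. The handoff value is shown for comparison and is applied by the handoff job, not by the operator.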
Prepare the bootstrap HTTPS certificate used by global-alb2 before you install the Bare Metal provider. The elemental-operator mounts the CA from cpaas-system/dex.tls, and elemental-system-agent uses that CA when elemental.tls.agentTLSMode is strict. The Secret must contain tls.crt, tls.key, and the CA bundle key that you configure in the AppRelease, shown below as ca.crt. The serving certificate must include ${HOST_IP} as an IP SAN because the bootstrap endpoint is https://${HOST_IP}:12443.
If cert-manager is available in the bootstrap KIND cluster, create or refresh dex.tls with a bootstrap-local CA:
kubectl get crd certificates.cert-manager.io issuers.cert-manager.io
kubectl -n cpaas-system apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: baremetal-bootstrap-selfsigned
  namespace: cpaas-system
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: baremetal-bootstrap-ca
  namespace: cpaas-system
spec:
  secretName: baremetal-bootstrap-ca
  commonName: baremetal-bootstrap-ca
  duration: 87600h
  renewBefore: 720h
  isCA: true
  privateKey:
    algorithm: RSA
    size: 2048
  usages:
    - cert sign
    - crl sign
  issuerRef:
    name: baremetal-bootstrap-selfsigned
    kind: Issuer
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: baremetal-bootstrap-ca
  namespace: cpaas-system
spec:
  ca:
    secretName: baremetal-bootstrap-ca
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dex-tls-bootstrap
  namespace: cpaas-system
spec:
  secretName: dex.tls
  commonName: ${HOST_IP}
  duration: 87600h
  renewBefore: 720h
  privateKey:
    algorithm: RSA
    size: 2048
  usages:
    - digital signature
    - key encipherment
    - server auth
  ipAddresses:
    - "${HOST_IP}"
    - "127.0.0.1"
  dnsNames:
    - global-alb2
    - global-alb2.cpaas-system.svc
    - global-alb2.cpaas-system.svc.cluster.local
  issuerRef:
    name: baremetal-bootstrap-ca
    kind: Issuer
EOF
kubectl -n cpaas-system wait certificate/baremetal-bootstrap-ca \
--for=condition=Ready \
--timeout=120s
kubectl -n cpaas-system wait certificate/dex-tls-bootstrap \
--for=condition=Ready \
--timeout=120s
Verify that the Secret has the expected keys and that the certificate is valid for the bootstrap endpoint:
kubectl -n cpaas-system get secret dex.tls \
-o jsonpath='{.data.tls\.crt}{" "}{.data.tls\.key}{" "}{.data.ca\.crt}{"\n"}'
tmp_dir=$(mktemp -d)
trap 'rm -rf "${tmp_dir}"' EXIT
kubectl -n cpaas-system get secret dex.tls \
-o jsonpath='{.data.ca\.crt}' | base64 -d > "${tmp_dir}/ca.crt"
kubectl -n cpaas-system get secret dex.tls \
-o jsonpath='{.data.tls\.crt}' | base64 -d > "${tmp_dir}/tls.crt"
openssl verify -CAfile "${tmp_dir}/ca.crt" "${tmp_dir}/tls.crt"
openssl x509 -in "${tmp_dir}/tls.crt" -noout -text | grep -Fw "IP Address:${HOST_IP}"
echo | openssl s_client \
-connect "${HOST_IP}:12443" \
-servername "${HOST_IP}" \
-CAfile "${tmp_dir}/ca.crt" \
-verify_return_error 2>&1 | grep "Verify return code: 0 (ok)"
If the bootstrap ALB keeps serving an older certificate, restart it and verify again:
kubectl -n cpaas-system rollout restart deploy/global-alb2
kubectl -n cpaas-system rollout status deploy/global-alb2
Do not include this bootstrap dex.tls in dcs-import-extra-resources. It is only for the temporary bootstrap endpoint. The final global cluster's dex.tls is created or maintained by the installer platform certificate flow. For DR, use the thirdParty console certificate guidance in Step 8 so the final platform certificate chain covers the platform domain, the primary VIP, and the standby VIP.
mkdir -p /root/yamls
export BAREMETAL_PROVIDER_APPRELEASES="/root/yamls/baremetal-provider-appreleases.yaml"
cat > "${BAREMETAL_PROVIDER_APPRELEASES}" <<EOF
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-kubeadm
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-kubeadm
        releaseName: cluster-api-provider-kubeadm
        targetRevision: ${KUBEADM_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${PLATFORM_HOST}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
---
apiVersion: operator.alauda.io/v1alpha1
kind: AppRelease
metadata:
  annotations:
    auto-recycle: "true"
    interval-sync: "true"
  name: cluster-api-provider-baremetal
  namespace: cpaas-system
spec:
  destination:
    cluster: ""
    namespace: ""
  source:
    chartPullSecret: global-registry-auth
    charts:
      - name: ait/chart-cluster-api-provider-baremetal
        releaseName: cluster-api-provider-baremetal
        targetRevision: ${BAREMETAL_PROVIDER_VERSION}
    repoURL: ${BOOTSTRAP_REGISTRY_ADDRESS}
  timeout: 120
  values:
    global:
      albName: ${INGRESS_CLASS_NAME}
      auth:
        default_admin: admin@cpaas.io
      cluster:
        isGlobal: true
        name: global
        networkType: kube-ovn
        type: Baremetal
      host: ${PLATFORM_HOST}
      ingress:
        ingressClassName: ${INGRESS_CLASS_NAME}
        tls:
          secretName: dex.tls
      labelBaseDomain: cpaas.io
      namespace: cpaas-system
      platformUrl: https://${HOST_IP}
      protectSecretFiles:
        enabled: false
      region: global
      registry:
        address: ${BOOTSTRAP_REGISTRY_ADDRESS}
        imagePullSecrets:
          - global-registry-auth
      replicas: 1
      scheme: https
    handoffHook:
      controlPlaneVIP: ${CONTROL_PLANE_VIP}
      delivery:
        enabled: true
        mode: always
    elemental:
      server:
        url: https://${HOST_IP}:12443
      systemAgent:
        authMode: shared
        serviceAccountName: baremetal-system-agent
      tls:
        agentTLSMode: strict
        caCertSecretName: dex.tls
        caCertSecretKey: ca.crt
EOF
kubectl apply -f "${BAREMETAL_PROVIDER_APPRELEASES}"
until kubectl get crd kubeadmcontrolplanes.controlplane.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q kubeadmcontrolplanes.controlplane.cluster.x-k8s.io; do
sleep 10
done
until kubectl get crd baremetalclusters.infrastructure.cluster.x-k8s.io --ignore-not-found 2>/dev/null | grep -q baremetalclusters.infrastructure.cluster.x-k8s.io; do
sleep 10
done
until kubectl get crd machineinventories.elemental.cattle.io --ignore-not-found 2>/dev/null | grep -q machineinventories.elemental.cattle.io; do
sleep 10
done
After the provider starts, verify that the chart values were accepted. A CrashLoopBackOff with an unknown flag such as --system-agent-auth-mode means the AppRelease chart and the elemental-operator image do not match; install a chart and image from the same release payload before continuing.
kubectl -n cpaas-system get pods | grep -E 'cluster-api-provider-baremetal|elemental'
kubectl -n cpaas-system logs deploy/elemental-operator --tail=100
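The mismatch symptom described above can also be detected in a script by scanning the operator log. This is a minimal sketch: the function name `report_unknown_flags` is hypothetical, and it assumes the log line contains the literal text "unknown flag".

```shell
#!/bin/sh
# Hypothetical helper: read operator logs on stdin and fail when they report an
# unknown command-line flag, which points to a chart/image mismatch.
# Assumption: the log line contains the text "unknown flag".
report_unknown_flags() {
  matches=$(grep -i 'unknown flag')
  if [ -n "$matches" ]; then
    echo "chart and image may not match; the log reports:" >&2
    echo "$matches" >&2
    return 1
  fi
  return 0
}
```

It can be fed from the log command above, for example `kubectl -n cpaas-system logs deploy/elemental-operator --tail=100 2>&1 | report_unknown_flags`.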
Create one provider-specific manifest for the global cluster. The manifest uses the same provider resources as a workload cluster, but it must also include the global-specific labels, annotations, registry values, installer-compatible kubeadm settings, and persistent data paths required by the platform control plane.
Use the provider creation guides as the detailed resource reference:
Apply the naming convention from Common Prerequisites to every resource in the manifest you author below.
Set KubeadmControlPlane.spec.kubeadmConfigSpec.format to the value that the target provider accepts. The provider controllers enforce this:
Set the output path for the DCS global manifest before you render it.
export GLOBAL_DCS_YAML="/root/yamls/new-global.yaml"
The DCS global manifest must contain the following resources in the cpaas-system namespace:
Use the DCS resource fields from Creating Clusters on Huawei DCS and Infrastructure Resources for Huawei DCS. For the global cluster, keep these additional requirements:
- Set Cluster.metadata.name and DCSCluster.metadata.name to global (the infra cluster shares the CAPI Cluster name). Prefix every other CAPI resource and provider resource with global-; the wiring fragment below uses KubeadmControlPlane.metadata.name: global-kcp.
- Add Cluster.metadata.labels.is-global: "true" and Cluster.metadata.labels.cluster-type: DCS.
- Add Cluster.metadata.annotations["cpaas.io/registry-address"] with ${NODE_REGISTRY_ADDRESS}.
- Set KubeadmControlPlane.spec.kubeadmConfigSpec.format: ignition for Alauda OS.
- Keep the release manifest's non-encryption kubeadm files, kubelet patches, audit policy, and installer RBAC entries.
- For a normal non-DR deployment, do not set DCSCluster.spec.encryptionProviderConfigRef and do not add /etc/kubernetes/encryption-provider.conf to KubeadmControlPlane.spec.kubeadmConfigSpec.files.
- Keep /var/cpaas as platform state. If you need the disk to survive rolling replacement, declare it in DCSIpHostnamePool.spec.pool[].persistentDisk; do not rely on DCSMachineTemplate template disks as preserved state.
- Use concrete datastoreName values for DCS local storage unless you have verified that the selected datastore cluster can place volumes on hosts that can run the target VM.
Fragment Scope
The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these global-specific changes into the manifest that you prepare from the DCS create-cluster references, then apply the complete manifest file.
The following fragment shows the global-specific Cluster API wiring. Fill the provider resource fields by using the DCS create-cluster references above.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: global
  namespace: cpaas-system
  labels:
    cluster-type: DCS
    is-global: "true"
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: DCSCluster
    cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}"
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ${CLUSTER_CIDR}
    services:
      cidrBlocks:
        - ${SERVICE_CIDR}
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: global-kcp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DCSCluster
    name: global
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: global-kcp
  namespace: cpaas-system
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  replicas: 3
  version: ${K8S_VERSION}
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: global-master-template
  kubeadmConfigSpec:
    format: ignition
    clusterConfiguration:
      etcd:
        local:
          serverCertSANs:
            - "${CONTROL_PLANE_VIP}"
            - "${PLATFORM_HOST}"
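The fragment above uses shell-style ${VAR} references, so the complete manifest has to be rendered before it is applied. The following sketch checks a rendered file for values that were never filled in. The function names are hypothetical, and the placeholder pattern (lowercase text in angle brackets) is an assumption based on the placeholders used in this document.

```shell
#!/bin/sh
# Hypothetical check: list lines of a rendered manifest that still contain an
# unexpanded ${VAR} reference or a <placeholder> value.
find_unrendered() {
  grep -n -E '\$\{[A-Za-z_][A-Za-z0-9_]*\}|<[a-z0-9-]+>' "$1"
}

# Succeeds only when no unresolved value is left in the file.
manifest_is_rendered() {
  if find_unrendered "$1" >&2; then
    echo "$1: unresolved values remain" >&2
    return 1
  fi
  return 0
}
```

For example, `manifest_is_rendered "${GLOBAL_DCS_YAML}" && kubectl apply -f "${GLOBAL_DCS_YAML}"` applies the manifest only when every variable and placeholder has been replaced.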
Set the output path for the VMware vSphere global manifest before you render it.
export GLOBAL_VSPHERE_YAML="/root/yamls/new-global.yaml"
The VMware vSphere global manifest must contain the following resources in the cpaas-system namespace:
Prepare the vSphere input values by using Preparing Parameters for a VMware vSphere Cluster. Prepare the global cluster manifest by using Creating a VMware vSphere Cluster in the global Cluster and VMware vSphere Provider as the base references. The create-cluster guide is written for workload clusters that are created from the global cluster, but most vSphere YAML in that guide can be reused for the global cluster after you apply the following additional requirements:
- Set Cluster.metadata.name and VSphereCluster.metadata.name to global (the infra cluster shares the CAPI Cluster name). Prefix every other CAPI resource and provider resource with global-; the wiring fragment below uses KubeadmControlPlane.metadata.name: global-kcp.
- Add Cluster.metadata.labels.is-global: "true" and Cluster.metadata.labels.cluster-type: VSphere.
- Add Cluster.metadata.annotations["cpaas.io/registry-address"] with ${NODE_REGISTRY_ADDRESS}.
- Keep the VMware vSphere annotations required by the platform controllers, including the network and CPI annotations from the VMware vSphere create-cluster guide.
- Set VSphereMachineTemplate.spec.template.spec.folder to /<datacenter>/vm/global so operators can identify the global cluster VMs in vCenter. In a DR deployment, use distinct child folders such as /<datacenter>/vm/global/primary and /<datacenter>/vm/global/standby for the primary and standby clusters.
- Set VSphereCluster.spec.identityRef.name to global-vsphere-credentials. This fixed Secret name is required only for the VMware vSphere global installation path; non-global VMware vSphere clusters follow the generic create-cluster guide.
- Set KubeadmControlPlane.spec.kubeadmConfigSpec.format: cloud-init, or leave the field unset because cloud-init is the default for VMware vSphere.
- Keep the release manifest's kubeadm files, including the VMware vSphere /etc/kubernetes/encryption-provider.conf file entry, kubelet patches, audit policy, and installer RBAC entries. VMware vSphere delivers this file through KubeadmControlPlane.spec.kubeadmConfigSpec.files; do not follow the DCS DCSCluster.spec.encryptionProviderConfigRef pattern.
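The folder rule in the list above can be sketched as a small helper that derives the path from the datacenter name and the DR role. The function name `global_vm_folder` is hypothetical, and the role names primary and standby follow the example folder names given above, not a fixed product requirement.

```shell
#!/bin/sh
# Hypothetical helper: build the vCenter VM folder path for the global cluster.
# role is empty for a non-DR deployment, or "primary" / "standby" in a DR deployment.
global_vm_folder() {
  datacenter=$1
  role=${2:-}
  case "$role" in
    "")              echo "/${datacenter}/vm/global" ;;
    primary|standby) echo "/${datacenter}/vm/global/${role}" ;;
    *)
      echo "unknown role: $role (expected primary or standby)" >&2
      return 1
      ;;
  esac
}
```

The result is the value intended for VSphereMachineTemplate.spec.template.spec.folder. Keeping the primary and standby clusters in separate child folders makes their VMs distinguishable in vCenter.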
Fragment Scope
The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these global-specific changes into the manifest that you prepare from the VMware vSphere create-cluster guide, then apply the complete manifest file.
The following fragment shows the global-specific Cluster API wiring. Fill the provider resource fields by using the VMware vSphere create-cluster reference above.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: global
  namespace: cpaas-system
  labels:
    cluster-type: VSphere
    is-global: "true"
    addons.cluster.x-k8s.io/vsphere-cpi: "enabled"
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: VSphereCluster
    cpaas.io/alb-address-type: ClusterAddress
    cpaas.io/network-type: kube-ovn
    cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}"
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ${CLUSTER_CIDR}
    services:
      cidrBlocks:
        - ${SERVICE_CIDR}
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: global-kcp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: global
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: global-kcp
  namespace: cpaas-system
spec:
  replicas: 3
  version: "${K8S_VERSION}"
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: VSphereMachineTemplate
      name: global-master-machine-template
  kubeadmConfigSpec:
    format: cloud-init
    clusterConfiguration:
      etcd:
        local:
          serverCertSANs:
            - "${CONTROL_PLANE_VIP}"
            - "${PLATFORM_HOST}"
Set the output path for the HCS global manifest before you render it.
export GLOBAL_HCS_YAML="/root/yamls/new-global.yaml"
The HCS global manifest must contain the following resources in the cpaas-system namespace:
Use the HCS resource fields from Creating Clusters on Huawei Cloud Stack and Infrastructure Resources for Huawei Cloud Stack. For the global cluster, keep these additional requirements:
- Set Cluster.metadata.name and HCSCluster.metadata.name to global (the infra cluster shares the CAPI Cluster name). Prefix every other CAPI resource and provider resource with global-; the wiring fragment below uses KubeadmControlPlane.metadata.name: global-kcp.
- Add Cluster.metadata.labels.is-global: "true" and Cluster.metadata.labels.cluster-type: HCS.
- Add Cluster.metadata.annotations["cpaas.io/registry-address"] with ${NODE_REGISTRY_ADDRESS}.
- Set KubeadmControlPlane.spec.kubeadmConfigSpec.format: cloud-init, or leave the field unset because cloud-init is the default for HCS.
- Keep the release manifest's non-encryption kubeadm files, kubelet patches, audit policy, and installer RBAC entries.
- For a normal non-DR deployment, do not add /etc/kubernetes/encryption-provider.conf to KubeadmControlPlane.spec.kubeadmConfigSpec.files.
- Keep /var/cpaas as platform state. Declare it in HCSMachineConfigPool.spec.configs[].persistentDisks[] when it must survive node replacement; do not rely on HCSMachineTemplate.spec.template.spec.dataVolumes[] as preserved state.
- Use a highly available control plane for the global cluster. Single-control-plane HCS clusters are creation-only topologies and are not the recommended global upgrade path.
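The naming requirement in the first item of this list can be checked mechanically before the manifest is applied. The following is a minimal sketch, not part of the product tooling: check_global_names is a hypothetical helper that assumes you have already listed the resources in your manifest as "Kind name" pairs, one per line, by whatever means you prefer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Reads "Kind name" pairs on stdin and prints every resource that breaks the
# naming rule. Returns non-zero if any violation is found.
# $1 is the provider's infrastructure cluster kind, for example HCSCluster.
check_global_names() {
  local infra_kind="$1" kind name bad=0
  while read -r kind name; do
    [ -z "$kind" ] && continue
    if [ "$kind" = "Cluster" ] || [ "$kind" = "$infra_kind" ]; then
      # The CAPI Cluster and the infra cluster must both be named "global".
      [ "$name" = "global" ] || { echo "BAD: $kind/$name (expected global)"; bad=1; }
    else
      # Every other resource must carry the "global-" prefix.
      case "$name" in
        global-*) ;;
        *) echo "BAD: $kind/$name (expected global-*)"; bad=1 ;;
      esac
    fi
  done
  return "$bad"
}
```

For example, `printf 'Cluster global\nKubeadmControlPlane global-kcp\n' | check_global_names HCSCluster` prints nothing and returns zero. The same function can be reused for other providers by passing their infrastructure cluster kind.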
Fragment Scope
The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these global-specific changes into the manifest that you prepare from the HCS create-cluster references, then apply the complete manifest file.
The following fragment shows the global-specific Cluster API wiring. Fill the provider resource fields by using the HCS create-cluster references above.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: global
  namespace: cpaas-system
  labels:
    cluster-type: HCS
    is-global: "true"
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: HCSCluster
    cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}"
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ${CLUSTER_CIDR}
    services:
      cidrBlocks:
        - ${SERVICE_CIDR}
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: global-kcp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: HCSCluster
    name: global
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: global-kcp
  namespace: cpaas-system
spec:
  replicas: 3
  version: "${K8S_VERSION}"
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: HCSMachineTemplate
      name: global-master-machine-template
  kubeadmConfigSpec:
    format: cloud-init
    clusterConfiguration:
      etcd:
        local:
          serverCertSANs:
            - "${CONTROL_PLANE_VIP}"
            - "${PLATFORM_HOST}"
Set the output path for the Bare Metal global manifest before you render it.
export GLOBAL_BAREMETAL_YAML="/root/yamls/new-global.yaml"
The Bare Metal global manifest must contain the following resources in the cpaas-system namespace:
Use Creating Clusters on Bare Metal, Managing Nodes on Bare Metal, and Bare Metal Provider as the resource references. For the global cluster, keep these additional requirements:
- Set Cluster.metadata.name and BaremetalCluster.metadata.name to global. Prefix every other CAPI, bare-metal, and elemental resource with global-.
- Add Cluster.metadata.labels.cluster-type: ProviderBaremetal.
- Add Cluster.metadata.annotations["cpaas.io/registry-address"] with ${NODE_REGISTRY_ADDRESS}.
- Add Cluster.metadata.annotations["cpaas.io/kube-ovn-join-cidr"], Cluster.metadata.annotations["cpaas.io/sentry-deploy-type"]: Baremetal, and Cluster.metadata.annotations["cpaas.io/alb-address-type"]: ClusterAddress.
- Set KubeadmControlPlane.spec.kubeadmConfigSpec.format: cloud-init, or leave it unset because cloud-init is the provider path used by Bare Metal.
- Set KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge: 0. Bare-metal pools cannot over-provision physical hosts.
- Keep controlplane.cluster.x-k8s.io/skip-kube-proxy: "" on the KubeadmControlPlane when the release manifest uses kube-ovn.
- Put ${CONTROL_PLANE_VIP} and ${PLATFORM_HOST} in KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.serverCertSANs.
- Set BaremetalCluster.spec.controlPlaneLoadBalancer.host to ${CONTROL_PLANE_VIP}, port to 6443, and use a vrid that is unique in the control-plane Layer-2 domain.
- For a normal non-DR deployment, BaremetalCluster.spec.encryptionProviderConfigRef can be omitted. For DR, set it as described in Optional Disaster Recovery Deployment; do not deliver /etc/kubernetes/encryption-provider.conf by adding it to KubeadmControlPlane.spec.kubeadmConfigSpec.files.
- Do not set baremetal.cluster.io/system-agent-server-url on the MachineRegistration used for the bootstrap global hosts. The bootstrap ISO must register through the bootstrap host; the handoff job later moves the global machines to the VIP.
- If a global VM or physical host does not have DHCP during the live-ISO boot, configure the NIC manually from the host console before waiting for MachineInventory registration. Use the same NetworkManager procedure described in Creating Clusters on Bare Metal, replacing the example address, gateway, DNS, and connection name with the values for that host.
- Do not depend on OS hostname side effects. The bare-metal provider normalizes kubeadm node names and provider IDs from the CAPI and inventory objects.
- The SeedImage created in the bootstrap KIND environment is a bootstrap artifact. After handoff, create any new MachineRegistration or SeedImage on the active global cluster.
- When adding new machines to an already installed global cluster, explicitly set the baremetal.cluster.io/system-agent-server-url annotation on that MachineRegistration to the current active global control-plane VIP. New global machines do not pass through the bootstrap handoff job, so this annotation is what makes their system-agent watch plan Secrets through https://<CONTROL_PLANE_VIP>/kubernetes/global. Non-global workload-cluster machines should continue to use the platform domain path.
apiVersion: elemental.cattle.io/v1beta1
kind: MachineRegistration
metadata:
  name: global-<purpose>
  namespace: cpaas-system
  annotations:
    baremetal.cluster.io/system-agent-server-url: https://<CONTROL_PLANE_VIP>
Fragment Scope
The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these global-specific changes into the manifest that you prepare from the Bare Metal create-cluster references, then apply the complete manifest file.
The following fragment shows the global-specific Cluster API wiring. Fill the inventory names, registration configuration, image references, and optional worker resources by using the Bare Metal create-cluster reference.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: global
  namespace: cpaas-system
  labels:
    cluster-type: ProviderBaremetal
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: BaremetalCluster
    cpaas.io/kube-ovn-join-cidr: "${KUBE_OVN_JOIN_CIDR}"
    cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}"
    cpaas.io/sentry-deploy-type: Baremetal
    cpaas.io/alb-address-type: ClusterAddress
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ${CLUSTER_CIDR}
    services:
      cidrBlocks:
        - ${SERVICE_CIDR}
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: global-kcp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: BaremetalCluster
    name: global
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: BaremetalCluster
metadata:
  name: global
  namespace: cpaas-system
spec:
  controlPlaneLoadBalancer:
    type: Internal
    host: ${CONTROL_PLANE_VIP}
    port: 6443
    vrid: <unique-vrid>
    # vipMode defaults to nic. Set it explicitly only when the environment
    # requires another supported mode, such as arp or policy_route.
    # vipMode: nic
  # Required for DR. Omit this field for a normal non-DR deployment unless
  # you need to provide a pre-existing encryption-provider.conf.
  # encryptionProviderConfigRef:
  #   name: global-encryption-provider-config
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: MachineInventoryPool
metadata:
  name: global-control-plane-pool
  namespace: cpaas-system
spec:
  clusterName: global
  machineInventories:
    - global-cp-1
    - global-cp-2
    - global-cp-3
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: BaremetalMachineTemplate
metadata:
  name: global-control-plane-template
  namespace: cpaas-system
spec:
  template:
    spec:
      machineInventoryPoolRef:
        name: global-control-plane-pool
      allocationPolicy: Ordered
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: global-kcp
  namespace: cpaas-system
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  replicas: 3
  version: "${K8S_VERSION}"
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: BaremetalMachineTemplate
      name: global-control-plane-template
  kubeadmConfigSpec:
    format: cloud-init
    clusterConfiguration:
      etcd:
        local:
          serverCertSANs:
            - "${CONTROL_PLANE_VIP}"
            - "${PLATFORM_HOST}"
Step 5 — Apply the global Manifest
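Apply the provider-specific manifest to minialauda. Run only the command for your provider; the four commands below are for Huawei DCS, VMware vSphere, Huawei Cloud Stack, and Bare Metal, in that order.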
kubectl apply -f "${GLOBAL_DCS_YAML}"
kubectl apply -f "${GLOBAL_VSPHERE_YAML}"
kubectl apply -f "${GLOBAL_HCS_YAML}"
kubectl apply -f "${GLOBAL_BAREMETAL_YAML}"
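Before running the apply command for your provider, it can help to confirm that the rendered manifest no longer contains unsubstituted tokens such as ${CONTROL_PLANE_VIP} or <unique-vrid>. The following is a minimal sketch; find_unrendered is a hypothetical helper, and its pattern is an assumption that may also flag legitimate angle-bracket text in your manifest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Prints every line that still contains a ${VAR} or <placeholder> token and
# returns non-zero when at least one is found.
find_unrendered() {
  local file="$1"
  # -n adds line numbers so the offending entries are easy to locate.
  if grep -nE '\$\{[A-Za-z_][A-Za-z0-9_]*\}|<[A-Za-z][A-Za-z0-9_-]*>' "$file"; then
    return 1
  fi
  return 0
}
```

A possible use is `find_unrendered "${GLOBAL_HCS_YAML}" && kubectl apply -f "${GLOBAL_HCS_YAML}"`, which applies the file only when no token is reported.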
For Bare Metal, wait for the bootstrap registrations to produce the expected inventories before you expect Cluster API reconciliation to progress.
kubectl -n cpaas-system get machineinventory.elemental.cattle.io
kubectl -n cpaas-system get machineinventorypool
kubectl -n cpaas-system get baremetalcluster,baremetalmachine
Step 6 — Wait for the Control Plane
Wait for the Cluster API provider to provision the machines and bring up the Kubernetes control plane.
kubectl get clusters.cluster.x-k8s.io -n cpaas-system
kubectl get kubeadmcontrolplane -n cpaas-system
kubectl get machines -n cpaas-system
The control plane is ready when the KubeadmControlPlane reports Ready: True and the Cluster reports Phase: Provisioned.
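Instead of polling the three commands by hand, the readiness criteria above can be expressed with kubectl wait. This is a sketch under assumptions: it treats "Ready: True" as the Ready condition on the KubeadmControlPlane, it needs kubectl 1.23 or later for --for=jsonpath, and both the function name and the default timeout are illustrative choices, not values from the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Blocks until the KubeadmControlPlane is Ready and the Cluster is Provisioned.
# $1 is an optional timeout applied to each wait in turn (assumed default 45m).
wait_for_global_control_plane() {
  local timeout="${1:-45m}"
  kubectl -n cpaas-system wait kubeadmcontrolplane/global-kcp \
    --for=condition=Ready --timeout="$timeout" &&
  kubectl -n cpaas-system wait clusters.cluster.x-k8s.io/global \
    --for=jsonpath='{.status.phase}'=Provisioned --timeout="$timeout"
}
```

Call `wait_for_global_control_plane 30m` on the KIND host after applying the manifest. A non-zero exit means one of the waits timed out, at which point the individual get commands show which resource is behind.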
Step 7 — Import Provider Resources
Before triggering the installer, create the dcs-import-extra-resources ConfigMap in the cpaas-system namespace for providers that require extra resource import. The ConfigMap name keeps the dcs prefix for historical installer compatibility, even when the provider is not Huawei DCS.
VMware vSphere, Huawei Cloud Stack, and Bare Metal require this ConfigMap for both normal and disaster recovery global installations. Huawei DCS does not require it for the default installation because DCS provider resources are migrated by the built-in flow; create it for DCS only when you need to import additional resources beyond the built-in provider resource migration.
Do not create the DCS dcs-import-extra-resources ConfigMap for the default DCS installation path. DCS provider resources are migrated by the built-in flow.
Create and apply the VMware vSphere import ConfigMap before you trigger the installer. This ConfigMap is required for both normal and disaster recovery global installations. The global-vsphere-credentials Secret stores the vCenter username and password and must be the same Secret name referenced by VSphereCluster.spec.identityRef.name in the VMware vSphere global manifest.
mkdir -p /root/yamls
cat > /root/yamls/dcs-import-extra-resources.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: dcs-import-extra-resources
  namespace: cpaas-system
data:
  resources.yaml: |
    resources:
      - resource: "vsphereclusters.infrastructure.cluster.x-k8s.io"
        names: ["global"]
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/vsphereclusters/cpaas-system/"
        method: etcdctl
      - resource: "vspheremachinetemplates.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/vspheremachinetemplates/cpaas-system/"
        method: etcdctl
      - resource: "vspheremachines.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/vspheremachines/cpaas-system/"
        method: etcdctl
      - resource: "vspherevms.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/vspherevms/cpaas-system/"
        method: etcdctl
      - resource: "vspheremachineconfigpools.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/vspheremachineconfigpools/cpaas-system/"
        method: etcdctl
      - resource: "secrets"
        names: ["global-vsphere-credentials"]
        method: kubectl
EOF
kubectl apply -f /root/yamls/dcs-import-extra-resources.yaml
Create and apply the HCS import ConfigMap before you trigger the installer. This ConfigMap is required for both normal and disaster recovery global installations. Set HCS_SECRET_NAME to the same Secret name used by HCSCluster.spec.identityRef.name.
mkdir -p /root/yamls
cat > /root/yamls/dcs-import-extra-resources.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: dcs-import-extra-resources
  namespace: cpaas-system
data:
  resources.yaml: |
    resources:
      - resource: "secrets"
        names: ["${HCS_SECRET_NAME}"]
        method: kubectl
      - resource: "hcsclusters.infrastructure.cluster.x-k8s.io"
        names: ["global"]
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/hcsclusters/cpaas-system/"
        method: etcdctl
      - resource: "hcsmachinetemplates.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/hcsmachinetemplates/cpaas-system/"
        method: etcdctl
      - resource: "hcsmachineconfigpools.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/hcsmachineconfigpools/cpaas-system/"
        method: etcdctl
      - resource: "hcsmachines.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/hcsmachines/cpaas-system/"
        method: etcdctl
EOF
kubectl apply -f /root/yamls/dcs-import-extra-resources.yaml
Create and apply the Bare Metal import ConfigMap before you trigger the installer. This ConfigMap is required because the installer must import the bare-metal and elemental resources that were created in the bootstrap KIND cluster before the global cluster exists.
Import Before DCS API
Create dcs-import-extra-resources before calling POST /cpaas-installer/api/config/dcs. If it is missing, the handoff job can run with an empty target list because the new global cluster does not contain the BaremetalMachine, MachineInventory, MachineRegistration, or plan Secret objects that describe the bootstrap global machines.
Collect the plan Secret names and kubeadm bootstrap data Secret names after the global Cluster API resources have reconciled. These names do not exist before CAPI creates the Machines. For DR, also include the Secret referenced by BaremetalCluster.spec.encryptionProviderConfigRef so the final global cluster contains the same encryption-provider configuration. Do not import arbitrary platform credential Secrets, and do not import the MachineRegistration token Secret.
kubectl -n cpaas-system get machineinventory.elemental.cattle.io \
-o jsonpath='{range .items[*]}{.status.plan.secretRef.name}{"\n"}{end}'
kubectl -n cpaas-system get baremetalmachine \
-o jsonpath='{range .items[*]}{.status.planSecretRef.name}{"\n"}{end}'
kubectl -n cpaas-system get machine -l cluster.x-k8s.io/cluster-name=global \
-o jsonpath='{range .items[*]}{.spec.bootstrap.dataSecretName}{"\n"}{end}'
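The three commands above print one name per line, possibly with blank lines for objects that have no Secret yet. A small formatter can turn that output into list items ready to paste under a names: key. This is only a sketch; to_yaml_names is a hypothetical helper, and the de-duplication is a precaution in case two queries return overlapping names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Reads newline-separated names on stdin and prints quoted YAML list items,
# dropping blank lines and duplicates. $1 is the indent width (default 0).
to_yaml_names() {
  local indent="${1:-0}" pad
  # Build a string of $indent spaces to prefix each item.
  pad=$(printf '%*s' "$indent" '')
  sort -u | awk -v pad="$pad" 'NF { print pad "- \"" $0 "\"" }'
}
```

For example, piping the machineinventory query into `to_yaml_names 10` prints each plan Secret as an indented `- "name"` line. Pass the indent width that matches where the names: list sits in your file.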
Create the ConfigMap. Replace the placeholder inventory, plan Secret, and bootstrap data Secret names with the values from the commands above.
mkdir -p /root/yamls
cat > /root/yamls/dcs-import-extra-resources.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: dcs-import-extra-resources
  namespace: cpaas-system
data:
  resources.yaml: |
    resources:
      - resource: "customresourcedefinitions.apiextensions.k8s.io"
        names:
          - baremetalclusters.infrastructure.cluster.x-k8s.io
          - baremetalmachines.infrastructure.cluster.x-k8s.io
          - baremetalmachinetemplates.infrastructure.cluster.x-k8s.io
          - machineinventorypools.infrastructure.cluster.x-k8s.io
          - machineinventories.elemental.cattle.io
          - machineregistrations.elemental.cattle.io
          - seedimages.elemental.cattle.io
        method: kubectl
      - resource: "baremetalclusters.infrastructure.cluster.x-k8s.io"
        names: ["global"]
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/baremetalclusters/cpaas-system/"
        method: etcdctl
      - resource: "baremetalmachinetemplates.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/baremetalmachinetemplates/cpaas-system/"
        method: etcdctl
      - resource: "baremetalmachines.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/baremetalmachines/cpaas-system/"
        method: etcdctl
      - resource: "machineinventorypools.infrastructure.cluster.x-k8s.io"
        etcdKeyBase: "/registry/infrastructure.cluster.x-k8s.io/machineinventorypools/cpaas-system/"
        method: etcdctl
      - resource: "machineinventories.elemental.cattle.io"
        names:
          - "<global-machine-inventory-1>"
          - "<global-machine-inventory-2>"
          - "<global-machine-inventory-3>"
        etcdKeyBase: "/registry/elemental.cattle.io/machineinventories/cpaas-system/"
        method: etcdctl
      - resource: "machineregistrations.elemental.cattle.io"
        names: ["<global-machine-registration>"]
        etcdKeyBase: "/registry/elemental.cattle.io/machineregistrations/cpaas-system/"
        method: etcdctl
      - resource: "secrets"
        names:
          - "<global-machine-plan-secret-1>"
          - "<global-machine-plan-secret-2>"
          - "<global-machine-plan-secret-3>"
        method: kubectl
      - resource: "secrets"
        names:
          - "<global-kubeadm-bootstrap-data-secret-1>"
          - "<global-kubeadm-bootstrap-data-secret-2>"
          - "<global-kubeadm-bootstrap-data-secret-3>"
        method: kubectl
      # Required for DR when BaremetalCluster.spec.encryptionProviderConfigRef is set.
      # The name must match the Secret referenced by BaremetalCluster.
      - resource: "secrets"
        names:
          - "<global-encryption-provider-config-secret>"
        method: kubectl
EOF
kubectl apply -f /root/yamls/dcs-import-extra-resources.yaml
kubectl -n cpaas-system get cm dcs-import-extra-resources -o yaml
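Because the Secret names in the import list are copied by hand, a typo would only surface later in the installation. One way to catch it early is to confirm that every listed Secret exists in minialauda. This is a minimal sketch; missing_secrets is a hypothetical helper that takes the names on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Reads Secret names on stdin, one per line, and reports those that do not
# exist in the cpaas-system namespace. Returns non-zero if any is missing.
missing_secrets() {
  local name rc=0
  while read -r name; do
    [ -z "$name" ] && continue
    # </dev/null keeps kubectl from consuming the list being read by the loop.
    if ! kubectl -n cpaas-system get secret "$name" >/dev/null 2>&1 </dev/null; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `printf '%s\n' "<global-machine-plan-secret-1>" "<global-kubeadm-bootstrap-data-secret-1>" | missing_secrets` with the real names substituted prints nothing when both Secrets are present.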
Do not add SeedImage objects or MachineRegistration token Secrets to this import list. A bootstrap SeedImage produces an ISO that points at the bootstrap environment and is no longer the correct lifecycle object after the global cluster has been handed off. The seedimages.elemental.cattle.io CRD is imported only so the new global cluster understands the API type.
For DR, verify after installation that the final global cluster has the imported encryption-provider Secret.
kubectl --kubeconfig <global-kubeconfig> -n cpaas-system \
get secret <global-encryption-provider-config-secret> \
-o jsonpath='{.data.encryption-provider\.conf}{"\n"}'
Submit the platform installation request to the embedded installer REST API. The installer imports the Cluster API resources into the new global cluster, deploys the base operator, and installs the selected plugins.
export INSTALLER_IP=$(kubectl get pods -n cpaas-system -l service_name=cpaas-installer \
-o jsonpath='{.items[0].status.podIP}')
Network Scope
INSTALLER_IP is the Pod IP of the embedded installer in minialauda. The endpoint is used only during installation.
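If the installer Pod is not running yet, the export above can leave INSTALLER_IP empty, and the later curl calls then fail with a confusing URL error. A quick sanity check is sketched below. It assumes an IPv4 Pod network, and is_ipv4 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Succeeds only if $1 looks like a dotted-quad IPv4 address.
is_ipv4() {
  local ip="$1" octet
  # Shape check: four groups of one to three digits separated by dots.
  printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  # Range check: split on dots and require every octet to be at most 255.
  local IFS=.
  for octet in $ip; do
    [ "$octet" -le 255 ] || return 1
  done
}
```

A possible guard before continuing is `is_ipv4 "${INSTALLER_IP}" || echo "installer Pod IP not available yet"`.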
Create the provider-specific installer configuration JSON file on the current KIND host, then submit it to the installer endpoint. All providers in this install path use the same endpoint path, but their request bodies are different.
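The request bodies use placeholders such as <base64-platform-admin-password> and <base64-registry-password>. Assuming these expect standard base64 of the raw value, as the placeholder names suggest, the sketch below encodes a string without a trailing newline, which echo would otherwise add and which changes the encoded result. The helper name b64 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Prints the base64 encoding of $1 on a single line.
b64() {
  # printf '%s' avoids the trailing newline; tr removes any line wrapping
  # that base64 adds for long inputs.
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

For example, `b64 'admin'` prints `YWRtaW4=`.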
The DCS installer request includes the external HA VIP because DCS uses a third-party control plane VIP.
mkdir -p /root/yamls
export INSTALLER_CONFIG_JSON="/root/yamls/installer-config-dcs.json"
cat > "${INSTALLER_CONFIG_JSON}" <<EOF
{
"basic": {
"username": "admin@cpaas.io",
"password": "<base64-platform-admin-password>"
},
"registry": {
"domain": "${REGISTRY_DOMAIN}",
"username": "<registry-username>",
"password": "<base64-registry-password>"
},
"console": {
"host": [
"${CONTROL_PLANE_VIP}"
],
"globalHost": "${PLATFORM_HOST}",
"httpPort": 80,
"httpsPort": 443,
"cert": {
"selfSigned": {}
}
},
"cluster": {
"clusterCIDR": "${CLUSTER_CIDR}",
"serviceCIDR": "${SERVICE_CIDR}",
"features": {
"ha": {
"vip": "${CONTROL_PLANE_VIP}",
"vport": 6443,
"isThirdParty": true
}
}
},
"product": [
"base",
"acp"
],
"deployMode": "normal",
"hostIP": "${HOST_IP}"
}
EOF
curl -k -X POST "http://${INSTALLER_IP}:8080/cpaas-installer/api/config/dcs" \
-H 'Content-Type: application/json' \
-d @"${INSTALLER_CONFIG_JSON}"
Set console.host and cluster.features.ha.vip to the local global HA VIP. Do not use the platform domain in console.host; use console.globalHost for the platform access address.
VMware vSphere uses the same installer endpoint path as DCS, but its request body does not include cluster.features.ha. The control plane endpoint is declared in VSphereCluster.spec.controlPlaneEndpoint.host, and the cluster CIDRs are declared in the VMware vSphere Cluster manifest.
mkdir -p /root/yamls
export INSTALLER_CONFIG_JSON="/root/yamls/installer-config-vsphere.json"
cat > "${INSTALLER_CONFIG_JSON}" <<EOF
{
"basic": {
"username": "admin@cpaas.io",
"password": "<base64-platform-admin-password>"
},
"registry": {
"domain": "${REGISTRY_DOMAIN}",
"username": "<registry-username>",
"password": "<base64-registry-password>"
},
"console": {
"host": [],
"globalHost": "${PLATFORM_HOST}",
"httpPort": 80,
"httpsPort": 443,
"cert": {
"selfSigned": {}
}
},
"product": [
"base",
"acp"
],
"deployMode": "normal",
"hostIP": "${HOST_IP}"
}
EOF
curl -k -X POST "http://${INSTALLER_IP}:8080/cpaas-installer/api/config/dcs" \
-H 'Content-Type: application/json' \
-d @"${INSTALLER_CONFIG_JSON}"
Keep console.host as an empty list because the VMware vSphere control plane endpoint is already set in the global manifest. Do not use the platform domain in console.host; use console.globalHost for the platform access address.
HCS uses the same installer endpoint path as DCS, but its request body does not include cluster.features.ha. The control plane VIP is owned by the HCS ELB declared in HCSCluster.spec.controlPlaneLoadBalancer, so console.host must remain an empty list.
mkdir -p /root/yamls
export INSTALLER_CONFIG_JSON="/root/yamls/installer-config-hcs.json"
cat > "${INSTALLER_CONFIG_JSON}" <<EOF
{
"basic": {
"username": "admin@cpaas.io",
"password": "<base64-platform-admin-password>"
},
"registry": {
"domain": "${REGISTRY_DOMAIN}",
"username": "<registry-username>",
"password": "<base64-registry-password>"
},
"console": {
"host": [],
"globalHost": "${PLATFORM_HOST}",
"httpPort": 80,
"httpsPort": 443,
"cert": {
"selfSigned": {}
}
},
"product": [
"base",
"acp"
],
"deployMode": "normal",
"hostIP": "${HOST_IP}"
}
EOF
curl -k -X POST "http://${INSTALLER_IP}:8080/cpaas-installer/api/config/dcs" \
-H 'Content-Type: application/json' \
-d @"${INSTALLER_CONFIG_JSON}"
The Bare Metal installer request includes the control-plane VIP because the global cluster uses the VIP exposed by alive after handoff. Set REGISTRY_DOMAIN to the platform registry address that the final global cluster should use. For DR, use ${PLATFORM_HOST}:11443 on both primary and standby so the registry follows the platform-domain switch. For a non-DR deployment, ${CONTROL_PLANE_VIP}:11443 is also valid. Do not use the bootstrap registry address in this field.
mkdir -p /root/yamls
export INSTALLER_CONFIG_JSON="/root/yamls/installer-config-baremetal.json"
cat > "${INSTALLER_CONFIG_JSON}" <<EOF
{
"basic": {
"username": "admin@cpaas.io",
"password": "<base64-platform-admin-password>"
},
"registry": {
"domain": "${REGISTRY_DOMAIN}",
"username": "<registry-username>",
"password": "<base64-registry-password>"
},
"console": {
"host": [
"${CONTROL_PLANE_VIP}"
],
"globalHost": "${PLATFORM_HOST}",
"httpPort": 80,
"httpsPort": 443,
"cert": {
"selfSigned": {}
}
},
"cluster": {
"clusterCIDR": "${CLUSTER_CIDR}",
"serviceCIDR": "${SERVICE_CIDR}",
"features": {
"ha": {
"vip": "${CONTROL_PLANE_VIP}",
"vport": 6443,
"isThirdParty": true
}
}
},
"product": [
"base",
"acp"
],
"deployMode": "normal",
"hostIP": "${HOST_IP}"
}
EOF
curl -k -X POST "http://${INSTALLER_IP}:8080/cpaas-installer/api/config/dcs" \
-H 'Content-Type: application/json' \
-d @"${INSTALLER_CONFIG_JSON}"
Set console.host and cluster.features.ha.vip to the local Bare Metal global control-plane VIP. Use console.globalHost for the stable platform domain.
Third-Party Console Certificates
The examples use a self-signed console certificate. If the environment requires a third-party certificate, replace console.cert with a thirdParty block that contains the base64 full certificate chain, private key, and optional PKCS#12 values before you submit the installer request.
DR Certificate Requirement
For a primary/standby Bare Metal global DR deployment, do not let each side generate an unrelated self-signed certificate. Use a thirdParty certificate chain trusted by both sides. The certificate SAN list must cover ${PLATFORM_HOST}, the primary control-plane VIP, the standby control-plane VIP, and any required platform ingress service names. Otherwise existing system-agents can fail TLS verification after DNS is switched to the standby cluster.
Step 9 — Monitor the Installation
After the installer accepts the request, the install runs through several phases that are observable from the KIND host. A typical immutable-OS global cluster takes 30–60 minutes; total time depends on IaaS provisioning speed, image pull time, and the number of plugins selected.
Phases You Will Observe
Signals During Installation
Watch the installer progress API and the installer log together. If one appears stalled, check the underlying Cluster API resources directly on the bootstrap KIND host.
# Installer progress and live log
curl "http://${INSTALLER_IP}:8080/cpaas-installer/api/progress"
tail -f /var/cpaas/data/installer.log
# Cluster API resources on the bootstrap KIND host
kubectl get clusters.cluster.x-k8s.io -A
kubectl get kubeadmcontrolplane -A
kubectl get machines -A
The installer log records every phase transition. Transient errors retry on a short interval; persistent errors stay visible in the log and surface in the progress API as a stalled stage.
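To keep a timestamped record of the progress API while the log is being tailed in another terminal, a simple polling loop can be used. This is a sketch only: poll_progress is a hypothetical helper, the interval and count are arbitrary defaults, and the response is printed verbatim because its format is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Prints the progress API response $2 times, $1 seconds apart.
# Requires INSTALLER_IP to be exported as shown earlier.
poll_progress() {
  local interval="${1:-30}" count="${2:-10}" i=1
  while [ "$i" -le "$count" ]; do
    # Prefix each sample with a UTC timestamp.
    printf '%s ' "$(date -u +%H:%M:%SZ)"
    curl -sS --max-time 10 \
      "http://${INSTALLER_IP}:8080/cpaas-installer/api/progress" ||
      printf '(request failed)'
    echo
    [ "$i" -lt "$count" ] && sleep "$interval"
    i=$((i + 1))
  done
  return 0
}
```

For example, `poll_progress 60 30 | tee /root/yamls/progress.log` samples once a minute for about half an hour; the log path is an illustrative choice.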
Check the global cluster after the installer reports success.
kubectl --kubeconfig <global-kubeconfig> get nodes
kubectl --kubeconfig <global-kubeconfig> get pods -n cpaas-system
kubectl --kubeconfig <global-kubeconfig> get clustermodule global
Common Stalls and Where to Look
Issues that are not listed here usually point to environment-specific causes. Capture the installer log, the progress API response, and the relevant kubectl describe output, then escalate.
Optional Disaster Recovery Deployment
Use this section when you deploy primary and standby global clusters for disaster recovery. Complete these additions before you apply the provider-specific manifest for each global cluster.
Primary and standby clusters must use the same encryption provider configuration. For Bare Metal, primary and standby must also use the same Kubernetes ServiceAccount signing key so that the fixed baremetal-system-agent token created on the primary cluster is accepted by the standby API server after failover. How the encryption provider configuration is delivered depends on the provider. For DCS and Bare Metal, the provider-specific cluster resource references a Secret that contains encryption-provider.conf. For HCS, the /etc/kubernetes/encryption-provider.conf entry is added to KubeadmControlPlane.spec.kubeadmConfigSpec.files only for DR; normal non-DR deployments do not add it. VMware vSphere keeps the release manifest's /etc/kubernetes/encryption-provider.conf file entry.
Prepare Shared DR Variables
Set the same encryption key value on both the primary and standby installation environments.
export ENCRYPTION_PROVIDER_CONF="/root/yamls/encryption-provider.conf"
export ENCRYPTION_PROVIDER_SECRET_B64="<base64-shared-etcd-encryption-key>"
export PRIMARY_CLUSTER_VIP="<primary-ha-vip>"
export STANDBY_CLUSTER_VIP="<standby-ha-vip>"
export BAREMETAL_ENCRYPTION_PROVIDER_SECRET="global-encryption-provider-config"
export ETCD_SYNC_VERSION="<global-etcd-sync-version>"
export ETCD_SYNC_MODULEINFO="/root/yamls/global-etcd-sync-moduleinfo.json"
export SERVICE_ACCOUNT_ISSUER="https://kubernetes.default.svc.cluster.local"
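The value for ENCRYPTION_PROVIDER_SECRET_B64 must be generated once and then copied to both environments, never generated separately on each side. The sketch below produces a random 32-byte key in base64, which is the key size the Kubernetes documentation uses in its aescbc example. The helper name new_encryption_key is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Prints a random 32-byte key, base64-encoded on a single line.
new_encryption_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}
```

Run `new_encryption_key` on one host only, then set the printed value in ENCRYPTION_PROVIDER_SECRET_B64 on both the primary and standby installation environments. Treat the value as a secret and transfer it over a trusted channel.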
Create the encryption provider configuration file on both installation environments.
mkdir -p "$(dirname "${ENCRYPTION_PROVIDER_CONF}")"
cat > "${ENCRYPTION_PROVIDER_CONF}" <<EOF_CONF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${ENCRYPTION_PROVIDER_SECRET_B64}
EOF_CONF
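Since the file must be identical on both sides, comparing a digest is a simple way to confirm it. The sketch below prints only the SHA-256 hash of a file so the two values can be compared by eye or in a script; conf_digest is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Prints the SHA-256 digest of the file given as $1, without the file name.
conf_digest() {
  sha256sum "$1" | awk '{ print $1 }'
}
```

Run `conf_digest "${ENCRYPTION_PROVIDER_CONF}"` on the primary and on the standby installation environment. The two outputs should match exactly; any difference, including whitespace, means the rendered files differ.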
Prepare Shared ServiceAccount Signing Key
For Bare Metal DR, generate the ServiceAccount signing key once and use the same files in both the primary and standby KubeadmControlPlane manifests.
mkdir -p /root/global-dr-sa
openssl genrsa -out /root/global-dr-sa/sa.key 2048
openssl rsa -in /root/global-dr-sa/sa.key -pubout -out /root/global-dr-sa/sa.pub
chmod 0600 /root/global-dr-sa/sa.key
chmod 0644 /root/global-dr-sa/sa.pub
kubectl -n cpaas-system create secret generic global-sa-signing-key \
--from-file=sa.key=/root/global-dr-sa/sa.key \
--from-file=sa.pub=/root/global-dr-sa/sa.pub \
--dry-run=client -o yaml | kubectl apply -f -
Add the following entries to the primary and standby KubeadmControlPlane.spec.kubeadmConfigSpec. The file content and the issuer/audience values must be identical on both sides.
files:
  - path: /etc/kubernetes/pki/sa.key
    owner: root:root
    permissions: "0600"
    contentFrom:
      secret:
        name: global-sa-signing-key
        key: sa.key
  - path: /etc/kubernetes/pki/sa.pub
    owner: root:root
    permissions: "0644"
    contentFrom:
      secret:
        name: global-sa-signing-key
        key: sa.pub
clusterConfiguration:
  apiServer:
    extraArgs:
      service-account-key-file: /etc/kubernetes/pki/sa.pub
      service-account-signing-key-file: /etc/kubernetes/pki/sa.key
      service-account-issuer: https://kubernetes.default.svc.cluster.local
      api-audiences: https://kubernetes.default.svc.cluster.local
  controllerManager:
    extraArgs:
      service-account-private-key-file: /etc/kubernetes/pki/sa.key
After the clusters are installed, verify the files and kubeadm static pod arguments on one control-plane node from each side.
sha256sum /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub
grep -E 'service-account-issuer|api-audiences|service-account-key-file|service-account-signing-key-file' \
/etc/kubernetes/manifests/kube-apiserver.yaml
grep -E 'service-account-private-key-file' \
/etc/kubernetes/manifests/kube-controller-manager.yaml
Add DR Certificate SANs to KubeadmControlPlane
In the manifest generated in Step 4, include both the primary and standby control plane VIPs and the platform access address in KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.serverCertSANs. Use the same SAN list on both the primary and standby installation environments.
serverCertSANs:
  - "${PRIMARY_CLUSTER_VIP}"
  - "${STANDBY_CLUSTER_VIP}"
  - "${PLATFORM_HOST}"
Add Provider-Specific DR Fields
Create the encryption provider Secret in minialauda.
kubectl create secret generic encryption-provider-config \
--from-file=encryption-provider.conf="${ENCRYPTION_PROVIDER_CONF}" \
-n cpaas-system \
--dry-run=client -o yaml | kubectl apply -f -
Add the Secret reference to DCSCluster.spec.
encryptionProviderConfigRef:
  name: encryption-provider-config
DCS uses DCSCluster.spec.encryptionProviderConfigRef to deliver the disaster recovery encryption provider configuration. Do not add /etc/kubernetes/encryption-provider.conf to KubeadmControlPlane.spec.kubeadmConfigSpec.files for the DCS DR path.
If you created dcs-import-extra-resources, keep the ConfigMap on both the primary and standby installation environments.
No VSphereCluster encryption Secret reference is required. For VMware vSphere, keep this file entry in KubeadmControlPlane.spec.kubeadmConfigSpec.files on both the primary and standby installation environments. The rendered /etc/kubernetes/encryption-provider.conf content must be identical on both sides, including the provider order, key name, and base64 key value. Also create the VMware vSphere dcs-import-extra-resources ConfigMap from Step 7 on both installation environments so the installer imports the vSphere infrastructure resources and the global-vsphere-credentials Secret.
- path: /etc/kubernetes/encryption-provider.conf
owner: "root:root"
append: false
permissions: "0644"
content: |
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
- secrets
providers:
- aescbc:
keys:
- name: key1
secret: ${ENCRYPTION_PROVIDER_SECRET_B64}
Keep the same DR serverCertSANs list on both the primary and standby installation environments.
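Before rendering the file entry, a quick sanity check on ${ENCRYPTION_PROVIDER_SECRET_B64} can catch a truncated or double-encoded key. The sketch below assumes a 32-byte key, which is the length the upstream Kubernetes encryption-at-rest documentation uses for aescbc; the function names are hypothetical.

```shell
# decoded_key_length: print the number of bytes a base64 string decodes to.
decoded_key_length() {
  printf '%s' "$1" | base64 -d 2>/dev/null | wc -c | tr -d ' '
}

# is_valid_aescbc_key: succeed only when the value decodes to exactly 32 bytes
# (assumed key length; adjust if your key policy differs).
is_valid_aescbc_key() {
  [ "$(decoded_key_length "$1")" -eq 32 ]
}

# Usage:
#   is_valid_aescbc_key "$ENCRYPTION_PROVIDER_SECRET_B64" || echo "unexpected key length" >&2
```

The check only validates the shape of the value. It does not prove that both sides use the same key, so the value itself must still be carried over unchanged from the primary to the standby installation environment.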
No HCSCluster encryption Secret reference is required. For HCS, append this file entry to KubeadmControlPlane.spec.kubeadmConfigSpec.files on both the primary and standby installation environments. The rendered /etc/kubernetes/encryption-provider.conf content must be identical on both sides, including the provider order, key name, and base64 key value.
- path: /etc/kubernetes/encryption-provider.conf
owner: "root:root"
append: false
permissions: "0644"
content: |
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
- secrets
providers:
- aescbc:
keys:
- name: key1
secret: ${ENCRYPTION_PROVIDER_SECRET_B64}
Keep the same DR serverCertSANs list on both the primary and standby installation environments.
Create the HCS dcs-import-extra-resources ConfigMap from Step 7 on both installation environments. Set HCS_SECRET_NAME to the same Secret name used by HCSCluster.spec.identityRef.name.
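Because the rendered /etc/kubernetes/encryption-provider.conf must be byte-identical on both sides, comparing digests is a simple way to confirm it without copying key material between sites. This is a minimal sketch that assumes sha256sum is available on the nodes; the function names and the copied file path in the usage comment are hypothetical.

```shell
# file_digest: print the SHA-256 digest of a file, without the file name.
file_digest() {
  sha256sum "$1" | awk '{print $1}'
}

# same_content: succeed only when both files are readable and byte-identical.
same_content() {
  [ -r "$1" ] && [ -r "$2" ] || return 1
  [ "$(file_digest "$1")" = "$(file_digest "$2")" ]
}

# Usage: print the digest on one control plane node per side and compare by eye,
#   sudo sha256sum /etc/kubernetes/encryption-provider.conf
# or, if a copy of the standby file is available locally (hypothetical path):
#   same_content /etc/kubernetes/encryption-provider.conf /tmp/standby-encryption-provider.conf
```

A digest mismatch usually points to a different key value, key name, or provider order, or to a whitespace difference in the rendered content.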
For Bare Metal, create the encryption provider Secret in minialauda on both the primary and standby installation environments. The Secret must be in the same namespace as BaremetalCluster and must contain a key named encryption-provider.conf.
kubectl create secret generic "${BAREMETAL_ENCRYPTION_PROVIDER_SECRET}" \
--from-file=encryption-provider.conf="${ENCRYPTION_PROVIDER_CONF}" \
-n cpaas-system \
--dry-run=client -o yaml | kubectl apply -f -
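Since the provider expects a key named exactly encryption-provider.conf, it may be worth checking the Secret after creating it. The helper below is a sketch with a hypothetical name; it relies only on kubectl get with a jsonpath output, where the dot inside the key name is escaped with a backslash.

```shell
# secret_has_encryption_conf: succeed when the Secret has a non-empty
# "encryption-provider.conf" data key. $1 = Secret name, $2 = namespace.
secret_has_encryption_conf() {
  _data=$(kubectl get secret "$1" -n "$2" \
    -o 'jsonpath={.data.encryption-provider\.conf}') || return 1
  [ -n "$_data" ]
}

# Usage in the bootstrap (minialauda) cluster:
#   secret_has_encryption_conf "$BAREMETAL_ENCRYPTION_PROVIDER_SECRET" cpaas-system \
#     || echo "Secret is missing the encryption-provider.conf key" >&2
```

Run the check on both the primary and standby installation environments.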
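Reference that Secret from BaremetalCluster.spec.encryptionProviderConfigRef. The name in the reference must match the name of the Secret you created; the example below assumes ${BAREMETAL_ENCRYPTION_PROVIDER_SECRET} is set to global-encryption-provider-config.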
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: BaremetalCluster
metadata:
name: global
namespace: cpaas-system
spec:
encryptionProviderConfigRef:
name: global-encryption-provider-config
The bare-metal provider reads this Secret and injects /etc/kubernetes/encryption-provider.conf into the generated control-plane bootstrap data. Do not also add the file manually to KubeadmControlPlane.spec.kubeadmConfigSpec.files for Bare Metal DR; the BaremetalCluster reference is the source of truth.
This Secret must also be included in the Bare Metal dcs-import-extra-resources ConfigMap from Step 7. It cannot remain only in the bootstrap KIND cluster because the handed-off global cluster keeps the imported BaremetalCluster object and must also have the referenced Secret for later provider reconciliation.
Keep the same DR serverCertSANs list on both the primary and standby installation environments.
Also keep the shared ServiceAccount signing key configuration from Prepare Shared ServiceAccount Signing Key. Without that key, the standby API server cannot validate the baremetal-system-agent token that existing hosts received before failover.
Create the Bare Metal dcs-import-extra-resources ConfigMap from Step 7 on both installation environments. The ConfigMap must import the bare-metal and elemental resources required by handoff, including the encryption-provider Secret referenced by BaremetalCluster.spec.encryptionProviderConfigRef, and must not import SeedImage.
The Bare Metal provider AppRelease on both sides must enable shared system-agent auth and handoff delivery:
handoffHook:
controlPlaneVIP: <current-side-control-plane-vip>
delivery:
enabled: true
mode: always
elemental:
systemAgent:
authMode: shared
serviceAccountName: baremetal-system-agent
tls:
agentTLSMode: strict
caCertSecretName: dex.tls
caCertSecretKey: ca.crt
The generated Role/cpaas-system/baremetal-system-agent must restrict secrets access with resourceNames that contain only plan Secret names. Do not grant namespace-wide Secret access and do not include registry, bootstrap, or platform credential Secrets.
Install Primary and Standby Clusters
Run Steps 1 through 9 for both the primary and standby global clusters.
For Bare Metal DR, use two independent bootstrap KIND hosts: one for the primary installation and one for the standby installation. Do not reuse the same bootstrap KIND cluster for both sides. The bootstrap environment contains installer state, AppRelease objects, registry Secrets, MachineRegistration, SeedImage, and handoff state; sharing it can pollute the two global installations and can make handoff or cleanup affect the wrong side.
Apply the following provider-specific installer settings on each side:
For the primary cluster, make sure the platform domain resolves to the primary HA VIP. In Step 8, set hostIP to the primary KIND node IP. For DCS, set console.host and cluster.features.ha.vip to the primary HA VIP. For VMware vSphere, set the control plane endpoint in the primary manifest to the primary HA VIP. For HCS, keep console.host: [] because the VIP is owned by the HCS ELB. For Bare Metal, set both the manifest VIP and the installer VIP fields to the primary control-plane VIP.
After the primary cluster installation succeeds, switch the platform domain to the standby HA VIP as required by the DR procedure, and then install the standby cluster. The DNS switch must happen before the standby installation because several platform resources are rendered with the platform domain, and that domain must resolve to the standby entry point while the standby installer runs. In Step 8 on the standby KIND host, set hostIP to the standby KIND node IP. For DCS, set console.host and cluster.features.ha.vip to the standby HA VIP. For VMware vSphere, set the control plane endpoint in the standby manifest to the standby HA VIP. For HCS, keep console.host: []. For Bare Metal, set both the manifest VIP and the installer VIP fields to the standby control-plane VIP, and keep REGISTRY_DOMAIN as ${PLATFORM_HOST}:11443 on both sides. Get INSTALLER_IP from the cpaas-installer Pod on the standby KIND host; do not reuse the primary KIND host value.
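Before starting the standby installer, it can help to confirm from the standby KIND host that the platform domain already resolves to the standby VIP. The sketch below assumes getent is available on the host, that ${PLATFORM_HOST} holds a DNS name rather than an IP address, and that ${STANDBY_CLUSTER_VIP} is the standby HA VIP; the function name is hypothetical.

```shell
# resolves_to: succeed when host $1 resolves to address $2 on this machine.
resolves_to() {
  getent hosts "$1" | awk '{print $1}' | grep -qxF "$2"
}

# Usage on the standby KIND host:
#   resolves_to "$PLATFORM_HOST" "$STANDBY_CLUSTER_VIP" \
#     || echo "platform domain does not resolve to the standby VIP yet" >&2
```

The result reflects the resolver configuration of the host where it runs, so cached or split-horizon DNS answers elsewhere may still differ.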
After both clusters are installed, get the primary k8sadmin token on a primary control plane node. etcd-sync is installed only on the standby cluster, and its active_cluster_* values point to the primary cluster. Do not decode this token: active_cluster_token expects the base64-encoded value exactly as it is stored in the Secret.
export PRIMARY_CLUSTER_TOKEN_B64="$(sudo kubectl get secret -n cpaas-system k8sadmin -o jsonpath='{.data.token}')"
Get the standby k8sadmin token on a standby control plane node. Use this decoded bearer token to call the standby cluster ModuleInfo API.
export STANDBY_CLUSTER_BEARER_TOKEN="$(sudo kubectl get secret -n cpaas-system k8sadmin -o jsonpath='{.data.token}' | base64 -d)"
If you create the global-etcd-sync ModuleInfo payload from a different host, securely transfer the decoded value from the standby control plane node and export it there.
export STANDBY_CLUSTER_BEARER_TOKEN="<decoded-standby-token>"
Create the global-etcd-sync ModuleInfo payload for the standby cluster. The active_cluster_vip and active_cluster_token values must point to the primary cluster.
cat > "${ETCD_SYNC_MODULEINFO}" <<EOF
{
"kind": "ModuleInfo",
"apiVersion": "cluster.alauda.io/v1alpha1",
"metadata": {
"name": "global-etcd-sync",
"labels": {
"cpaas.io/cluster-name": "global",
"cpaas.io/module-name": "etcd-sync",
"cpaas.io/module-type": "plugin"
}
},
"spec": {
"version": "${ETCD_SYNC_VERSION}",
"config": {
"monitor_check_interval": 1,
"detail": false,
"active_cluster_vip": "${PRIMARY_CLUSTER_VIP}",
"active_cluster_token": "${PRIMARY_CLUSTER_TOKEN_B64}"
}
}
}
EOF
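The heredoc above expands shell variables, so an unset variable silently becomes an empty string in the payload. A small guard run before the heredoc can prevent that. This is a sketch in POSIX shell; require_vars is a hypothetical helper, not part of the platform tooling.

```shell
# require_vars: succeed only when every named shell variable is set and non-empty.
# Prints the name of each missing variable to stderr.
require_vars() {
  _missing=0
  for _name in "$@"; do
    # Accept only plain identifiers so the eval below stays safe.
    case "$_name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid variable name: $_name" >&2
        return 2
        ;;
    esac
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage: stop before writing the payload if any input is unset.
#   require_vars ETCD_SYNC_MODULEINFO ETCD_SYNC_VERSION \
#     PRIMARY_CLUSTER_VIP PRIMARY_CLUSTER_TOKEN_B64 || exit 1
```

The guard only checks that values exist. It does not verify that the token is still in its base64 form or that the VIP points to the primary cluster.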
Install global-etcd-sync by calling the ModuleInfo API on the standby cluster.
curl -sk -X POST "https://${STANDBY_CLUSTER_VIP}/apis/cluster.alauda.io/v1alpha1/moduleinfoes" \
-H "Authorization: Bearer ${STANDBY_CLUSTER_BEARER_TOKEN}" \
-H "Content-Type: application/json" \
-d @"${ETCD_SYNC_MODULEINFO}"
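Because curl -s hides errors, a rejected request can go unnoticed. One option is to capture the HTTP status with curl's -w '%{http_code}' and interpret it. The mapping below is a sketch that assumes the ModuleInfo endpoint follows the usual Kubernetes API conventions (201 on create, 409 when the object already exists, 401 or 403 for a rejected token); that behavior is an assumption, and the function name is hypothetical.

```shell
# describe_create_status: map the HTTP status of a create request to a message.
# Returns 0 when the object exists afterwards, 1 otherwise.
describe_create_status() {
  case "$1" in
    200|201|202) echo "accepted"; return 0 ;;
    409)         echo "already exists"; return 0 ;;
    401|403)     echo "token rejected"; return 1 ;;
    *)           echo "unexpected status: $1"; return 1 ;;
  esac
}

# Usage:
#   status=$(curl -sk -o /dev/null -w '%{http_code}' -X POST \
#     "https://${STANDBY_CLUSTER_VIP}/apis/cluster.alauda.io/v1alpha1/moduleinfoes" \
#     -H "Authorization: Bearer ${STANDBY_CLUSTER_BEARER_TOKEN}" \
#     -H "Content-Type: application/json" \
#     -d @"${ETCD_SYNC_MODULEINFO}")
#   describe_create_status "$status"
```

Treating 409 as success is a choice that makes re-runs harmless; if the existing object might hold stale values, inspect it instead of assuming it is correct.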
Restart the Pods that must reload DR and endpoint configuration. Run the same commands on a primary control plane node and on a standby control plane node.
sudo kubectl delete po -n cpaas-system -l 'service_name in (alertmanager,vmselect,vminsert)'
sudo kubectl delete po -n cpaas-system -l service_name=cpaas-elasticsearch
sudo kubectl delete po -n cpaas-system -l service_name=cluster-transformer
For the DR lifecycle after installation, see Global Cluster Disaster Recovery.
Verification
After the installer reports completion, verify that the global cluster is healthy.
kubectl --kubeconfig <global-kubeconfig> get nodes
kubectl --kubeconfig <global-kubeconfig> get clusters.platform.tkestack.io global \
-o jsonpath='{.status.phase}'
kubectl --kubeconfig <global-kubeconfig> get pods -n cpaas-system
kubectl --kubeconfig <global-kubeconfig> get clustermodule global
The installation is successful when all of the following conditions are true:
- The installer progress API reports status: Success and type: Complete.
- All global cluster nodes are Ready.
- Critical Pods in cpaas-system are Running or Completed.
- ClusterModule/global reports the base module as healthy.
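The node and Pod conditions above can be checked mechanically by filtering the kubectl output. The two filters below are a minimal sketch with hypothetical names; they read the --no-headers output on stdin and rely on the default column order of kubectl get nodes and kubectl get pods.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" {print $1}'
}

# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print the names of Pods whose STATUS is neither Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" {print $1}'
}

# Usage:
#   kubectl --kubeconfig <global-kubeconfig> get nodes --no-headers | not_ready_nodes
#   kubectl --kubeconfig <global-kubeconfig> get pods -n cpaas-system --no-headers | unhealthy_pods
```

Empty output from both filters matches the node and Pod conditions in the list. Note that a cordoned node reports Ready,SchedulingDisabled and is therefore listed, and that a Running Pod whose containers are not all ready is not flagged by this simple filter.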
Next Steps