Creating Clusters on Bare Metal
This document explains how to create Kubernetes clusters on physical servers using the bare-metal provider. The workflow is YAML-only — there is no Fleet Essentials UI for bare-metal clusters at this time.
TOC
- Prerequisites
  - 1. Required Plugin Installation
  - 2. Image Catalog Confirmed
  - 3. Network Connectivity
  - 4. TPM Decision
  - 5. Public Registry Credential (Only When Installing Platform Components Later)
- Cluster Creation Workflow
  - Resolving Placeholder Values
  - Step 1: Build the SeedImage and Register Hosts
  - Step 2: Create MachineInventoryPool Resources
  - Step 3: Create the Control-Plane Cluster Resources
  - Step 4: Deploy Worker Nodes
- Cluster Verification
  - Using kubectl
  - Expected Results
  - Common Failure Modes
- Next Steps
- Appendix
  - Complete KubeadmControlPlane Configuration
Prerequisites
Before creating clusters, ensure all of the following prerequisites are met.
1. Required Plugin Installation
Install the following plugins on the global cluster:
- Alauda Container Platform Kubeadm Provider
- Alauda Container Platform Bare Metal Infrastructure Provider (umbrella chart that installs both the bare-metal manager and elemental-operator)
See the Installation Guide for details.
2. Image Catalog Confirmed
The bare-metal provider chart ships an elemental-image-catalog ConfigMap that maps Machine.spec.version to the elemental upgrade image used to (re)provision a node. You do not need to create this ConfigMap separately — confirm that the target Kubernetes version is present:
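A minimal sketch of such a check, assuming the ConfigMap lives in the cpaas-system namespace (its namespace is not stated here) and that each catalog entry is a top-level key under data. The helper name catalog_has_version is hypothetical.

```shell
# Hypothetical helper: succeeds when VERSION (leading "v" included) is a key
# in the elemental-image-catalog ConfigMap.
# Assumption: the ConfigMap is in cpaas-system; adjust if your install differs.
catalog_has_version() (
  version=$1
  case $version in
    v*) ;;
    *) echo "version must keep the leading v: $version" >&2; exit 2 ;;
  esac
  # A data key is rendered as "  <key>: <value>" in the YAML output.
  kubectl -n cpaas-system get configmap elemental-image-catalog -o yaml |
    awk -v k="  ${version}:" 'index($0, k) == 1 { found = 1 } END { exit !found }'
)

# Usage:
#   catalog_has_version v1.33.7-2 && echo "catalog entry present"
```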
Every value used as Machine.spec.version (for both the control plane and worker MachineDeployment resources) must appear as a key in this ConfigMap, with the leading v preserved. The provider resolves the image at reprovision time by substituting the platform registry address for the registry portion of the entry. If a target version is missing, no reprovision plan is written and BaremetalMachine ends up in Failed / Reason=ImageCatalogMiss until the entry is added.
3. Network Connectivity
- Every physical host must be able to reach global.platformUrl (elemental-system-agent registration, plan secret polling).
- Every physical host must be able to pull from the platform registry (global.registry.address) for both elemental install (during the first boot) and elemental upgrade (during every reprovision).
- The control-plane VIP must live in the same Layer-2 broadcast domain as the control-plane node IPs. The vrid chosen for the VIP must be unique within that broadcast domain.
If the target VM or physical host does not receive an address from DHCP while booted into the live ISO, configure the network manually from the host console before waiting for registration. Check the NetworkManager connection name first, then apply the site-specific address, gateway, and DNS values:
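One way to do this from the host console is sketched below with standard nmcli commands. The helper name set_static_ipv4 is hypothetical, and the connection name and addresses in the usage comment are placeholders (documentation-range IPs), not values from this guide.

```shell
# Hypothetical helper for the live-ISO console. All values are site-specific
# and supplied by the caller; nothing here is provided by the bare-metal provider.
set_static_ipv4() (
  conn=$1 cidr=$2 gateway=$3 dns=$4
  nmcli connection modify "$conn" \
    ipv4.addresses "$cidr" \
    ipv4.gateway "$gateway" \
    ipv4.dns "$dns" \
    ipv4.method manual &&
  nmcli connection up "$conn"
)

# Usage on the host console:
#   nmcli connection show      # find the connection name first
#   set_static_ipv4 "Wired connection 1" 192.0.2.10/24 192.0.2.1 192.0.2.53
```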
4. TPM Decision
Production hosts should keep MachineRegistration.spec.config.elemental.registration.emulate-tpm: false (or remove the field). For PoC and virtual-machine smoke tests, set emulate-tpm: true and emulated-tpm-seed: -1 so registration works without a real TPM.
5. Public Registry Credential (Only When Installing Platform Components Later)
public-registry-credential is not required to create a bare-metal cluster. It only becomes necessary when later platform components on the new workload cluster need to pull from a credentialed public registry. If your test scope ends at cluster + node Ready, you can ignore this prerequisite.
Cluster Creation Workflow
When using YAML, the workflow proceeds through four steps. Every step must be applied in the cpaas-system namespace.
Important Namespace Requirement
All bare-metal resources must be applied in the cpaas-system namespace. The provider and elemental-operator only reconcile objects in that namespace.
Workload Cluster Naming
The workload cluster-name must not be global. That name is reserved for the global cluster, and reusing it causes the workload cluster's resources to collide with global cluster resources in cpaas-system. As a convention, keep the CAPI Cluster and BaremetalCluster named exactly <cluster-name>, and prefix dependent resources (KubeadmControlPlane, KubeadmConfigTemplate, MachineDeployment, machine templates, pools, registrations) with <cluster-name>-.
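The convention can be checked before any manifest is written. The sketch below is illustrative only: the helper name and the suffixes after the cluster-name prefix (control-plane, workers) are assumptions, since only the prefix itself is fixed by the convention.

```shell
# Hypothetical helper: prints conventional resource names for a workload
# cluster and refuses the reserved name "global".
# The "-control-plane" and "-workers" suffixes are illustrative assumptions.
cluster_resource_names() (
  name=$1
  if [ -z "$name" ] || [ "$name" = "global" ]; then
    echo "invalid workload cluster name: '$name'" >&2
    exit 1
  fi
  printf '%s\n' \
    "Cluster/$name" \
    "BaremetalCluster/$name" \
    "KubeadmControlPlane/$name-control-plane" \
    "MachineDeployment/$name-workers"
)
```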
Resolving Placeholder Values
The example manifests below use <placeholder> syntax for environment-specific values:
Step 1: Build the SeedImage and Register Hosts
Create a MachineRegistration that describes the registration URL and first-install cloud-config, and a SeedImage that points elemental-operator at the matching ISO base image.
SeedImage.spec.baseImage is derived from the image catalog entry for the target Kubernetes version: take the catalog repository, append -iso, and keep the same tag or digest. For example, if elemental-image-catalog resolves v1.33.7-2 to <registry-address>/tkestack/baremetal-base-image:v0.0.0-beta-1.33.7-2, then SeedImage.spec.baseImage is <registry-address>/tkestack/baremetal-base-image-iso:v0.0.0-beta-1.33.7-2.
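The derivation rule can be expressed as a small shell function. This is a sketch, not part of the provider: the name iso_base_image is hypothetical, and references that carry both a tag and a digest are not handled.

```shell
# Hypothetical helper: turns a catalog image reference into the matching
# SeedImage base image by appending "-iso" to the repository and keeping
# the tag or digest unchanged.
iso_base_image() (
  ref=$1
  case $ref in
    *@*)                      # digest form: repo@sha256:...
      repo=${ref%%@*}
      suffix="@${ref#*@}"
      ;;
    *)
      last=${ref##*/}         # final path component, so a registry port is ignored
      case $last in
        *:*) repo=${ref%:*}; suffix=":${ref##*:}" ;;
        *)   repo=$ref;      suffix="" ;;
      esac
      ;;
  esac
  printf '%s-iso%s\n' "$repo" "$suffix"
)

# Usage:
#   iso_base_image "<registry-address>/tkestack/baremetal-base-image:v0.0.0-beta-1.33.7-2"
```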
Do not add /etc/resolv.conf to SeedImage.spec.cloud-config. Keep the ISO generic, and put site-specific resolver configuration in MachineRegistration.spec.config.cloud-config only when the first-boot registration path needs it. During the tested Global deployment flow, the ISO did not carry resolver files; node DNS was configured later by the kubeadm bootstrap data.
install.device, install.eject-cd, and install.reboot are intentional. If the target disk is omitted, elemental install can select the live ISO device in VM-based tests. If eject-cd or reboot is false, a host can remain in the live environment after the first install and never become usable inventory for Cluster API.
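For orientation, the function below prints a partial MachineRegistration containing only the fields discussed in this guide. Treat it as a sketch under assumptions: the apiVersion and the spec.config.elemental.install path follow upstream elemental-operator and may differ in your release, the resource name is illustrative, and the target disk is whatever the caller passes in. It is not a complete manifest.

```shell
# Hypothetical helper: prints a MachineRegistration fragment.
#   $1 = install target disk (site-specific, e.g. /dev/sda)
#   $2 = emulate-tpm, "true" only for PoC or VM smoke tests (default: false)
registration_fragment() (
  device=$1
  emulate_tpm=${2:-false}
  cat <<EOF
apiVersion: elemental.cattle.io/v1beta1   # assumption: upstream API group/version
kind: MachineRegistration
metadata:
  name: <cluster-name>-registration       # illustrative name
  namespace: cpaas-system
spec:
  config:
    elemental:
      install:
        device: ${device}
        eject-cd: true
        reboot: true
      registration:
        emulate-tpm: ${emulate_tpm}
EOF
  # The seed only applies when the TPM is emulated.
  if [ "$emulate_tpm" = "true" ]; then
    echo "        emulated-tpm-seed: -1"
  fi
)
```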
Apply the manifest and wait for the SeedImage build to finish:
When status.state reaches Completed, fetch the download URL and ISO checksum:
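A possible way to script the wait and the lookup is sketched below. The helper name is hypothetical, and the status field names downloadURL and checksumURL are taken from upstream elemental-operator; confirm them against the SeedImage object in your environment before relying on them.

```shell
# Hypothetical helper: waits for the SeedImage build, then prints the
# download URL and the checksum URL on separate lines.
# Assumptions: the SeedImage is in cpaas-system and exposes
# status.downloadURL / status.checksumURL as upstream elemental-operator does.
seedimage_artifacts() (
  name=$1
  kubectl -n cpaas-system wait seedimage "$name" \
    --for=jsonpath='{.status.state}'=Completed --timeout=30m || exit 1
  kubectl -n cpaas-system get seedimage "$name" \
    -o jsonpath='{.status.downloadURL}{"\n"}{.status.checksumURL}{"\n"}'
)
```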
Boot every target host from this ISO. elemental-register runs first (creates the MachineInventory and uploads observedNetwork), then elemental install writes the on-disk OS. After install completes, the host stays available for plan execution. If the live ISO environment has no DHCP address, configure NetworkManager manually on the host console as described in Network Connectivity before waiting for the MachineInventory.
Confirm registration:
Every inventory you intend to use must:
- Show Ready=True.
- Have a non-empty status.plan.secretRef.name.
- Have a spec.observedNetwork that matches the host's expected NIC (only required when you want the install-time IP to survive across reprovisions).
Record the exact MachineInventory names — they are referenced by name in the next step.
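The first two conditions can be screened in bulk. The sketch below assumes that the resource name machineinventory resolves to the elemental MachineInventory CRD in cpaas-system; both function names are hypothetical, and the observedNetwork check is left to manual inspection.

```shell
# Lists inventories as "name<TAB>Ready status<TAB>plan secret name".
list_inventories() {
  kubectl -n cpaas-system get machineinventory \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\t"}{.status.plan.secretRef.name}{"\n"}{end}'
}

# Reads those rows on stdin and prints the names that are not usable yet
# (Ready is not True, or the plan secret reference is empty).
unusable_inventories() {
  awk -F '\t' '$2 != "True" || $3 == "" { print $1 }'
}

# Usage:
#   list_inventories | unusable_inventories
```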
Step 2: Create MachineInventoryPool Resources
Create one pool per role. The pool reconciler validates that every member exists, computes capacity counters, and writes the baremetal.alauda.io/pool=<pool-name> annotation onto the inventory.
Key parameters:
Apply and verify:
A healthy pool reports Ready=True, total = len(spec.machineInventories), and available = total - allocated - preparing - reprovisioning - unavailable. Inventories listed in spec.machineInventories that fail validation (missing, plan secret missing, Ready=False) raise the pool's unavailable counter and surface in the MembersValid condition.
Size the control-plane pool to at least KubeadmControlPlane.spec.replicas. Size the worker pool to at least MachineDeployment.spec.replicas. For rolling upgrades the pool must hold the entire replica count — the provider uses delete-then-add semantics from the same pool, never both at once.
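The counter arithmetic and the sizing rule can be written down as two small functions. These are illustrative helpers with hypothetical names; the provider computes the counters itself.

```shell
# available = total - allocated - preparing - reprovisioning - unavailable
pool_available() (
  total=$1 allocated=$2 preparing=$3 reprovisioning=$4 unavailable=$5
  echo $(( total - allocated - preparing - reprovisioning - unavailable ))
)

# Succeeds when the pool lists at least as many inventories as the
# replica count it has to serve.
pool_fits() (
  total=$1 replicas=$2
  [ "$total" -ge "$replicas" ]
)

# Usage:
#   pool_available 5 3 0 1 0                      # prints 1
#   pool_fits 3 3 || echo "pool too small for the requested replicas"
```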
Step 3: Create the Control-Plane Cluster Resources
Create the BaremetalCluster (declares the control-plane VIP), the control-plane BaremetalMachineTemplate (points at the control-plane pool), the KubeadmControlPlane (replicas + kubeadm config), and the CAPI Cluster.
Full Configuration Reference
The example below uses a minimal KubeadmControlPlane. For the full hardening profile recommended in production — admission, audit, kubelet patches, encryption provider — see Complete KubeadmControlPlane Configuration in the Appendix.
Cluster annotations. The bare-metal provider relies on a small set of Cluster annotations during reconcile. Authoritative ones the operator must set:
BaremetalCluster parameters:
Apply and watch:
Each new control-plane BaremetalMachine advances Pending → Allocated → Reprovisioning → Running. Watch:
- BaremetalMachine.status.machineInventoryRef.name: which inventory was picked.
- BaremetalMachine.status.planSecretRef.name: plan secret being driven. The secret carries baremetal.alauda.io/plan.type=reprovision.
- MachineInventory.status.plan.state: Applied once the host completes cloud-init clean, elemental upgrade, reboot, and kubeadm init/join.
- BaremetalCluster.status.conditions[EndpointReady]: true once the VIP is reachable.
The bare-metal provider does not support single-node control planes. Provision at least three control-plane replicas (KubeadmControlPlane.spec.replicas: 3) so that alive can arbitrate the VIP and etcd retains quorum.
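The etcd side of this requirement follows from majority quorum: a cluster of n members needs floor(n/2) + 1 members to agree and tolerates floor((n-1)/2) failures. The helper names below are hypothetical and only illustrate the arithmetic.

```shell
# Majority quorum for an etcd cluster of N members.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Number of member failures an etcd cluster of N members can survive.
etcd_fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

# Usage:
#   etcd_quorum 3             # prints 2
#   etcd_fault_tolerance 3    # prints 1
#   etcd_fault_tolerance 1    # prints 0
```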
Step 4: Deploy Worker Nodes
After the control plane is Ready, create the worker BaremetalMachineTemplate, the worker KubeadmConfigTemplate, and the MachineDeployment. The full worker YAML and parameter table are in Managing Nodes on Bare Metal → Worker Node Deployment.
Cluster Verification
Using kubectl
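A minimal sketch of a status overview on the global cluster, assuming the workload cluster name is passed as the first argument. The helper name is hypothetical. The Cluster API kinds and the cluster.x-k8s.io/cluster-name label are standard; the bare-metal resource names assume the CRDs use their default singular names.

```shell
# Hypothetical helper: prints the objects that the expected results refer to.
show_cluster_status() (
  cluster=$1
  kubectl -n cpaas-system get cluster "$cluster" &&
  kubectl -n cpaas-system get kubeadmcontrolplane &&
  kubectl -n cpaas-system get machine -l "cluster.x-k8s.io/cluster-name=$cluster" &&
  # Assumption: default singular resource names for the bare-metal CRDs.
  kubectl -n cpaas-system get baremetalmachine,machineinventorypool,machineinventory
)

# Usage:
#   show_cluster_status <cluster-name>
```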
Expected Results
A successfully created cluster shows:
- Cluster.status.conditions[Ready]=True.
- KubeadmControlPlane replicas all Ready.
- Every BaremetalMachine.status.phase=Running and status.ready=true.
- Every used MachineInventory.status.plan.state=Applied with baremetal.alauda.io/plan.type=reprovision on its plan secret.
- MachineInventoryPool.status satisfies available + allocated + preparing + reprovisioning + unavailable = total.
- Kubernetes Nodes Ready.
Common Failure Modes
For the full operator-side state machine reference (every condition reason and recovery action), see Provider Overview → clean / reprovision plans.
Next Steps
After creating a cluster:
Appendix
Complete KubeadmControlPlane Configuration
The hardened configuration recommended for production bare-metal clusters — admission control, audit policy, kubelet patches, encryption provider, and IPv6 bind addresses. Substitute the placeholders from the table in Resolving Placeholder Values.
Worker bootstrap is symmetric — see Managing Nodes on Bare Metal → Bootstrap Template for the worker KubeadmConfigTemplate.