
Hack The Garden May 2026: Self-Hosted Shoots, Disaster Recovery, and a Modernized VPN

From May 4–8, 2026, the Gardener community gathered at Schlosshof in Schelklingen for another week of focused collaboration. The full per-topic write-up is available on the community page, and the review meeting recording covers the highlights. Read on for the larger storylines that emerged!

Hack The Garden May 2026 in Schelklingen

🌿 GEP-28: Self-Hosted Shoot Clusters Take Shape

A large portion of the hackathon was dedicated to advancing GEP-28 (Self-Hosted Shoot Clusters), with multiple workstreams pushing the design closer to production-readiness.

The shoot/shoot controller — previously missing from gardenadm connect — was prototyped to run inside gardenlet, with the flow package extended by reusable TaskGroups so that the hosted and self-hosted flows can share tasks instead of being maintained separately (branch).

A kubeadm-inspired discovery token mechanism removes the need to pass the full CA bundle on the command line. gardenadm init now publishes a signed kube-public/cluster-info ConfigMap, and gardenadm join/connect accept a --discovery-token-ca-cert-hash sha256:<64-hex> flag that verifies the JWS signature, validates the CA against the supplied SPKI pin, and only then re-fetches over TLS. The same mechanism was extended to gardener-operator via a new Garden.Spec.VirtualCluster.Kubernetes.KubeAPIServer.EnableBootstrapDiscovery field (branch).
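The pin check can be sketched assuming gardenadm follows kubeadm's convention. In that convention, the pin is SHA-256 over the CA certificate's DER-encoded SubjectPublicKeyInfo, written as `sha256:` plus 64 hex characters. The sketch covers only that step, not the JWS verification or the TLS re-fetch. `pinForCert`, `verifyCAPin`, and `newSelfSignedCA` are illustrative helpers, not gardenadm code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/hex"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"strings"
	"time"
)

// pinForCert returns "sha256:" followed by the hex-encoded SHA-256 digest of the
// certificate's DER-encoded SubjectPublicKeyInfo (the kubeadm pin format).
func pinForCert(cert *x509.Certificate) string {
	sum := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// verifyCAPin accepts the CA bundle only if one of its certificates matches the pin.
func verifyCAPin(caPEM []byte, pin string) error {
	pin = strings.ToLower(strings.TrimSpace(pin))
	if !strings.HasPrefix(pin, "sha256:") || len(pin) != len("sha256:")+64 {
		return errors.New("pin must have the form sha256:<64 hex characters>")
	}
	rest := caPEM
	for {
		var block *pem.Block
		block, rest = pem.Decode(rest)
		if block == nil {
			return errors.New("no CA certificate matches the discovery pin")
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return fmt.Errorf("parse CA certificate: %w", err)
		}
		if pinForCert(cert) == pin {
			return nil
		}
	}
}

// newSelfSignedCA creates a throwaway CA so the example can run on its own.
func newSelfSignedCA(cn string) ([]byte, *x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return nil, nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), cert, nil
}

func main() {
	caPEM, cert, err := newSelfSignedCA("demo-ca")
	if err != nil {
		panic(err)
	}
	pin := pinForCert(cert)
	fmt.Println(pin, verifyCAPin(caPEM, pin))
}
```

Pinning the public key rather than the whole certificate means the pin stays valid if the CA certificate is re-issued with the same key.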

Following GEP-28's March 2026 demo, running garden and seed inside a self-hosted shoot was extended to managed infrastructure, where nodes are provisioned by machine-controller-manager. A make gind-up SCENARIO=full target now automates the full path to a healthy seed. A side-track explored running local provider "machines" directly as Docker containers, which enabled a first successful local control plane migration without workload downtime.

For exposing self-hosted API servers, GEP-36's SelfHostedShootExposure API was implemented in the Cilium extension using L2 Announcements (PR#693). A connectivity issue — Cilium losing its connection to the default Kubernetes service when the API server rolls over to manage itself — was resolved by pointing Cilium to localhost via a CiliumNodeConfig targeted at control plane nodes.

Finally, the team reworked the approach to joining control plane nodes in managed infrastructure. Rather than dynamically updating the OperatingSystemConfig with node-specific etcd certificates, the new design pushes only CA secrets through the OSC and lets etcd-backup-restore generate node-specific certificates on startup, with etcd-druid/backup-restore managing the etcd member list dynamically. This removes the need for Etcd.spec.externallyManagedMemberAddresses for self-hosted shoots (WIP PR).

🛜 Networking and VPN Modernization

The long-running effort to replace OpenVPN with WireGuard, continued from June 2025, reached a working state for multiple shoots in parallel. A wpex sidecar multiplexes connections behind a single Istio ingress: the first packet is forwarded to all configured vpn-seed-servers, and each either responds or silently ignores based on the WireGuard public key. Istio Ingress can now be scaled beyond a single replica, and NetworkPolicys are managed via labels on istio-ingress and vpn-seed-server. A related VPN bug was fixed along the way (gardener/gardener#14776). Code lives across gardener, vpn2, and wpex.
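The fan-out idea can be modeled in a few lines. In WireGuard, a responder can tell whether a handshake initiation is meant for it through the `mac1` field, which is keyed on the responder's public key. The sketch below reduces that check to comparing a plain `Responder` field. The `Mux`, `Backend`, and `Packet` types are hypothetical and do not reflect wpex's internals. After the first reply, the peer is pinned to the answering backend.

```go
package main

import "fmt"

// Packet is a simplified stand-in for a WireGuard UDP datagram. Real packets
// carry no plaintext responder key; this field replaces the mac1 check.
type Packet struct {
	Peer      string // client address, e.g. "198.51.100.1:51820"
	Responder string // public key of the intended vpn-seed-server (simplified)
	Payload   string
}

// Backend models one vpn-seed-server behind the shared ingress.
type Backend struct {
	Name      string
	PublicKey string
}

// Handle answers a handshake only if it targets this backend's key.
func (b Backend) Handle(p Packet) (string, bool) {
	if p.Responder != b.PublicKey {
		return "", false // not ours: silently ignore
	}
	return b.Name + " ack " + p.Payload, true
}

// Deliver processes a packet from an already established session.
func (b Backend) Deliver(p Packet) (string, bool) {
	return b.Name + " recv " + p.Payload, true
}

// Mux forwards a peer's first packet to every backend and pins the peer to
// whichever backend replies; later packets go only to the pinned backend.
type Mux struct {
	Backends []Backend
	sessions map[string]int // peer address -> backend index
}

func (m *Mux) Forward(p Packet) (string, bool) {
	if m.sessions == nil {
		m.sessions = map[string]int{}
	}
	if i, ok := m.sessions[p.Peer]; ok {
		return m.Backends[i].Deliver(p)
	}
	for i, b := range m.Backends {
		if reply, ok := b.Handle(p); ok {
			m.sessions[p.Peer] = i
			return reply, true
		}
	}
	return "", false // nobody claimed the handshake
}

func main() {
	m := &Mux{Backends: []Backend{
		{Name: "shoot-a", PublicKey: "keyA"},
		{Name: "shoot-b", PublicKey: "keyB"},
	}}
	fmt.Println(m.Forward(Packet{Peer: "198.51.100.1:51820", Responder: "keyB", Payload: "init"}))
	fmt.Println(m.Forward(Packet{Peer: "198.51.100.1:51820", Payload: "data"}))
}
```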

Building on gardener/gardener#14420, the ACL extension gained virtual garden support for IP allowlisting (PR #280). The work also showed that the virtual garden API server domain needs to be exposed in the Garden status; a WIP branch captures this change.
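The allowlisting semantics are easy to pin down in code: a client is admitted only if its address falls into one of the configured CIDR ranges. This sketch uses Go's `net/netip`. The `Allowlist` type is hypothetical and says nothing about how the ACL extension enforces rules at the ingress gateway.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Allowlist holds parsed CIDR ranges; a client is admitted if its address
// falls into at least one of them.
type Allowlist []netip.Prefix

// ParseAllowlist validates and normalizes the configured CIDRs.
func ParseAllowlist(cidrs []string) (Allowlist, error) {
	var out Allowlist
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		out = append(out, p.Masked())
	}
	return out, nil
}

// Allows reports whether addr is inside any allowed range. Unparseable
// addresses are denied.
func (a Allowlist) Allows(addr string) bool {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false
	}
	ip = ip.Unmap() // treat IPv4-mapped IPv6 addresses as plain IPv4
	for _, p := range a {
		if p.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	allow, err := ParseAllowlist([]string{"10.250.0.0/16", "2001:db8::/32"})
	if err != nil {
		panic(err)
	}
	for _, ip := range []string{"10.250.3.4", "192.168.1.1"} {
		fmt.Println(ip, allow.Allows(ip))
	}
}
```

The `Unmap` call matters because `netip.Prefix.Contains` does not match an IPv4-mapped IPv6 address against an IPv4 prefix.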

💾 Disaster Recovery and Backup Security

Two efforts targeted the resilience of garden and shoot state.

The GardenState resource addresses a long-standing pain point: today, recovering a destroyed Garden cluster is fully manual. The team encoded GardenState as a Secret (label operator.gardener.cloud/purpose: garden-state) in the runtime cluster's garden namespace, containing a JSON snapshot of persist=true secrets, Garden metadata and spec, extension state from DNSRecord/BackupEntry/Extension CRDs, and the Garden UID (preserving the etcd backup bucket name). A Secret was chosen over a CRD because it is available before CRDs are installed, can be extracted with plain kubectl, and is straightforward to back up externally. A bootstrapper in gardener-operator detects a garden-state secret without a Garden resource on startup and drives the restore flow (branch).

In parallel, per-shoot etcd backup encryption was prototyped to limit the blast radius of a compromised control plane. Today, all shoot backups on a seed share a bucket. With this change, gardener generates a shoot-specific encryption secret persisted in ShootState, etcd-druid wires it into etcd-backup-restore via an EncryptionConfig analogous to kube-apiserver encryption-at-rest, and etcd-backup-restore implements AES-GCM encryption with full key rotation via an encrypted keyring stored alongside the backups. Code spans gardener, etcd-druid, and etcd-backup-restore.
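Encryption with key rotation comes down to two things: tagging every ciphertext with the ID of the key that produced it, and keeping retired keys around for decryption. The sketch uses AES-256-GCM from Go's standard library and authenticates the key ID as additional data. The blob layout, the `Keyring` type, and the in-memory key storage are assumptions; the actual keyring is itself encrypted and stored alongside the backups.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Keyring holds every key ever used. Active names the key for new backups;
// older keys stay available so existing snapshots remain readable.
type Keyring struct {
	Active string
	Keys   map[string][]byte // key ID -> 32-byte AES-256 key
}

// Rotate adds a fresh random key and makes it the active one.
func (k *Keyring) Rotate(id string) error {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return err
	}
	if k.Keys == nil {
		k.Keys = map[string][]byte{}
	}
	k.Keys[id] = key
	k.Active = id
	return nil
}

func gcmFor(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns len(keyID) | keyID | nonce | ciphertext+tag, using the key ID
// as additional authenticated data.
func (k *Keyring) Encrypt(plaintext []byte) ([]byte, error) {
	key, ok := k.Keys[k.Active]
	if !ok {
		return nil, errors.New("no active key")
	}
	id := []byte(k.Active)
	if len(id) > 255 {
		return nil, errors.New("key ID too long")
	}
	aead, err := gcmFor(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{byte(len(id))}, id...)
	out = append(out, nonce...)
	return aead.Seal(out, nonce, plaintext, id), nil
}

// Decrypt looks up the key named in the header, so rotation never breaks old blobs.
func (k *Keyring) Decrypt(blob []byte) ([]byte, error) {
	if len(blob) < 1 || len(blob) < 1+int(blob[0]) {
		return nil, errors.New("truncated header")
	}
	n := int(blob[0])
	id := blob[1 : 1+n]
	key, ok := k.Keys[string(id)]
	if !ok {
		return nil, fmt.Errorf("unknown key %q", id)
	}
	aead, err := gcmFor(key)
	if err != nil {
		return nil, err
	}
	rest := blob[1+n:]
	if len(rest) < aead.NonceSize() {
		return nil, errors.New("truncated nonce")
	}
	return aead.Open(nil, rest[:aead.NonceSize()], rest[aead.NonceSize():], id)
}

func main() {
	var kr Keyring
	if err := kr.Rotate("key-1"); err != nil {
		panic(err)
	}
	old, err := kr.Encrypt([]byte("etcd snapshot 1"))
	if err != nil {
		panic(err)
	}
	if err := kr.Rotate("key-2"); err != nil {
		panic(err)
	}
	plain, err := kr.Decrypt(old) // still readable after rotation
	fmt.Println(string(plain), err, kr.Active)
}
```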

🌐 Domain and DNS Flexibility

A design effort tackled the rigidity of Gardener's internal domain handling. Three use cases were scoped:

  1. Optional internal domain per shoot via a new Shoot.spec.dns.internalDomain.enabled field. Disabling it on existing shoots requires a CA rotation coordinated across the node rollout and the DNSRecord lifecycle.
  2. Changing the external domain of a shoot, modeled as a two-phase CA rotation that adds the new domain in the preparing phase and removes the old one in the completing phase.
  3. Changing the internal domain of all shoots on a seed, with the seed spec carrying a list of internal domains and a shoot-owner-initiated CA rotation migrating each shoot.

A WIP implementation of the first use case is available (branch); STACKIT plans to formalize all three in an enhancement proposal.

🛠️ Operational Improvements

Several smaller workstreams improved day-to-day operations.

Debugging failed node joins got significantly easier: the gardener-node-agent now performs a connection test during bootstrap, and fatal errors after bootstrap are written to the node's console log instead of disappearing once the agent connects to the API server (PR #14760).

gardenctl learned a defaultKubeconfigAccessLevel per garden, supporting admin, viewer, and auto for both shoots and managedSeeds — letting admins default to a viewer kubeconfig and reducing the blast radius of accidental writes. gardenctl config set-garden automatically refreshes the symlinked kubeconfig when the access level changes (PR #735).
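Resolving the effective access level is essentially a precedence question. In this hypothetical sketch, an explicitly requested level wins over the per-garden `defaultKubeconfigAccessLevel`, which in turn wins over a fallback. The precedence, the fallback, and the type names are assumptions. `auto` is only validated, not interpreted, because its exact behavior is not described here.

```go
package main

import "fmt"

// AccessLevel enumerates the values named for defaultKubeconfigAccessLevel.
type AccessLevel string

const (
	AccessAdmin  AccessLevel = "admin"
	AccessViewer AccessLevel = "viewer"
	AccessAuto   AccessLevel = "auto"
)

// GardenConfig is a hypothetical, simplified per-garden config entry.
type GardenConfig struct {
	Identity                     string
	DefaultKubeconfigAccessLevel AccessLevel
}

// Validate rejects values outside the supported set.
func (l AccessLevel) Validate() error {
	switch l {
	case AccessAdmin, AccessViewer, AccessAuto:
		return nil
	}
	return fmt.Errorf("unsupported access level %q (want admin, viewer or auto)", l)
}

// effectiveAccessLevel applies the assumed precedence:
// explicit request > per-garden default > fallback.
func effectiveAccessLevel(g GardenConfig, requested, fallback AccessLevel) (AccessLevel, error) {
	level := fallback
	if g.DefaultKubeconfigAccessLevel != "" {
		level = g.DefaultKubeconfigAccessLevel
	}
	if requested != "" {
		level = requested
	}
	return level, level.Validate()
}

func main() {
	g := GardenConfig{Identity: "landscape-dev", DefaultKubeconfigAccessLevel: AccessViewer}
	fmt.Println(effectiveAccessLevel(g, "", AccessAdmin))
}
```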

For shoots with confineSpecUpdateRollout enabled, an admission plugin was prototyped that stages spec changes in a ConfigMap rather than writing them directly to .spec. The staged state is applied at the start of the next maintenance window, keeping .spec a faithful representation of what is currently running while making pending changes inspectable and cancellable.

📉 Reducing Secret Watch Pressure on Seeds

gardener-resource-manager stores all rendered manifests in Secrets referenced by ManagedResource.spec.secretRefs, including those without sensitive data. Measurements on production-like seeds quantified the cost:

Cluster   Shoots   ManagedResources   Secrets   In-memory size
1         193      11,753             12,453    204.90 MB
2         272      15,605             16,374    281.22 MB

On average, ManagedResource secret data accounts for more than half of all secret data on a seed — roughly 1 MB per shoot control plane. A prototype introduced a new ManagedResourceData CRD for non-sensitive manifests, with ManagedResource.spec extended by dataRefs alongside secretRefs. In a single-shoot local test, 38 of 51 control plane manifests moved out of Secrets (commit). Open questions remain around the schema, classification interface, and garbage collection strategy for mutable ManagedResourceData objects.
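Splitting rendered manifests between `secretRefs` and `dataRefs` needs a classification rule, which the write-up lists as an open question. The sketch below assumes the simplest possible rule: only objects of kind `Secret` stay in Secrets. The `Manifest` type and `Split` function are hypothetical.

```go
package main

import "fmt"

// Manifest is a minimal stand-in for a rendered object.
type Manifest struct {
	APIVersion, Kind, Name string
}

// sensitiveKinds is a hypothetical classification rule: only these kinds
// must remain in Secrets.
var sensitiveKinds = map[string]bool{"Secret": true}

// Split partitions manifests into those that stay behind secretRefs and those
// that could move to ManagedResourceData objects behind dataRefs.
func Split(ms []Manifest) (secretRefs, dataRefs []Manifest) {
	for _, m := range ms {
		if sensitiveKinds[m.Kind] {
			secretRefs = append(secretRefs, m)
		} else {
			dataRefs = append(dataRefs, m)
		}
	}
	return secretRefs, dataRefs
}

func main() {
	ms := []Manifest{
		{APIVersion: "v1", Kind: "Secret", Name: "kube-apiserver-token"},
		{APIVersion: "apps/v1", Kind: "Deployment", Name: "kube-apiserver"},
		{APIVersion: "v1", Kind: "ConfigMap", Name: "audit-policy"},
	}
	s, d := Split(ms)
	fmt.Printf("secretRefs=%d dataRefs=%d\n", len(s), len(d))
}
```

A kind-based rule is conservative but coarse. A real classification interface might also need to account for sensitive content embedded in non-Secret objects, which is part of why the question remains open.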

🍂 Dropping ManagedSeedSet

Not every topic ended in a continuation. After implementing and evaluating a WIP controller for the long-incomplete ManagedSeedSet proposal, the team concluded that finishing it isn't worth the cost:

  • Keeping gardenlet/Seed config in sync across set members is unreliable because gardenlet version propagation via the parent gardenlet bypasses controllers trying to keep members uniform.
  • Production landscapes prefer declarative, manual seed management over autoscaling.
  • Partition-based staged rollouts remain hard due to config drift.
  • The feature has no known production users.

Unless someone strongly objects, the API will be removed in the near future.



🌷 Closing Thoughts

This event pushed several multi-hackathon projects — GEP-28 self-hosted shoots, the WireGuard transition — substantially forward, while also opening new directions in disaster recovery and seed-side resource pressure. Many topics are now ready for cleanup, GEPs, and PR submission.

The next hackathon is already on the horizon. If you want to join, drop by the Gardener Slack (#hack-the-garden). See you there! ✌️

