Kubernetes & Orchestration: Clusters that support everyday operations

Kubernetes can significantly accelerate development and operational processes—but only if the cluster is not treated as "installed once, done." In many companies today, the sticking point is less the initial deployment than reliable operation across updates, team changes, and growing workloads. This is precisely where it becomes clear whether orchestration really delivers scaling, or whether complexity is simply shifted elsewhere.

Since the shift to shorter release cycles and the noticeably higher patch pressure in container ecosystems, a cluster has primarily become an operating system for platform teams: with clear responsibilities, reproducible processes, and traceable security. Those who self-host Kubernetes independently of hyperscalers gain transparency and control—but must assemble architecture, security baselines, and automation in such a way that the setup remains stable even under real-world conditions.


Business relevance

Kubernetes is relevant for companies when speed and standardization need to come together: deployments become repeatable, environments comparable, and changes can be brought into production via defined paths. This reduces friction – provided that operation and governance are part of the design.

This is particularly important at present because supply chain and dependency risks have become more visible in the container stack: images, registries, charts, operators, and CI/CD are real attack and failure surfaces, not just “dev topics.” In regulated environments (depending on the industry, e.g., NIS2-oriented programs or internal audit requirements), it is also important that policies, role models, and evidence work consistently across clusters and namespaces.

The benefits are then very concrete: fewer special cases per team, faster recovery after disruptions, cleaner cost and capacity planning, and better separation between platform and application. Kubernetes is thus less of a “technical decision” and more of an operating model.

Operating model & ownership


Who decides on cluster standards, who operates the control plane, and how are changes approved? In practice, a clear RACI helps, because otherwise Kubernetes quickly belongs to “everyone and no one” – especially when teams change or several products are onboarded in parallel.

Update & Security Capability


How are cluster upgrades, CNI/CSI updates, certificate rotation, and policy changes handled? Since Pod Security Admission (PSA), stricter defaults, and shorter release cycles, upgrade capability has become a core criterion rather than a maintenance afterthought.
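Pod Security Admission is configured declaratively per namespace. A minimal sketch (the namespace name `team-a` is a placeholder): enforcing the `restricted` profile while also warning, so violations surface before they block deployments.

```yaml
# Illustrative namespace with Pod Security Admission labels.
# "enforce" rejects non-compliant pods; "warn" surfaces violations
# in kubectl output without blocking them.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                                   # placeholder name
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```

A common rollout pattern is to start with `warn` only, fix the reported violations, and switch on `enforce` afterwards — which is exactly the kind of staged change a defined upgrade process should cover.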

Integration, Data & Lifecycle


How are identity (OIDC/LDAP), logging/monitoring, backup/DR, and storage integrated for stateful workloads? Stateful is now commonplace—and the choice of CSI/backup mechanics determines whether recovery is a plan or a hope.
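If the installed CSI driver supports snapshots, recovery can be rehearsed with standard resources instead of ad-hoc scripts. A minimal sketch — all names (`pg-data`, `csi-snapclass`, `team-a`) are placeholders and must match your actual PVC, snapshot class, and namespace:

```yaml
# Illustrative point-in-time snapshot of a PVC via the CSI
# snapshot API (requires the external-snapshotter CRDs and a
# VolumeSnapshotClass for the installed CSI driver).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snap                 # placeholder
  namespace: team-a                  # placeholder
spec:
  volumeSnapshotClassName: csi-snapclass   # must match the CSI driver's class
  source:
    persistentVolumeClaimName: pg-data     # the PVC to snapshot
```

Whether a restore from such a snapshot actually works is something to test regularly — that is the difference between a plan and a hope.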


Specific training courses and current topics can be found in the Comelio GmbH course catalog.
Whether in-house at your company, as a webinar, or as an open event – the formats are flexibly tailored to different requirements.

Typical misunderstandings

“Kubernetes automatically brings stability”

Stability is not created by the scheduler, but by SLOs, capacity limits, clean health checks, clear quotas, and an incident-ready observability setup. In practice, it often fails because monitoring remains just a “nice to have” – until the first resource bottleneck or CNI problem escalates.
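What "clean health checks, clear quotas" means concretely can be sketched in a Deployment fragment. All names (`web`, `example/web:1.2.3`, port 8080, `/healthz`) are illustrative placeholders:

```yaml
# Illustrative Deployment fragment: explicit resource requests/limits
# and health probes give the scheduler and kubelet real signals to
# act on, instead of best-effort guessing.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # placeholder
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:1.2.3   # placeholder image
          resources:
            requests:                # basis for scheduling decisions
              cpu: 100m
              memory: 128Mi
            limits:                  # hard memory cap; OOM-kill boundary
              memory: 256Mi
          readinessProbe:            # gates traffic until the pod is ready
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
          livenessProbe:             # restarts the container if it hangs
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
```

Without realistic requests, the scheduler overpacks nodes; without probes, the platform keeps routing traffic to broken pods — both are common roots of the "first resource bottleneck" mentioned above.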

“Security is an add-on that can be added later”

With today’s ransomware and supply-chain patterns, “later” quickly becomes expensive. RBAC, network policies, admission controls, and secret handling must be defined early because they define the team interfaces. Otherwise, retrofitting them breaks workloads or leads to shadow IT.
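A default-deny NetworkPolicy is a typical "define early" baseline: once it is in place, every allowed traffic path must be declared explicitly. A minimal sketch (the namespace `team-a` is a placeholder):

```yaml
# Illustrative default-deny policy: selects all pods in the
# namespace and, by listing both policy types with no rules,
# blocks all ingress and egress until explicit allow policies
# are added per workload.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a                  # placeholder
spec:
  podSelector: {}                    # all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Introducing this on day one is cheap; introducing it after dozens of undocumented traffic paths exist is exactly the "tightening breaks workloads" scenario described above. Note that enforcement requires a CNI that implements NetworkPolicy.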

“Multi-tenancy is just a namespace pattern”

Namespaces are a start, but without isolation (policies, quotas, separate ingress/egress paths, separate node pools if necessary), multi-tenancy remains just a label. Especially as platform teams grow, this becomes an operational risk issue—not a matter of style.
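Quotas are one of the isolation layers that turn a namespace into an actual tenant boundary. A minimal sketch — namespace name and all numbers are illustrative placeholders to be sized per tenant:

```yaml
# Illustrative per-tenant resource ceiling: the quota caps the
# namespace as a whole, the LimitRange injects defaults so pods
# without explicit requests/limits still count against the quota.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a                  # placeholder
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.memory: 16Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a                  # placeholder
spec:
  limits:
    - type: Container
      defaultRequest:                # applied when a pod omits requests
        cpu: 100m
        memory: 128Mi
      default:                       # applied when a pod omits limits
        memory: 256Mi
```

Combined with network policies and, where needed, separate node pools, this is what moves multi-tenancy from "label" to enforced boundary.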

“The installer is the main decision”

Whether kubeadm, Kubespray, or a managed offering: the lifecycle model is crucial. If you don’t define upgrades, certificate lifetimes, backup/restore, and node replacement as a standardized process, you’re building a fragile platform in the long run.

Frequently asked questions about Kubernetes

In this FAQ, you will find the topics that come up most frequently in consulting and training. Each answer is kept short and refers to further content if necessary. Can’t find your question? We are happy to help you personally.


kubeadm or Kubespray – which installer fits?

kubeadm is suitable if you want to stay “close to the system” and design the automation yourself. Kubespray comes into its own when reproducibility, idempotence, and a standardized lifecycle model matter more – especially with multiple clusters or changing teams.
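With kubeadm, the lifecycle decisions discussed above are pinned in a configuration file rather than in ad-hoc flags. A minimal sketch — version, endpoint, and subnet are placeholders to be replaced with your own values:

```yaml
# Illustrative kubeadm ClusterConfiguration: pinning the version
# and control-plane endpoint makes cluster creation and upgrades
# a reviewable, repeatable step instead of a one-off command.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0                        # pin explicitly; upgrades are planned steps
controlPlaneEndpoint: "cp.example.internal:6443"  # placeholder VIP/DNS for HA
networking:
  podSubnet: 10.244.0.0/16                        # must match the CNI deployed afterwards
```

Keeping this file in version control is a simple way to make "node replacement as a standardized process" concrete, whichever installer you choose.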

Flannel, Calico, or Cilium – which CNI should we choose?

Flannel is often sufficient for simple environments and labs, but has less depth in terms of policies. Calico is established in many productive setups, especially when network policies are central. Cilium is a good fit if you want eBPF-based observability and fine-grained traffic control – but this also increases the demand for operational expertise.

How do we make multi-tenancy viable beyond namespaces?

Namespace design is just the start. Multi-tenancy becomes viable through a combination of RBAC, network policies, quotas/limit ranges, clear ingress/egress rules, and a defined platform contract (templates, defaults, support limits). In practice, it is worth treating this as a “product” of the platform.

What typically makes clusters unstable in practice?

Often, it’s capacity and resource problems (requests/limits are missing or unrealistic), unclear responsibilities for add-ons (CNI/CSI/Ingress), and a lack of standard processes for upgrades and certificates. Too many extras too early (mesh, policy engines, operators) can also compromise stability if operations and observability don’t grow with them.