Kubernetes & Orchestration: Clusters that support everyday operations
Kubernetes can significantly accelerate development and operational processes—but only if the cluster is not understood as “once installed, done.” In many companies today, the sticking point is less the initial deployment and more the reliable operation across updates, team changes, and growing workloads. This is precisely where it becomes clear whether orchestration actually delivers scale or merely shifts complexity elsewhere.
Since the shift to shorter release cycles and the noticeably higher patch pressure in container ecosystems, a cluster is above all an operational responsibility for platform teams: with clear ownership, reproducible processes, and traceable security. Those who self-host Kubernetes independently of hyperscalers gain transparency and control—but must assemble architecture, security baselines, and automation in such a way that the setup remains stable even under real-world conditions.

Business relevance
Kubernetes is relevant for companies when speed and standardization need to come together: deployments become repeatable, environments comparable, and changes can be brought into production via defined paths. This reduces friction – provided that operation and governance are part of the design.
This is particularly important at present because supply chain and dependency risks have become more visible in the container stack: images, registries, charts, operators, and CI/CD are real attack and failure surfaces, not just “dev topics.” In regulated environments (depending on the industry, e.g., NIS2-oriented programs or internal audit requirements), it is also important that policies, role models, and evidence work consistently across clusters and namespaces.
The benefits are then very concrete: fewer special cases per team, faster recovery after disruptions, cleaner cost and capacity planning, and better separation between platform and application. Kubernetes is thus less of a “technical decision” and more of an operating model.
Operating model & ownership

Who decides on cluster standards, who operates the control plane, and how are changes approved? In practice, a clear RACI helps, because otherwise Kubernetes quickly belongs to “everyone and no one” – especially when teams change or several products are onboarded in parallel.
Update & security capability

How are cluster upgrades, CNI/CSI updates, certificate rotation, and policy changes handled? Since PSA/pod security standards, stricter defaults, and shorter release cycles, upgrade capability has become a core criterion rather than a maintenance issue.
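The Pod Security Admission defaults mentioned above are applied per namespace via labels. A minimal sketch (the namespace name is illustrative):

```yaml
# Namespace with Pod Security Admission labels: the "restricted" profile is
# enforced, and violations are additionally surfaced via audit and warn modes.
apiVersion: v1
kind: Namespace
metadata:
  name: payments            # illustrative name
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Rolling this out in warn/audit mode first, and only then switching to enforce, avoids breaking existing workloads during an upgrade.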
Integration, data & lifecycle

How are identity (OIDC/LDAP), logging/monitoring, backup/DR, and storage integrated for stateful workloads? Stateful is now commonplace—and the choice of CSI/backup mechanics determines whether recovery is a plan or a hope.
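One building block of that backup mechanics is a CSI volume snapshot of a stateful workload's PVC. A sketch, assuming the external-snapshotter CRDs and a CSI driver with snapshot support are installed (all names are illustrative):

```yaml
# CSI snapshot of an existing PersistentVolumeClaim -- one element of a
# backup strategy, not a replacement for tested restores.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snap          # illustrative
  namespace: databases        # illustrative
spec:
  volumeSnapshotClassName: csi-snapclass   # depends on your CSI driver
  source:
    persistentVolumeClaimName: pg-data     # PVC of the stateful workload
```

A snapshot alone is not disaster recovery: whether “recovery is a plan or a hope” is decided by regularly exercised restores.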

Training
Specific training courses and current topics can be found in the Comelio GmbH course catalog.
Whether in-house at your company, as a webinar, or as an open event – the formats are flexibly tailored to different requirements.
Typical misunderstandings
“Kubernetes automatically brings stability”
Stability is not created by the scheduler, but by SLOs, capacity limits, clean health checks, clear quotas, and an incident-ready observability setup. In practice, it often fails because monitoring remains just a “nice to have” – until the first resource bottleneck or CNI problem escalates.
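The “clean health checks” mentioned here mean separate readiness and liveness probes: readiness gates traffic, liveness restarts a hung process. A container fragment as a sketch (endpoints and ports are illustrative):

```yaml
# Health checks on a container: readiness removes the pod from service
# endpoints when it cannot serve; liveness restarts it when it hangs.
containers:
  - name: api
    image: registry.example.com/api:1.4.2   # illustrative
    readinessProbe:
      httpGet:
        path: /healthz/ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /healthz/live
        port: 8080
      initialDelaySeconds: 15   # give the process time to start
      periodSeconds: 10
```

A common pitfall is pointing both probes at the same endpoint: a slow dependency then triggers restarts instead of merely pausing traffic.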
“Security is an add-on that can be added later”
With today’s ransomware and supply chain patterns, “later” is an expensive time. RBAC, network policies, admission controls, and secret handling must be defined early on because they shape the team interfaces. Otherwise, subsequent tightening breaks workloads or leads to shadow IT.
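A common early baseline for the network-policy part is default-deny per namespace: all traffic is blocked until explicitly allowed. A sketch (namespace name is illustrative):

```yaml
# Default-deny for a namespace: the empty podSelector matches every pod,
# and listing both policy types blocks all ingress and egress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments        # illustrative
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Note that DNS egress to the cluster DNS service usually needs an explicit allow rule on top of this, otherwise workloads fail in confusing ways.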
“Multi-tenancy is just a namespace pattern.”
Namespaces are a start, but without isolation (policies, quotas, separate ingress/egress paths, separate node pools if necessary), multi-tenancy remains just a label. Especially as platform teams grow, this becomes an operational risk issue—not a matter of style.
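The quota side of that isolation can be sketched as a per-tenant ResourceQuota plus a LimitRange that injects sane per-container defaults (namespace and values are illustrative):

```yaml
# ResourceQuota caps the tenant's aggregate consumption ...
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: team-a          # illustrative tenant namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
# ... while a LimitRange supplies defaults for containers that specify nothing.
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:               # applied as limit when none is set
        cpu: 500m
        memory: 512Mi
      defaultRequest:        # applied as request when none is set
        cpu: 100m
        memory: 128Mi
```

Without the LimitRange, a quota on requests/limits silently rejects pods that do not declare them, which tends to surface as a support ticket rather than a policy decision.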
“The installer is the main decision.”
Whether kubeadm, Kubespray, or a managed offering: the lifecycle model is crucial. If you don’t define upgrades, certificate lifetimes, backup/restore, and node replacement as a standardized process, you’re building a fragile platform in the long run.
Initial consultation / project initiation
If you have a specific project in mind (new build, hardening, GitOps introduction, multi-tenancy, storage/backup for stateful workloads, or operating concepts for self-hosting), an initial consultation can quickly clarify which operational and architectural decisions have the greatest leverage.
Frequently asked questions about Kubernetes
In this FAQ, you will find the topics that come up most frequently in consulting and training. Each answer is kept short and refers to further content if necessary. Can’t find your question? We are happy to help you personally.

kubeadm or Kubespray – which is suitable for bare metal and VMs?
kubeadm is suitable if you want to stay “close to the system” and design the automation yourself. Kubespray comes into its own when reproducibility, idempotence, and a standardized lifecycle model are more important – especially with multiple clusters or changing teams.
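For the kubeadm route, “close to the system” starts with an explicit cluster configuration rather than defaults. A minimal sketch (version, endpoint, and subnets are illustrative; newer kubeadm releases use the v1beta4 API):

```yaml
# Minimal kubeadm ClusterConfiguration for a self-managed control plane.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.4          # pin explicitly for reproducible upgrades
controlPlaneEndpoint: "k8s-api.example.internal:6443"  # LB in front of the API servers
networking:
  podSubnet: 10.244.0.0/16          # must match the CNI configuration
  serviceSubnet: 10.96.0.0/12
```

Keeping this file in version control is what turns `kubeadm init`/`kubeadm upgrade` from an ad-hoc command into part of a lifecycle model.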
Calico, Cilium, or Flannel – how do I choose?
Flannel is often sufficient for simple environments and labs, but has less depth in terms of policies. Calico is established in many productive setups, especially when network policies are central. Cilium is a good fit if you want eBPF-based observability and fine-grained traffic control – but this also increases the demand for operational expertise.
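The “policy depth” that separates Calico and Cilium from plain Flannel can be illustrated with a standard NetworkPolicy that both enforce: ingress to the API pods is allowed only from labeled frontend pods (namespace and labels are illustrative):

```yaml
# Allow ingress to "app: api" pods only from "role: frontend" pods
# in the same namespace, and only on the application port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: shop            # illustrative
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Cilium additionally offers its own CRDs for L7-aware rules; the standard API above is the portable baseline.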
How do I achieve clean and supportable multi-tenancy?
Namespace design is just the start. Multi-tenancy becomes viable through a combination of RBAC, network policies, quotas/limit ranges, clear ingress/egress rules, and a defined platform contract (templates, defaults, support limits). In practice, it is worth treating this as a “product” of the platform.
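Part of such a platform contract is namespace-scoped RBAC: the tenant team manages its workloads but not RBAC or quotas. A sketch, assuming team groups are mapped in via OIDC/LDAP (names are illustrative):

```yaml
# Role: what a tenant team may do inside its own namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer
  namespace: team-a          # illustrative tenant namespace
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "jobs", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# RoleBinding: grant the Role to the team's identity-provider group.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-developer-binding
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-devs        # group name delivered by OIDC/LDAP (illustrative)
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
```

Because Roles are namespace-scoped, the same pair can be templated per tenant, which is exactly the “product” framing described above.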
What are the most common causes of unstable clusters?
Often, it’s capacity and resource problems (requests/limits are missing or unrealistic), unclear responsibilities for add-ons (CNI/CSI/Ingress), and a lack of standard processes for upgrades and certificates. Too many extras too early (mesh, policy engines, operators) can also compromise stability if operations and observability don’t grow with them.
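What “realistic requests/limits” look like can be sketched for a single container; the values are illustrative and should come from measurement, not guesswork:

```yaml
# Requests drive scheduling and quota accounting; limits protect neighbors.
containers:
  - name: worker
    image: registry.example.com/worker:2.1.0   # illustrative
    resources:
      requests:
        cpu: 250m            # scheduler reserves this much per replica
        memory: 256Mi
      limits:
        memory: 512Mi        # exceeding the memory limit OOM-kills the container
        # A CPU limit is deliberately omitted here: CPU throttling often
        # hurts latency more than the protection is worth -- a judgment call.
```

Missing requests are the classic cause of overcommitted nodes and surprise evictions, which matches the pattern described above.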
