[0]: https://tadeubento.com/2024/replace-proxmox-with-incus-lxd/
If you are interested in running Kubernetes on top of Incus (that is, your Kubernetes cluster nodes will be made up of KVM or LXC instances), I highly recommend the Cluster API provider for Incus: https://github.com/lxc/cluster-api-provider-incus
This provider is really well done and maintained, including ClusterClass support and an array of pre-built machine images for both KVM and LXC. It also supports pivoting the management cluster onto a workload cluster, enabling the management cluster to upgrade itself, which is really cool.
I was surprised to come across this provider by chance, as for some reason it's not listed in the CAPI documentation's provider list: https://cluster-api.sigs.k8s.io/reference/providers
Thanks for the good comments! Indeed, adding it to the list of CAPI providers is on the roadmap (I did not want to do that before discussing moving the project under the LXC org with Stéphane, but that is now complete). Also, I'm working on a few other niceties, like a "kind"-style script that would allow easily managing small k8s clusters without the full CAPI requirements (while, at the same time, documenting all it takes to run Kubernetes under LXC in general).
You can expect more things about the project, and any feedback would be welcome!
I have actually learned quite a bit just reading your gitbook and workflows.
This provider is also great because it sits in the space of fully on-prem and fully self-hosted. KubeVirt is also in this space, but it needs an additional provider to be able to fully pivot and manage itself.
I'm quite interested in your machine image pipeline and how you publish the images via simplestreams. I'm working with MAAS and really want to implement the same pattern you have: push to a central location and let MAAS sync. It's very painful to have to manually import the images beforehand and handle garbage collection.
Would your Incus and KVM images work with MAAS as well? If there is a better approach, I am all ears.
Thanks again for sharing your fantastic work with the community.
I'm sure you could probably manually hook up Incus to something like Consul, but it would be more effort than it's worth.
If you mean service discovery as in being able to use the k8s API from inside your application (e.g. to find a service manually), then Incus allows for that too.
Maybe the key difference you're pointing at here is that Incus does not give you one huge network where everything is routable by default, so you have to set up your own bridge networks (see the sketch after the links):
https://linuxcontainers.org/incus/docs/main/explanation/netw...
https://linuxcontainers.org/incus/docs/main/reference/networ...
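For what it's worth, getting to that starting point is only a couple of commands. A minimal sketch, assuming the images: remote and a made-up subnet:

$ incus network create testbr0 ipv4.address=10.10.10.1/24 ipv4.nat=true
$ incus launch images:debian/12 c1 --network testbr0
$ incus list c1    # shows the DHCP-assigned address on testbr0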
I could definitely agree that Incus is more unencumbered in this respect.
"Supports", as in "good luck with your Perl scripts", or, as Kubernetes does, automatically updating DNS with the A record(s) and SRV records of the constituent hosts? Because I didn't see anything in the docs about DNS support, and I don't know systemd well enough to know what problem this is solving <https://linuxcontainers.org/incus/docs/main/howto/network_br...>
Incus is not bridged by default, so you have to do more work to get to that starting point (IP addresses); there's some configuration for IPAM as well.
Incus also does not provide name resolution out of the box, in contrast with Kubernetes, which modifies resolution rules via the kubelet. Incus can do this via systemd, i.e. at the system level, for traffic into a specific Incus node.
> If the system that runs Incus uses systemd-resolved to perform DNS lookups, you should notify resolved of the domains that Incus can resolve.
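For reference, the plumbing the docs describe boils down to a couple of resolvectl calls on the host; a sketch, with the bridge name and address as assumptions:

$ resolvectl dns incusbr0 10.10.10.1     # forward lookups to the dnsmasq serving the bridge
$ resolvectl domain incusbr0 '~incus'    # route only the .incus domain to that server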
This, combined with BGP[0] would give you a mesh-like system.
So basically, Incus definitely doesn't do it out of the box -- you need to do your own plumbing.
To be clear, I stand corrected here -- this is a legitimate difference between the two, but it's not that it's impossible/completely out of scope with Incus, it's just that it's not built in.
[0]: https://linuxcontainers.org/incus/docs/main/howto/network_bg...
https://linuxcontainers.org/incus/docs/main/explanation/clus...
You may have a point that k8s is not meant for single machines, but that's not a hard rule - more of a "why would you want to". You can absolutely run single-node Kubernetes.
Also, strictly speaking, Incus is not a container or VM runtime; it's an orchestrator of those things.
"The full system" means all your standard stuff works in the expected way: crons, systemd units, sshd, multiple users, your regular backup solutions and so on. That "system" can also be dumped/exported/backed up/snapshotted as a whole, very much like you do with vSphere/QEMU or whatever hypervisor you use in your datacenters.
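As a rough illustration (instance and file names are made up):

$ incus snapshot create web1 pre-upgrade    # point-in-time snapshot of the whole VE
$ incus export web1 web1-backup.tar.gz      # dump the instance to a tarball
$ incus import web1-backup.tar.gz           # restore it on this or another server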
Foreseeing the question: yep, you can run Docker inside LXD/Incus VEs. In practical terms, that makes life much simpler when you need to give some dev team (who, of course, are not exactly known for doing sane things) an environment with Docker access (which in 99% of cases means the host's root-level access is exposed).
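A minimal sketch of that setup, assuming an Ubuntu image from the images: remote - the key bit is security.nesting, which lets Docker create its own namespaces inside the container:

$ incus launch images:ubuntu/24.04 docker-host -c security.nesting=true
$ incus exec docker-host -- sh -c 'apt-get update && apt-get install -y docker.io'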
LXC containers used in Incus run their own init; they act more like a VM.
However, Incus can also run actual VMs (via QEMU) and, since recently, even OCI containers like Docker images.
Is this for people who want to run their own cloud provider, or who need to manage the infrastructure of org-owned VMs?
When would you use this over k8s or serverless container runtimes like Fargate/Cloud Run?
Use cases are almost the same as Proxmox: you can orchestrate system containers or VMs. Proxmox runs LXC container images, while Incus is built on top of LXC and has its own container images.
System vs application containers: both share the host's kernel. Application containers usually run only a single application, like a web app (e.g. OCI containers). System containers are more like VMs, with systemd and multiple apps managed by it. Note: this differentiation is often ambiguous.
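To make that concrete with Incus itself (the remote and image names are just examples, and OCI support needs a recent Incus release):

$ incus launch images:debian/12 sys1                            # system container: full distro with systemd inside
$ incus remote add oci-docker https://docker.io --protocol=oci  # register an OCI registry as an image remote
$ incus launch oci-docker:nginx web1                            # application (OCI) container: a single app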
> Is this for people who want to run their own cloud provider, or that need to manage the infrastructure of org-owned VM's?
Yes, you could build a private cloud with it.
> When would you use this over k8s or serverless container runtimes like Fargate/Cloudrun?
You would use it when you need traditional application setups inside VMs or system containers, instead of containerized microservices.
I actually use Incus containers as host nodes for testing full-fledged multi-node K8s setups.
For others, here's why it may be useful in a regular sysadmin job:
* doing Ansible scripting against a LOCAL network is a hell of a lot faster than against 300+ ms remote machines
* because you can use VE snapshots, it's very easy to ensure your playbook works fine without guessing what you have modified while testing things - just roll back to a "clean" state and start over (see the sketch after this list)
* creating a test 3-node MariaDB cluster - easy peasy
* multiple distros available - need to debug HAProxy on, say, Rocky Linux 8? Check!
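The rollback loop from the second bullet looks roughly like this (instance, snapshot and playbook names are made up):

$ incus launch images:rockylinux/8 ansible-target
$ incus snapshot create ansible-target clean      # baseline before any playbook run
$ ansible-playbook -i inventory.ini site.yml      # test the playbook against the container
$ incus snapshot restore ansible-target clean     # back to the pristine state, run again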
But Firecracker is fundamentally different because it has a different purpose: Firecracker is about offering VM-based isolation for systems that have container-like ephemerality in multitenant environments, especially the cloud. So when you use Firecracker, each system has its own kernel running under its own paravirtualized hardware.
With OrbStack and WSL, you have only one kernel for all of your "guests" (which are container guests, rather than hardware-paravirtualized guests). In exchange you're working with something that's simpler in some ways, more efficient, has less resource contention, etc. It's also easier to share resources between containers dynamically than across VMs, so it's very easy to run 10 "machines" but only allocate 4GB of RAM or whatever, and have it shared freely between them with little overhead. They can also share Unix sockets (like the socket for Docker or a Kubernetes runtime) directly as files, since they share a kernel - no need for some kind of HTTP-based socket forwarding across virtualized network devices.
I imagine this is nice for many use cases, but as you can imagine, it's especially nice for local development. :)
I personally use it mostly for deploying a bunch of system containers and some OCI containers.
But anyone who uses LXC, LXD, docker, libvirt, qemu etc. could potentially be interested in Incus.
Incus is just an LXD fork, btw, developed by Stéphane Graber.
What about kernel updates that require reboots? I have heard of ksplice/kexec, but I have never seen them used anywhere.
To some extent, of course: things like vSphere/Virtuozzo, and even LXD/Incus or simple QEMU/virsh setups, can do live migration of VMs, so you may care a bit less about making the things inside the VMs fault tolerant - but only to some extent.
I.e. if your team runs PostgreSQL, you run it as a cluster with Patroni and VIPs and all that lovely industry-standard magic, and tell the dev teams to use that VIP as the entry point (in reality things are a bit more complicated, with HAProxy/PgBouncer on top, but that's enough to express the idea).
P.S. MicroCloud tries to achieve this, AFAIR, but it's from Canonical, so it's built on LXD.
How does this look with Incus? Obviously if the workload you are running has some kind of multi-node support you can use that, but I'm wondering if Incus has a way to do this in some generalized way, like k8s does.
But I did some more reading, and there seems to be support for live migration for VMs, and limited live migration for containers. Moving stopped instances is supported for both VMs and containers.
[0] https://linuxcontainers.org/incus/docs/main/howto/cluster_ma...
[1] https://linuxcontainers.org/incus/docs/main/howto/move_insta...
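In practice that looks something like this on a cluster (member and instance names assumed); running VMs can be live-migrated, while containers usually go through a stop/start:

$ incus move vm1 --target server2    # migrate a single instance to another cluster member
$ incus cluster evacuate server1     # move everything off a member, e.g. before a kernel reboot
$ incus cluster restore server1      # bring the instances back afterwards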
Indeed, container live migration is limited and a bit unclear on "network devices" - is a bridged interface a network device or not?
A bit ironic, given that CRIU was, AFAIK, created by the same Virtuozzo guys who provided OpenVZ back then, and OpenVZ VEs could live migrate - I was personally testing it in 2007-2008. Granted, there was no systemd in those days, if that complicates things, and of course it required their patched kernel.
> Live migration for containers: For containers, there is limited support for live migration using CRIU. However, because of extensive kernel dependencies, only very basic containers (non-systemd containers without a network device) can be migrated reliably. In most real-world scenarios, you should stop the container, move it over and then start it again.
There are a couple of differences, though. The first is the pet vs cattle treatment of containers by Incus and k8s respectively. Incus tries to resurrect dead containers as faithfully as possible. This means that Incus treats container crashes like system crashes, and its recovery involves a systemd boot inside the container (the kernel too, in the case of VMs). This is what accounts for the delay. K8s, on the other hand, doesn't care about dead containers/pods at all. It just creates another pod, likely with a different address, and expects it to handle the interruption.
Another difference is the orchestration mechanism behind this. K8s, as you may be aware, uses control loops on controller nodes to detect the crash and initiate the recovery. The recovery is mediated by the kubelets on the worker nodes. Incus seems to have the orchestrator on all nodes; they take decisions based on consensus and manage the recovery process themselves.
[1] https://linuxcontainers.org/incus/docs/main/howto/cluster_ma...
That's not true of Pods; each Pod has its own distinct network identity. You're correct about the network, though, since AFAIK the Service and Pod CIDRs are fixed for the lifespan of the k8s cluster.
You spoke to it further down, but guarded it with "likely", and I can say with certainty that it's not just "likely" - it unconditionally does. That's not to say address re-use isn't possible over a long enough time horizon, but that bookkeeping is delegated to the CNI.
---
Your "dead container" one also has some nuance, in that kubelet will for sure restart a failed container, in place, with the same network identity. When fresh identity comes into play is if the Node fails, or the control loop determines something in the Pod's configuration has changed (env-vars, resources, scheduling constraints, etc) in which case it will be recreated, even if by coincidence on the same Node
You are 100% wrong then. The Kube-OVN CNI enables static address assignment and "sticky" IPAM on both Pods and KubeVirt VMs.
https://kubeovn.github.io/docs/v1.12.x/en/guide/static-ip-ma...
Sorry for the lapse and I'll try to be more careful when using "unconditional" to describe pluggable software
I personally use LXD for running my homelab VMs and containers.
In case you are interested, Zabbly has some interesting behind-the-scenes videos on YouTube (not affiliated).
The YT description also points to https://github.com/zabbly/incus
Incus is not a replacement for lxc. It's an alternative to LXD (the LXD project is still active). Both Incus and LXD are built upon liblxc (the library version of lxc) and provide a higher-level user interface than lxc (e.g. projects, cloud-init support, etc.). However, lxc gives you fine-grained control over container options (this is sort of like the relationship between Flatpak and Bubblewrap).
So, if you don't need the fine-grained control of lxc, Incus may be a more ergonomic solution.
PS: Confusingly enough, LXD's CLI is also named lxc.
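For a feel of the difference, the same container with the two tool sets (image and template arguments are illustrative):

$ lxc-create -n c1 -t download -- --dist debian --release bookworm --arch amd64   # low-level lxc tooling
$ lxc-start -n c1
$ incus launch images:debian/12 c1                                                # the Incus equivalent in one command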
I'm a Pulumi user myself, and I haven't seen a Pulumi provider for Incus yet. Once I get further into my Incus experiments, if someone hasn't made an Incus provider yet, I'll probably go through the TF provider conversion process.
I just tried and it seems to have worked (though I haven't tested any specific resources yet):
$ pulumi package add terraform-provider lxc/incus
Meanwhile, running Incus inside GCP VM(s) should be possible, though I haven't tried it and can't confirm it. Incus can manage system containers - containers that behave like VMs running a full distro, except for the kernel (which is shared with the host).
But keep in mind that Incus is more like docker than docker-compose. You will need a different tool to provision and configure all those containers over Incus's API (docker-compose does this for application containers over the docker API). As mentioned before, that could be Terraform/OpenTofu, cloud-init and Ansible. Or you could even write a script to do it over the API. I have done this using Python.
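If you don't want a full IaC tool, even a small shell loop over the CLI (which itself just talks to the API) gets you surprisingly far; a sketch, with the image and cloud-init file as assumptions:

# spin up three identically configured system containers
for i in 1 2 3; do
  incus launch images:debian/12 "node${i}" -c cloud-init.user-data="$(cat cloud-init.yaml)"
done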