Message-ID: <CA+CK2bD3D_XFu1E60qBYwdDzK0c7_bN0BkGBE7h6h_sxmmfvAQ@mail.gmail.com>
Date: Fri, 19 Sep 2025 09:14:12 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: linux-kernel@...r.kernel.org, Cong Wang <cwang@...tikernel.io>,
Andrew Morton <akpm@...ux-foundation.org>, Baoquan He <bhe@...hat.com>,
Alexander Graf <graf@...zon.com>, Mike Rapoport <rppt@...nel.org>, Changyuan Lyu <changyuanl@...gle.com>,
kexec@...ts.infradead.org, linux-mm@...ck.org
Subject: Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support

On Thu, Sep 18, 2025 at 6:26 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
>
> This patch series introduces multikernel architecture support, enabling
> multiple independent kernel instances to coexist and communicate on a
> single physical machine. Each kernel instance can run on dedicated CPU
> cores while sharing the underlying hardware resources.
>
> The multikernel architecture provides several key benefits:
> - Improved fault isolation between different workloads
> - Enhanced security through kernel-level separation
> - Better resource utilization than traditional VMs (KVM, Xen, etc.)
> - Potential zero-downtime kernel updates with KHO (Kexec HandOver)

Hi Cong,

Thank you for submitting this; it is an exciting series.

I experimented with this approach about five years ago for a Live
Update scenario. It required surprisingly little work to get two OSes
to boot simultaneously on the same x86 hardware. The procedure I
followed looked like this:
1. Create an immutable kernel image bundle: kernel + initramfs.
2. The first kernel is booted with memmap parameters, setting aside
the first 1G for its own operation, the second 1G for the next kernel
(reserved), and the rest as PMEM for the VMs.
3. In the first kernel, we offline one CPU and kexec the second kernel
with parameters that specify to use only the offlined CPU as the boot
CPU and to keep the other CPUs offline (i.e., smp_init does not start
other CPUs). Its memmap parameters mark the first 1G as reserved, the
second 1G for its own operation, and the rest as PMEM (see the sketch
after this list).
4. The VMs are handed over by suspending them in the old kernel.
5. The other CPUs are onlined in the new kernel (thus killing the old kernel).
6. The VMs are resumed in the new kernel.
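
For concreteness, the whole thing looked roughly like the sketch below.
The sizes, paths and CPU numbers are only illustrative, and actually
starting the second kernel on the offlined CPU (step 3) needed
out-of-tree changes that are not shown here:

  # Step 2: first kernel's boot parameters: 0-1G for itself, 1G-2G
  # reserved for the next kernel, the rest (say up to 64G) as PMEM:
  #   memmap=1G$1G memmap=62G!2G

  # Step 3: offline the CPU the second kernel will boot on, then load
  # the immutable kernel+initramfs bundle.  maxcpus=1 keeps smp_init()
  # in the second kernel from starting the other CPUs, and its memmap
  # is the mirror image: 0-1G reserved, 1G-2G for itself, rest PMEM.
  echo 0 > /sys/devices/system/cpu/cpu1/online
  kexec -l /path/to/vmlinuz --initrd=/path/to/initramfs.img \
        --append='maxcpus=1 memmap=1G$0 memmap=62G!2G ...'

  # Steps 5-6, from inside the new kernel: online the remaining CPUs
  # (which kills the old kernel), then resume the VMs.
  for c in /sys/devices/system/cpu/cpu[1-9]*/online; do echo 1 > $c; done
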
While it was easy to take this approach to an experimental PoC, it has
some fundamental problems that I am not sure can be solved in the long
run, such as handling global machine state like interrupts. I think
the Orphaned VM approach (i.e., keeping VCPUs running through the Live
Update procedure) is more reliable and likely to succeed for
zero-downtime kernel updates.

Pasha

>
> Architecture Overview:
> The implementation leverages kexec infrastructure to load and manage
> multiple kernel images, with each kernel instance assigned to specific
> CPU cores. Inter-kernel communication is facilitated through a dedicated
> IPI framework that allows kernels to coordinate and share information
> when necessary.
>
> Key Components:
> 1. Enhanced kexec subsystem with dynamic kimage tracking
> 2. Generic IPI communication framework for inter-kernel messaging
> 3. Architecture-specific CPU bootstrap mechanisms (only x86 so far)
> 4. Proc interface for monitoring loaded kernel instances
>
> Patch Summary:
>
> Patch 1/7: Introduces basic multikernel support via kexec, allowing
> multiple kernel images to be loaded simultaneously.
>
> Patch 2/7: Adds x86-specific SMP INIT trampoline for bootstrapping
> CPUs with different kernel instances.
>
> Patch 3/7: Introduces dedicated MULTIKERNEL_VECTOR for x86 inter-kernel
> communication.
>
> Patch 4/7: Implements generic multikernel IPI communication framework
> for cross-kernel messaging and coordination.
>
> Patch 5/7: Adds arch_cpu_physical_id() function to obtain physical CPU
> identifiers for proper CPU management.
>
> Patch 6/7: Replaces static kimage globals with dynamic linked list
> infrastructure to support multiple kernel images.
>
> Patch 7/7: Adds /proc/multikernel interface for monitoring and debugging
> loaded kernel instances.
>
> The implementation maintains full backward compatibility with existing
> kexec functionality while adding the new multikernel capabilities.
>
> IMPORTANT NOTES:
>
> 1) This is a Request for Comments (RFC) submission. While the core
> architecture is functional, there are numerous implementation details
> that need improvement. The primary goal is to gather feedback on the
> high-level design and overall approach rather than focus on specific
> coding details at this stage.
>
> 2) This patch series represents only the foundational framework for
> multikernel support. It establishes the basic infrastructure and
> communication mechanisms. We welcome the community to build upon
> this foundation and develop their own solutions based on this
> framework.
>
> 3) Testing has been limited to the author's development machine using
> hard-coded boot parameters and specific hardware configurations.
> Community testing across different hardware platforms, configurations,
> and use cases would be greatly appreciated to identify potential
> issues and improve robustness. Obviously, don't use this code beyond
> testing.
>
> This work enables new use cases such as running real-time kernels
> alongside general-purpose kernels, isolating security-critical
> applications, and providing dedicated kernel instances for specific
> workloads, etc.
>
> Signed-off-by: Cong Wang <cwang@...tikernel.io>
>
> ---
>
> Cong Wang (7):
> kexec: Introduce multikernel support via kexec
> x86: Introduce SMP INIT trampoline for multikernel CPU bootstrap
> x86: Introduce MULTIKERNEL_VECTOR for inter-kernel communication
> kernel: Introduce generic multikernel IPI communication framework
> x86: Introduce arch_cpu_physical_id() to obtain physical CPU ID
> kexec: Implement dynamic kimage tracking
> kexec: Add /proc/multikernel interface for kimage tracking
>
> arch/powerpc/kexec/crash.c | 8 +-
> arch/x86/include/asm/idtentry.h | 1 +
> arch/x86/include/asm/irq_vectors.h | 1 +
> arch/x86/include/asm/smp.h | 7 +
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/crash.c | 4 +-
> arch/x86/kernel/head64.c | 5 +
> arch/x86/kernel/idt.c | 1 +
> arch/x86/kernel/setup.c | 3 +
> arch/x86/kernel/smp.c | 15 ++
> arch/x86/kernel/smpboot.c | 161 +++++++++++++
> arch/x86/kernel/trampoline_64_bsp.S | 288 ++++++++++++++++++++++
> arch/x86/kernel/vmlinux.lds.S | 6 +
> include/linux/kexec.h | 22 +-
> include/linux/multikernel.h | 81 +++++++
> include/uapi/linux/kexec.h | 1 +
> include/uapi/linux/reboot.h | 2 +-
> init/main.c | 2 +
> kernel/Makefile | 2 +-
> kernel/kexec.c | 103 +++++++-
> kernel/kexec_core.c | 359 ++++++++++++++++++++++++++++
> kernel/kexec_file.c | 33 ++-
> kernel/multikernel.c | 314 ++++++++++++++++++++++++
> kernel/reboot.c | 10 +
> 24 files changed, 1411 insertions(+), 19 deletions(-)
> create mode 100644 arch/x86/kernel/trampoline_64_bsp.S
> create mode 100644 include/linux/multikernel.h
> create mode 100644 kernel/multikernel.c
>
> --
> 2.34.1
>