Message-ID: <CAM_iQpXnHr7WC6VN3WB-+=CZGF5pyfo9y9D4MCc_Wwgp29hBrw@mail.gmail.com>
Date: Sat, 20 Sep 2025 14:40:18 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Stefan Hajnoczi <stefanha@...hat.com>
Cc: linux-kernel@...r.kernel.org, pasha.tatashin@...een.com, 
	Cong Wang <cwang@...tikernel.io>, Andrew Morton <akpm@...ux-foundation.org>, 
	Baoquan He <bhe@...hat.com>, Alexander Graf <graf@...zon.com>, Mike Rapoport <rppt@...nel.org>, 
	Changyuan Lyu <changyuanl@...gle.com>, kexec@...ts.infradead.org, linux-mm@...ck.org
Subject: Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support

On Fri, Sep 19, 2025 at 2:27 PM Stefan Hajnoczi <stefanha@...hat.com> wrote:
>
> On Thu, Sep 18, 2025 at 03:25:59PM -0700, Cong Wang wrote:
> > This patch series introduces multikernel architecture support, enabling
> > multiple independent kernel instances to coexist and communicate on a
> > single physical machine. Each kernel instance can run on dedicated CPU
> > cores while sharing the underlying hardware resources.
> >
> > The multikernel architecture provides several key benefits:
> > - Improved fault isolation between different workloads
> > - Enhanced security through kernel-level separation
>
> What level of isolation does this patch series provide? What stops
> kernel A from accessing kernel B's memory pages, sending interrupts to
> its CPUs, etc?

The isolation is kernel-enforced, so the trust model here still relies
on each kernel behaving correctly. A malicious kernel could therefore
cause the kinds of disruption you describe. With memory encryption and
IPI filtering, though, I think that is solvable.
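
To sketch what IPI filtering could look like (all identifiers below are
hypothetical, not from this series): each message carries the sender's
instance ID, and the receiving kernel drops anything from an instance
it does not trust:

#include <linux/printk.h>
#include <linux/types.h>

/* Hypothetical message format for the multikernel IPI vector. */
struct mk_message {
	u32 sender_id;	/* kernel instance that sent the IPI */
	u32 type;	/* message type */
	u64 data;	/* payload, or physical address of payload */
};

/* A real policy could be a per-instance ACL set up by the host
 * kernel at spawn time; here only the host (instance 0) is trusted.
 */
static bool mk_sender_allowed(u32 sender_id)
{
	return sender_id == 0;
}

static void mk_handle_ipi(const struct mk_message *msg)
{
	if (!mk_sender_allowed(msg->sender_id)) {
		pr_warn_ratelimited("multikernel: dropped IPI from instance %u\n",
				    msg->sender_id);
		return;
	}
	/* ... dispatch msg->type to its registered handler ... */
}

Of course, a malicious kernel can forge sender_id in shared memory, so
real filtering would need hardware help (e.g. restricting which CPUs
may raise the vector at all), which is part of why I say the trust
model is still kernel-based.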

>
> > - Better resource utilization than traditional VMs (KVM, Xen, etc.)
> > - Potential zero-downtime kernel updates with KHO (Kexec HandOver)
> >
> > Architecture Overview:
> > The implementation leverages kexec infrastructure to load and manage
> > multiple kernel images, with each kernel instance assigned to specific
> > CPU cores. Inter-kernel communication is facilitated through a dedicated
> > IPI framework that allows kernels to coordinate and share information
> > when necessary.
> >
> > Key Components:
> > 1. Enhanced kexec subsystem with dynamic kimage tracking
> > 2. Generic IPI communication framework for inter-kernel messaging
> > 3. Architecture-specific CPU bootstrap mechanisms (only x86 so far)
> > 4. Proc interface for monitoring loaded kernel instances
> >
> > Patch Summary:
> >
> > Patch 1/7: Introduces basic multikernel support via kexec, allowing
> >            multiple kernel images to be loaded simultaneously.
> >
> > Patch 2/7: Adds x86-specific SMP INIT trampoline for bootstrapping
> >            CPUs with different kernel instances.
> >
> > Patch 3/7: Introduces dedicated MULTIKERNEL_VECTOR for x86 inter-kernel
> >            communication.
> >
> > Patch 4/7: Implements generic multikernel IPI communication framework
> >            for cross-kernel messaging and coordination.
> >
> > Patch 5/7: Adds arch_cpu_physical_id() function to obtain physical CPU
> >            identifiers for proper CPU management.
> >
> > Patch 6/7: Replaces static kimage globals with dynamic linked list
> >            infrastructure to support multiple kernel images.
> >
> > Patch 7/7: Adds /proc/multikernel interface for monitoring and debugging
> >            loaded kernel instances.
> >
> > The implementation maintains full backward compatibility with existing
> > kexec functionality while adding the new multikernel capabilities.
> >
> > IMPORTANT NOTES:
> >
> > 1) This is a Request for Comments (RFC) submission. While the core
> >    architecture is functional, there are numerous implementation details
> >    that need improvement. The primary goal is to gather feedback on the
> >    high-level design and overall approach rather than focus on specific
> >    coding details at this stage.
> >
> > 2) This patch series represents only the foundational framework for
> >    multikernel support. It establishes the basic infrastructure and
> >    communication mechanisms. We welcome the community to build upon
> >    this foundation and develop their own solutions based on this
> >    framework.
> >
> > 3) Testing has been limited to the author's development machine using
> >    hard-coded boot parameters and specific hardware configurations.
> >    Community testing across different hardware platforms, configurations,
> >    and use cases would be greatly appreciated to identify potential
> >    issues and improve robustness. Obviously, don't use this code beyond
> >    testing.
> >
> > This work enables new use cases such as running real-time kernels
> > alongside general-purpose kernels, isolating security-critical
> > applications, and providing dedicated kernel instances for specific
> > workloads.
>
> This reminds me of Jailhouse, a partitioning hypervisor for Linux.
> Jailhouse uses virtualization and other techniques to isolate CPUs,
> allowing real-time workloads to run alongside Linux:
> https://github.com/siemens/jailhouse
>
> It would be interesting to hear your thoughts about where you want to go
> with this series and how it compares with a partitioning hypervisor like
> Jailhouse.

Good question. A few people have pointed me to Jailhouse before. If I
understand correctly, it still relies on hardware virtualization
features such as the IOMMU and VMX. The goal of multikernel is to avoid
hardware virtualization entirely and to run without a hypervisor. Of
course, this also depends on how we define "hypervisor" here: if it
means a user-space one like QEMU, that is exactly what multikernel
tries to avoid; if it just means a "supervisor" in the broad sense,
then that role still exists, but inside the kernel (unlike QEMU).

This is why I tend to use "host kernel" and "spawned kernel" to
distinguish them, rather than "hypervisor" and "guest", which are
easily confused with virtualization.
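
To make the distinction concrete, here is a rough sketch of the
host-side spawn path (the mk_* names are hypothetical, for illustration
only, not the actual code in this series): the spawned kernel is woken
on a dedicated CPU with the ordinary INIT/SIPI sequence, the same way
native SMP bringup wakes a secondary CPU.

#include <linux/kexec.h>
#include <linux/types.h>

/* Hypothetical helpers, named here only for illustration. */
int mk_setup_trampoline(struct kimage *image, u32 apicid);
void mk_send_init_sipi(u32 apicid, unsigned long start_eip);
unsigned long mk_trampoline_start(struct kimage *image);

static int mk_spawn_kernel(struct kimage *image, u32 apicid)
{
	int ret;

	/* Install a real-mode trampoline that jumps into the loaded
	 * kernel image, much like the native SMP boot trampoline.
	 */
	ret = mk_setup_trampoline(image, apicid);
	if (ret)
		return ret;

	/* Classic INIT/SIPI wakeup of the dedicated CPU, with its
	 * startup IP pointed at the new image's trampoline. There is
	 * no VMLAUNCH/VMRUN anywhere on this path.
	 */
	mk_send_init_sipi(apicid, mk_trampoline_start(image));
	return 0;
}

By contrast, a partitioning hypervisor like Jailhouse enters its cells
through VMX non-root mode; here the host kernel's role ends once the
CPU jumps into the spawned image.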

Speaking of virtualization, there are other approaches such as
DirectVisor and de-virtualization. In my humble opinion, they are
heading the wrong way, since apparently virt + de-virt = no virt. Why
bother with virt at all? ;-p

I hope this answers your questions.

Regards,
Cong
