Message-ID: <CAM_iQpU9DDg2Oi33_dfPqVpd9j_2O+WD7ovo__f48BA9DztwXQ@mail.gmail.com>
Date: Sat, 8 Feb 2025 15:39:14 -0800
From: Cong Wang <xiyou.wangcong@...il.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: linux-kernel@...r.kernel.org, Alexander Graf <graf@...zon.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Andy Lutomirski <luto@...nel.org>, 
	Anthony Yznaga <anthony.yznaga@...cle.com>, Arnd Bergmann <arnd@...db.de>, 
	Ashish Kalra <ashish.kalra@....com>, Benjamin Herrenschmidt <benh@...nel.crashing.org>, 
	Borislav Petkov <bp@...en8.de>, Catalin Marinas <catalin.marinas@....com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, David Woodhouse <dwmw2@...radead.org>, 
	Eric Biederman <ebiederm@...ssion.com>, Ingo Molnar <mingo@...hat.com>, 
	James Gowans <jgowans@...zon.com>, Jonathan Corbet <corbet@....net>, 
	Krzysztof Kozlowski <krzk@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Paolo Bonzini <pbonzini@...hat.com>, Pasha Tatashin <pasha.tatashin@...een.com>, 
	"H. Peter Anvin" <hpa@...or.com>, Peter Zijlstra <peterz@...radead.org>, Pratyush Yadav <ptyadav@...zon.de>, 
	Rob Herring <robh+dt@...nel.org>, Rob Herring <robh@...nel.org>, 
	Saravana Kannan <saravanak@...gle.com>, 
	Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Tom Lendacky <thomas.lendacky@....com>, 
	Usama Arif <usama.arif@...edance.com>, Will Deacon <will@...nel.org>, devicetree@...r.kernel.org, 
	kexec@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org, 
	linux-doc@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO)

Hi Mike,

On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport <rppt@...nel.org> wrote:
>
> From: "Mike Rapoport (Microsoft)" <rppt@...nel.org>
>
> Hi,
>
> This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com).
> To make things simpler, we decided to preserve "reserve_mem" regions
> instead of ftrace buffers.
>
> The patches are also available in git:
> https://git.kernel.org/rppt/h/kho/v4
>
>
> Kexec today considers itself purely a boot loader: When we enter the new
> kernel, any state the previous kernel left behind is irrelevant and the
> new kernel reinitializes the system.
>
> However, there are use cases where this mode of operation is not what we
> actually want. In virtualization hosts for example, we want to use kexec
> to update the host kernel while virtual machine memory stays untouched.
> When we add device assignment to the mix, we also need to ensure that
> IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> need to do the same for the PCI subsystem. If we want to kexec while an
> SEV-SNP enabled virtual machine is running, we need to preserve the VM
> context pages and physical memory. See "pkernfs: Persisting guest memory
> and kernel/device state safely across kexec" Linux Plumbers
> Conference 2023 presentation for details:
>
>   https://lpc.events/event/17/contributions/1485/
>
> To start us on the journey to support all the use cases above, this patch
> implements basic infrastructure to allow hand over of kernel state across
> kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> memblock's reserve_mem.
> With this patch set applied, memory that was reserved using "reserve_mem"
> command line options remains intact after kexec and it is guaranteed to
> reside at the same physical address.

Nice work!

One concern is that using memblock to reserve memory, the way crashkernel=
does, is not flexible. I worked on kdump years ago, and one of its biggest
pains was deciding how much memory to reserve with crashkernel=. It is
still a pain today.

If we reserve more, the 1st kernel wastes more memory. If we reserve less,
the 2nd kernel is more likely to hit OOM.

I'd suggest considering CMA, where the "reserved" memory can still be
reused for other purposes; pages are simply migrated out of the reserved
region on demand, that is, when loading a kexec kernel. Of course, we need
to make sure the pages are not reused by what you want to preserve here,
e.g., IOMMU. So you might need additional work to make it work, but I still
believe this is the right direction.
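
To make the idea a bit more concrete, here is a toy user-space model of a
CMA-like region. It is only a sketch, not kernel code: the class
ToyCmaRegion and its borrow()/claim() methods are hypothetical names made
up for illustration, and real CMA works on page blocks and migrate types
rather than a Python dict. It just shows the two properties I care about:
movable users can borrow the reserved region, and claiming the region
migrates them out if there is room elsewhere.

```python
# Toy model of a CMA-like reserved region. NOT kernel code; every name
# here is hypothetical and exists only to illustrate the idea.

class ToyCmaRegion:
    def __init__(self, region_pages, outside_free_pages):
        self.region_pages = region_pages        # size of the reserved region
        self.outside_free = outside_free_pages  # free pages outside the region
        self.borrowed = {}                      # page index -> borrowing owner

    def borrow(self, owner, movable=True):
        """Let a movable allocation use a free page of the reserved region."""
        if not movable:
            # Unmovable users (e.g. state that must be preserved in place)
            # must never land here, or the region could not be claimed later.
            raise ValueError("unmovable allocations may not use the region")
        for page in range(self.region_pages):
            if page not in self.borrowed:
                self.borrowed[page] = owner
                return page
        raise MemoryError("reserved region is full")

    def claim(self):
        """Take the whole region back by migrating every borrower out."""
        needed = len(self.borrowed)
        if needed > self.outside_free:
            raise MemoryError("not enough free memory to migrate borrowers")
        self.outside_free -= needed
        migrated = sorted(self.borrowed.values())
        self.borrowed.clear()
        return migrated


region = ToyCmaRegion(region_pages=4, outside_free_pages=8)
region.borrow("page-cache")
region.borrow("anon")
migrated_owners = region.claim()  # region is empty again and can be handed over
```

In this model the 1st kernel loses nothing while the region is unclaimed,
and the cost is paid only at claim time, which in the scenario above would
be when the kexec kernel is loaded.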

Just my two cents.

Thanks!
