Message-ID: <20250210192626.GB3765641@nvidia.com>
Date: Mon, 10 Feb 2025 15:26:26 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: linux-kernel@...r.kernel.org, Alexander Graf <graf@...zon.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andy Lutomirski <luto@...nel.org>,
	Anthony Yznaga <anthony.yznaga@...cle.com>,
	Arnd Bergmann <arnd@...db.de>, Ashish Kalra <ashish.kalra@....com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Borislav Petkov <bp@...en8.de>,
	Catalin Marinas <catalin.marinas@....com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	David Woodhouse <dwmw2@...radead.org>,
	Eric Biederman <ebiederm@...ssion.com>,
	Ingo Molnar <mingo@...hat.com>, James Gowans <jgowans@...zon.com>,
	Jonathan Corbet <corbet@....net>,
	Krzysztof Kozlowski <krzk@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Pasha Tatashin <pasha.tatashin@...een.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Pratyush Yadav <ptyadav@...zon.de>,
	Rob Herring <robh+dt@...nel.org>, Rob Herring <robh@...nel.org>,
	Saravana Kannan <saravanak@...gle.com>,
	Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tom Lendacky <thomas.lendacky@....com>,
	Usama Arif <usama.arif@...edance.com>,
	Will Deacon <will@...nel.org>, devicetree@...r.kernel.org,
	kexec@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
	linux-doc@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v4 09/14] kexec: Add documentation for KHO

On Thu, Feb 06, 2025 at 03:27:49PM +0200, Mike Rapoport wrote:

> +KHO introduces a new concept to its device tree: ``mem`` properties. A
> +``mem`` property can be inside any subnode in the device tree. 

I do not think this is a good idea.

It should be core infrastructure, totally unrelated to any per-device
fdt nodes, to carry the memory map.

IOW a full DT that looks something more like:

/dts-v1/;

/ {
        compatible = "linux-kho,v1";

        allocated-memory {
                <>
        };

        ftracebuffer {
                compatible = "linux-kho,ftracem,v1";
                ftrace-buffer-phys = <..>;
                ftrace-buffer-len = <..>;
                ..etc..
        };
};

Where the allocated-memory node will remove all of the memory it lists
from the buddy allocator very early on, in an efficient way. That
process should not be walking the fdt to find mem nodes.

> +After boot, drivers can call the kho subsystem to transfer ownership of memory
> +that was reserved via a ``mem`` property to themselves to continue using memory
> +from the previous execution.

And this transfer should be done by the physical address that the node
itself describes.

I.e., if ftrace has a single high-order folio to store its ftrace
buffer, then I would expect code like:

allocate ftrace:
   buffer = folio_alloc(..);

activate callback:
   kho_preserve_folio(buffer)
   fdt...("ftrace-buffer-phys", virt_to_phys(buffer))

restore callback:
   buffer_phys = fdt..("ftrace-buffer-phys")
   buffer = kho_restore_folio(buffer_phys)
   [..]

destroy ftrace:
   folio_put(buffer);

And kho will take care to restore the struct folio, put back the
order, etc, etc.

Similar for slab.

I think this sort of memory-based operation should be the very basic
core building primitive here.

So the allocated-memory node should preserve information about KHO'd
folios, their order and so on.

It doesn't matter what part of the FDT owns those folios, all the core
kernel should do is keep track of them and at some point check that
all preserved folios have been claimed.
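
To make the bookkeeping concrete, here is a minimal user-space sketch of
what "keep track of them and check that all have been claimed" could look
like. It is only a model under stated assumptions: every name in it
(kho_model_preserve, kho_model_restore, kho_model_unclaimed, the fixed
64-entry table) is hypothetical and not an existing or proposed kernel
API, and a real implementation would want a far more compact structure
than a linear array.

```c
/*
 * User-space model only. All names are hypothetical illustrations,
 * not kernel APIs. The tracker records (phys, order) pairs and does
 * not know or care which FDT node owns a given folio.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KHO_MODEL_MAX 64

struct kho_model_entry {
	uint64_t phys;      /* physical address of the preserved folio */
	unsigned int order; /* folio order, handed back on restore */
	bool claimed;       /* set once an owner has restored the folio */
};

struct kho_model {
	struct kho_model_entry entries[KHO_MODEL_MAX];
	size_t count;
};

/* Old-kernel side: record a folio to be preserved across kexec. */
static int kho_model_preserve(struct kho_model *m, uint64_t phys,
			      unsigned int order)
{
	assert(m != NULL);
	if (m->count >= KHO_MODEL_MAX)
		return -1;
	m->entries[m->count++] = (struct kho_model_entry){
		.phys = phys, .order = order, .claimed = false,
	};
	return 0;
}

/*
 * New-kernel side: claim a folio by physical address. Returns the
 * recorded order, or -1 if the address is unknown or already claimed.
 */
static int kho_model_restore(struct kho_model *m, uint64_t phys)
{
	for (size_t i = 0; i < m->count; i++) {
		struct kho_model_entry *e = &m->entries[i];

		if (e->phys != phys)
			continue;
		if (e->claimed)
			return -1;
		e->claimed = true;
		return (int)e->order;
	}
	return -1;
}

/* Late check: how many preserved folios did nobody claim? */
static size_t kho_model_unclaimed(const struct kho_model *m)
{
	size_t n = 0;

	for (size_t i = 0; i < m->count; i++)
		if (!m->entries[i].claimed)
			n++;
	return n;
}
```

In this model the ftrace example would call kho_model_preserve() from its
activate callback and kho_model_restore() from its restore callback with
the phys it read back from its own node. A non-zero result from
kho_model_unclaimed() after all restore callbacks have run would flag
folios that were preserved but never claimed.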

> +We guarantee that we always have such regions through the scratch regions: On
> +first boot KHO allocates several physically contiguous memory regions. Since
> +after kexec these regions will be used by early memory allocations, there is a
> +scratch region per NUMA node plus a scratch region to satisfy allocations
> +requests that do not require particilar NUMA node assignment.

This plan sounds great, way better than the pmem approaches/etc.

> +To enable user space based kexec file loader, the kernel needs to be able to
> +provide the device tree that describes the previous kernel's state before
> +performing the actual kexec. The process of generating that device tree is
> +called serialization. When the device tree is generated, some properties
> +of the system may become immutable because they are already written down
> +in the device tree. That state is called the KHO active phase.

This should have a whole state diagram, as we've discussed a few
times. There is a lot more to worry about here than just 'activate'.

Jason
