Message-ID: <20250210202220.GC3765641@nvidia.com>
Date: Mon, 10 Feb 2025 16:22:20 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: linux-kernel@...r.kernel.org, Alexander Graf <graf@...zon.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andy Lutomirski <luto@...nel.org>,
	Anthony Yznaga <anthony.yznaga@...cle.com>,
	Arnd Bergmann <arnd@...db.de>, Ashish Kalra <ashish.kalra@....com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Borislav Petkov <bp@...en8.de>,
	Catalin Marinas <catalin.marinas@....com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	David Woodhouse <dwmw2@...radead.org>,
	Eric Biederman <ebiederm@...ssion.com>,
	Ingo Molnar <mingo@...hat.com>, James Gowans <jgowans@...zon.com>,
	Jonathan Corbet <corbet@....net>,
	Krzysztof Kozlowski <krzk@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Pasha Tatashin <pasha.tatashin@...een.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Pratyush Yadav <ptyadav@...zon.de>,
	Rob Herring <robh+dt@...nel.org>, Rob Herring <robh@...nel.org>,
	Saravana Kannan <saravanak@...gle.com>,
	Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tom Lendacky <thomas.lendacky@....com>,
	Usama Arif <usama.arif@...edance.com>,
	Will Deacon <will@...nel.org>, devicetree@...r.kernel.org,
	kexec@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
	linux-doc@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation
 helpers

On Thu, Feb 06, 2025 at 03:27:45PM +0200, Mike Rapoport wrote:
> diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
> new file mode 100644
> index 000000000000..f13b252bc303
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-kho
> @@ -0,0 +1,53 @@
> +What:		/sys/kernel/kho/active
> +Date:		December 2023
> +Contact:	Alexander Graf <graf@...zon.com>
> +Description:
> +		Kexec HandOver (KHO) allows Linux to transition the state of
> +		compatible drivers into the next kexec'ed kernel. To do so,
> +		device drivers will serialize their current state into a DT.
> +		While the state is serialized, they are unable to perform
> +		any modifications to state that was serialized, such as
> +		handed over memory allocations.
> +
> +		When this file contains "1", the system is in the transition
> +		state. When contains "0", it is not. To switch between the
> +		two states, echo the respective number into this file.

I don't think this is a great interface for the actual state machine..

> +What:		/sys/kernel/kho/dt_max
> +Date:		December 2023
> +Contact:	Alexander Graf <graf@...zon.com>
> +Description:
> +		KHO needs to allocate a buffer for the DT that gets
> +		generated before it knows the final size. By default, it
> +		will allocate 10 MiB for it. You can write to this file
> +		to modify the size of that allocation.

Seems gross, why can't it use a non-contiguous page list to generate
the FDT? :\

See below for a suggestion..

> +static int kho_serialize(void)
> +{
> +	void *fdt = NULL;
> +	int err = -ENOMEM;
> +
> +	fdt = kvmalloc(kho_out.dt_max, GFP_KERNEL);
> +	if (!fdt)
> +		goto out;
> +
> +	if (fdt_create(fdt, kho_out.dt_max)) {
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	err = fdt_finish_reservemap(fdt);
> +	if (err)
> +		goto out;
> +
> +	err = fdt_begin_node(fdt, "");
> +	if (err)
> +		goto out;
> +
> +	err = fdt_property_string(fdt, "compatible", "kho-v1");
> +	if (err)
> +		goto out;
> +
> +	/* Loop through all kho dump functions */
> +	err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_DUMP, fdt);
> +	err = notifier_to_errno(err);

I don't see this really working long term. I think we'd like each
component to be able to serialize at its own pace under userspace
control.

This design requires that the whole thing be wrapped in a notifier
callback just so we can make use of the fdt APIs.

It seems like a poor fit to me.

IMHO if you want to keep using FDT I suggest that each serializing
component (i.e. driver, ftrace, whatever) allocate its own FDT fragment
from scratch and the main KHO one just link to the memory that holds
those fragments.

I.e. the driver experience would be more like:

 kho = kho_start_storage("my_compatible_string,v1", some_kind_of_instance_key);

 fdt...(kho->fdt..)

 kho_finish_storage(kho);

Where this ends up creating a standalone FDT fragment:

/dts-v1/;
/ {
  compatible = "linux-kho,my_compatible_string,v1";
  instance = some_kind_of_instance_key;
  key-value-1 = <..>;
  key-value-2 = <..>;
};

And then kho_finish_storage() would remember the phys/length until the
kexec fdt is produced as the very last step.

This way we could do things like fdbox an iommufd and create the above
FDT fragment completely separately from any notifier chain and,
crucially, disconnected from the fdt_create() for the kexec payload.

Further, if you split things like this (it will waste some small
amount of memory) you can probably get to a point where no single FDT
is more than 4k. That looks like it would simplify/robustify a lot of
stuff?
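
To make the flow above concrete, here is a rough userspace sketch of the
bookkeeping only. It is not kernel code and not real FDT: the fragment is
written as plain text where a real version would use the libfdt calls, a
heap pointer stands in for the physical address, and kho_put(),
kho_build_index(), struct kho_storage, struct kho_frag_ref, KHO_FRAG_SIZE
and KHO_MAX_FRAGS are all made-up names for illustration. Only
kho_start_storage() and kho_finish_storage() come from the suggestion,
and their signatures here are guesses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define KHO_FRAG_SIZE 4096 /* assumption: every fragment fits in one 4k page */
#define KHO_MAX_FRAGS 16   /* assumption: fixed table, just for the sketch */

/* Hypothetical per-component handle; buf stands in for kho->fdt. */
struct kho_storage {
	char *buf;
	size_t used;
};

/* What the core remembers per finished fragment (address + length). */
struct kho_frag_ref {
	uintptr_t addr; /* would be a physical address in the kernel */
	size_t len;
};

struct kho_frag_ref kho_frags[KHO_MAX_FRAGS];
size_t kho_nr_frags;

/* Append one property as text; fails if it would overflow the fragment. */
int kho_put(struct kho_storage *kho, const char *key, const char *val)
{
	size_t room = KHO_FRAG_SIZE - kho->used;
	int n = snprintf(kho->buf + kho->used, room, "%s = \"%s\";\n", key, val);

	if (n < 0 || (size_t)n >= room)
		return -1;
	kho->used += (size_t)n;
	return 0;
}

/* Each component gets its own fragment, independent of any notifier. */
struct kho_storage *kho_start_storage(const char *compatible,
				      const char *instance)
{
	char compat[128];
	struct kho_storage *kho = calloc(1, sizeof(*kho));
	int n = snprintf(compat, sizeof(compat), "linux-kho,%s", compatible);

	if (!kho || n < 0 || (size_t)n >= sizeof(compat))
		goto err;
	kho->buf = calloc(1, KHO_FRAG_SIZE);
	if (!kho->buf || kho_put(kho, "compatible", compat) ||
	    kho_put(kho, "instance", instance))
		goto err;
	return kho;
err:
	if (kho)
		free(kho->buf);
	free(kho);
	return NULL;
}

/* Record address/length; the fragment buffer itself must stay alive. */
int kho_finish_storage(struct kho_storage *kho)
{
	assert(kho && kho->buf);
	if (kho_nr_frags == KHO_MAX_FRAGS)
		return -1;
	kho_frags[kho_nr_frags].addr = (uintptr_t)kho->buf;
	kho_frags[kho_nr_frags].len = kho->used;
	kho_nr_frags++;
	free(kho);
	return 0;
}

/* Very last step: build the top-level index linking to every fragment. */
int kho_build_index(char *out, size_t size)
{
	size_t off = 0;

	for (size_t i = 0; i < kho_nr_frags; i++) {
		int n = snprintf(out + off, size - off,
				 "frag%zu = <0x%" PRIxPTR " %zu>;\n", i,
				 kho_frags[i].addr, kho_frags[i].len);

		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
	}
	return (int)kho_nr_frags;
}
```

A component would call kho_start_storage(), add its properties, and call
kho_finish_storage() whenever it is ready; kho_build_index() runs once at
the end. Nothing here needs a dt_max style knob, since the only sizing
decision is the per-fragment limit.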

Jason

