Message-ID: <CA+CK2bBBX+HgD0HLj-AyTScM59F2wXq11BEPgejPMHoEwqj+_Q@mail.gmail.com>
Date: Mon, 10 Feb 2025 15:58:00 -0500
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Mike Rapoport <rppt@...nel.org>, linux-kernel@...r.kernel.org, 
	Alexander Graf <graf@...zon.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Andy Lutomirski <luto@...nel.org>, Anthony Yznaga <anthony.yznaga@...cle.com>, 
	Arnd Bergmann <arnd@...db.de>, Ashish Kalra <ashish.kalra@....com>, 
	Benjamin Herrenschmidt <benh@...nel.crashing.org>, Borislav Petkov <bp@...en8.de>, 
	Catalin Marinas <catalin.marinas@....com>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	David Woodhouse <dwmw2@...radead.org>, Eric Biederman <ebiederm@...ssion.com>, 
	Ingo Molnar <mingo@...hat.com>, James Gowans <jgowans@...zon.com>, Jonathan Corbet <corbet@....net>, 
	Krzysztof Kozlowski <krzk@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Paolo Bonzini <pbonzini@...hat.com>, "H. Peter Anvin" <hpa@...or.com>, 
	Peter Zijlstra <peterz@...radead.org>, Pratyush Yadav <ptyadav@...zon.de>, 
	Rob Herring <robh+dt@...nel.org>, Rob Herring <robh@...nel.org>, 
	Saravana Kannan <saravanak@...gle.com>, 
	Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Tom Lendacky <thomas.lendacky@....com>, 
	Usama Arif <usama.arif@...edance.com>, Will Deacon <will@...nel.org>, devicetree@...r.kernel.org, 
	kexec@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org, 
	linux-doc@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers

On Mon, Feb 10, 2025 at 3:22 PM Jason Gunthorpe <jgg@...dia.com> wrote:
>
> On Thu, Feb 06, 2025 at 03:27:45PM +0200, Mike Rapoport wrote:
> > diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
> > new file mode 100644
> > index 000000000000..f13b252bc303
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-kernel-kho
> > @@ -0,0 +1,53 @@
> > +What:                /sys/kernel/kho/active
> > +Date:                December 2023
> > +Contact:     Alexander Graf <graf@...zon.com>
> > +Description:
> > +             Kexec HandOver (KHO) allows Linux to transition the state of
> > +             compatible drivers into the next kexec'ed kernel. To do so,
> > +             device drivers will serialize their current state into a DT.
> > +             While the state is serialized, they are unable to perform
> > +             any modifications to state that was serialized, such as
> > +             handed over memory allocations.
> > +
> > +             When this file contains "1", the system is in the transition
> > +             state. When it contains "0", it is not. To switch between the
> > +             two states, echo the respective number into this file.
>
> I don't think this is a great interface for the actual state machine..

In our next proposal we are going to remove this "activate" phase.

>
> > +What:                /sys/kernel/kho/dt_max
> > +Date:                December 2023
> > +Contact:     Alexander Graf <graf@...zon.com>
> > +Description:
> > +             KHO needs to allocate a buffer for the DT that gets
> > +             generated before it knows the final size. By default, it
> > +             will allocate 10 MiB for it. You can write to this file
> > +             to modify the size of that allocation.
>
> Seems gross, why can't it use a non-contiguous page list to generate
> the FDT? :\

We will consider some of these ideas in a future version. I like the
idea of using preserved memory to carry a sparse KHO tree, i.e. an FDT
spread over non-contiguous memory. Maybe an anchor page could describe
how it should be vmapped into a virtually contiguous tree in the next
kernel?
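
To make the anchor page idea a bit more concrete, here is a rough
userspace model, not kernel code and not a proposal for the actual
layout. Everything in it is an assumption made up for illustration:
struct kho_anchor, kho_scatter(), kho_gather(), kho_anchor_free(),
CHUNK_SIZE and MAX_CHUNKS are hypothetical names. malloc() stands in for
preserved pages, and a memcpy() into one buffer stands in for vmapping
the pages in the next kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096u	/* stand-in for PAGE_SIZE (assumption) */
#define MAX_CHUNKS 16u		/* arbitrary limit for this sketch */

/* Hypothetical anchor: lists, in order, the chunks that hold the tree. */
struct kho_anchor {
	uint32_t nr_chunks;
	uint64_t total_len;		/* bytes of valid tree data */
	void *chunk[MAX_CHUNKS];	/* a kernel would store physical addresses */
};

/* Release every chunk referenced by the anchor. */
static void kho_anchor_free(struct kho_anchor *a)
{
	for (uint32_t i = 0; i < a->nr_chunks; i++)
		free(a->chunk[i]);
	a->nr_chunks = 0;
	a->total_len = 0;
}

/*
 * "Old kernel" side: spread a contiguous blob over separately allocated
 * chunks and record them in the anchor. Returns 0 on success, -1 on
 * failure; on failure the caller cleans up with kho_anchor_free().
 */
static int kho_scatter(struct kho_anchor *a, const void *blob, size_t len)
{
	size_t off = 0;

	memset(a, 0, sizeof(*a));
	while (off < len) {
		size_t n = len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE;
		void *c;

		if (a->nr_chunks == MAX_CHUNKS)
			return -1;
		c = malloc(CHUNK_SIZE);
		if (!c)
			return -1;
		memcpy(c, (const char *)blob + off, n);
		a->chunk[a->nr_chunks++] = c;
		off += n;
		a->total_len = off;
	}
	return 0;
}

/*
 * "Next kernel" side: walk the anchor and rebuild one contiguous view.
 * A kernel could map the pages instead of copying; this model copies.
 */
static void *kho_gather(const struct kho_anchor *a)
{
	char *out = malloc(a->total_len ? (size_t)a->total_len : 1);
	size_t off = 0;

	if (!out)
		return NULL;
	for (uint32_t i = 0; i < a->nr_chunks; i++) {
		uint64_t left = a->total_len - off;
		size_t n = left < CHUNK_SIZE ? (size_t)left : CHUNK_SIZE;

		memcpy(out + off, a->chunk[i], n);
		off += n;
	}
	return out;
}
```

The point of the model is only that the producer never needs one large
contiguous buffer: it needs the chunks plus a single small descriptor
that fixes their order and the total length.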

>
> See below for a suggestion..
>
> > +static int kho_serialize(void)
> > +{
> > +     void *fdt = NULL;
> > +     int err = -ENOMEM;
> > +
> > +     fdt = kvmalloc(kho_out.dt_max, GFP_KERNEL);
> > +     if (!fdt)
> > +             goto out;
> > +
> > +     if (fdt_create(fdt, kho_out.dt_max)) {
> > +             err = -EINVAL;
> > +             goto out;
> > +     }
> > +
> > +     err = fdt_finish_reservemap(fdt);
> > +     if (err)
> > +             goto out;
> > +
> > +     err = fdt_begin_node(fdt, "");
> > +     if (err)
> > +             goto out;
> > +
> > +     err = fdt_property_string(fdt, "compatible", "kho-v1");
> > +     if (err)
> > +             goto out;
> > +
> > +     /* Loop through all kho dump functions */
> > +     err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_DUMP, fdt);
> > +     err = notifier_to_errno(err);
>
> I don't see this really working long term. I think we'd like each
> component to be able to serialize at its own pace under userspace
> control.
>
> This design requires that the whole thing be wrapped in a notifier
> callback just so we can make use of the fdt APIs.
>
> It seems like a poor fit to me.
>
> IMHO if you want to keep using FDT I suggest that each serializing
> component (ie driver, ftrace whatever) allocate its own FDT fragment
> from scratch and the main KHO one just link to the memory that holds
> those fragments.
>
> Ie the driver experience would be more like
>
>  kho = kho_start_storage("my_compatible_string,v1", some_kind_of_instance_key);
>
>  fdt...(kho->fdt..)
>
>  kho_finish_storage(kho);
>
> Where this ends up creating a stand alone FDT fragment:
>
> /dts-v1/;
> / {
>   compatible = "linux-kho,my_compatible_string,v1";
>   instance = some_kind_of_instance_key;
>   key-value-1 = <..>;
>   key-value-2 = <..>;
> };
>
> And then kho_finish_storage() would remember the phys/length until the
> kexec fdt is produced as the very last step.
>
> This way we could do things like fdbox an iommufd and create the above
> FDT fragment completely separately from any notifier chain and,
> crucially, disconnected from the fdt_create() for the kexec payload.
>
> Further, if you split things like this (it will waste some small
> amount of memory) you can probably get to a point where no single FDT
> is more than 4k. That looks like it would simplify/robustify a lot of
> stuff?
>
> Jason
>
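
For reference, the per-component flow suggested above could be modelled
roughly as below. This is a userspace sketch under heavy assumptions, not
the proposed kernel API: kho_start_storage() and kho_finish_storage() are
the names from the suggestion, but their signatures here are guesses, and
kho_add_property(), struct kho_storage, struct kho_frag_ref, kho_frags,
FRAG_SIZE and MAX_FRAGS are hypothetical. A "key=value" text buffer
stands in for a real FDT fragment, and a pointer stands in for the
physical address.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FRAG_SIZE 4096	/* each fragment fits in 4k (assumption) */
#define MAX_FRAGS 8	/* arbitrary limit for this sketch */

/* Handle a component fills in on its own schedule. */
struct kho_storage {
	char *buf;	/* stand-in for a standalone FDT fragment */
	size_t len;	/* bytes used, excluding the trailing NUL */
};

/* What the core remembers until the final kexec FDT is built. */
struct kho_frag_ref {
	const void *addr;	/* a kernel would record a physical address */
	size_t len;
};

static struct kho_frag_ref kho_frags[MAX_FRAGS];
static size_t kho_nr_frags;

/* Begin a fragment and write its identifying properties. */
static struct kho_storage *kho_start_storage(const char *compatible,
					     unsigned long instance)
{
	struct kho_storage *kho = malloc(sizeof(*kho));
	int n;

	if (!kho)
		return NULL;
	kho->buf = malloc(FRAG_SIZE);
	if (!kho->buf) {
		free(kho);
		return NULL;
	}
	n = snprintf(kho->buf, FRAG_SIZE,
		     "compatible=linux-kho,%s\ninstance=%lu\n",
		     compatible, instance);
	if (n < 0 || n >= FRAG_SIZE) {
		free(kho->buf);
		free(kho);
		return NULL;
	}
	kho->len = (size_t)n;
	return kho;
}

/* Append one property; returns -1 and leaves the fragment unchanged if full. */
static int kho_add_property(struct kho_storage *kho, const char *key,
			    const char *val)
{
	size_t room = FRAG_SIZE - kho->len;
	int n = snprintf(kho->buf + kho->len, room, "%s=%s\n", key, val);

	if (n < 0 || (size_t)n >= room) {
		kho->buf[kho->len] = '\0';	/* drop the partial write */
		return -1;
	}
	kho->len += (size_t)n;
	return 0;
}

/*
 * Record address and length for the final step and release the handle.
 * The buffer itself stays alive, owned by the registry. On failure the
 * handle is left untouched and still belongs to the caller.
 */
static int kho_finish_storage(struct kho_storage *kho)
{
	if (kho_nr_frags == MAX_FRAGS)
		return -1;
	kho_frags[kho_nr_frags].addr = kho->buf;
	kho_frags[kho_nr_frags].len = kho->len;
	kho_nr_frags++;
	free(kho);
	return 0;
}
```

A component would call kho_start_storage("my_compatible_string,v1", key),
add its properties, and call kho_finish_storage(). Nothing in that
sequence involves a notifier chain or the buffer used for the kexec
payload; the final step only has to walk kho_frags.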
