Message-ID: <CA+CK2bBBX+HgD0HLj-AyTScM59F2wXq11BEPgejPMHoEwqj+_Q@mail.gmail.com>
Date: Mon, 10 Feb 2025 15:58:00 -0500
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Mike Rapoport <rppt@...nel.org>, linux-kernel@...r.kernel.org,
Alexander Graf <graf@...zon.com>, Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>, Anthony Yznaga <anthony.yznaga@...cle.com>,
Arnd Bergmann <arnd@...db.de>, Ashish Kalra <ashish.kalra@....com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>, Borislav Petkov <bp@...en8.de>,
Catalin Marinas <catalin.marinas@....com>, Dave Hansen <dave.hansen@...ux.intel.com>,
David Woodhouse <dwmw2@...radead.org>, Eric Biederman <ebiederm@...ssion.com>,
Ingo Molnar <mingo@...hat.com>, James Gowans <jgowans@...zon.com>, Jonathan Corbet <corbet@....net>,
Krzysztof Kozlowski <krzk@...nel.org>, Mark Rutland <mark.rutland@....com>,
Paolo Bonzini <pbonzini@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>, Pratyush Yadav <ptyadav@...zon.de>,
Rob Herring <robh+dt@...nel.org>, Rob Herring <robh@...nel.org>,
Saravana Kannan <saravanak@...gle.com>,
Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>, Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>, Tom Lendacky <thomas.lendacky@....com>,
Usama Arif <usama.arif@...edance.com>, Will Deacon <will@...nel.org>, devicetree@...r.kernel.org,
kexec@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers
On Mon, Feb 10, 2025 at 3:22 PM Jason Gunthorpe <jgg@...dia.com> wrote:
>
> On Thu, Feb 06, 2025 at 03:27:45PM +0200, Mike Rapoport wrote:
> > diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
> > new file mode 100644
> > index 000000000000..f13b252bc303
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-kernel-kho
> > @@ -0,0 +1,53 @@
> > +What: /sys/kernel/kho/active
> > +Date: December 2023
> > +Contact: Alexander Graf <graf@...zon.com>
> > +Description:
> > + Kexec HandOver (KHO) allows Linux to transition the state of
> > + compatible drivers into the next kexec'ed kernel. To do so,
> > + device drivers will serialize their current state into a DT.
> > + While the state is serialized, they are unable to perform
> > + any modifications to state that was serialized, such as
> > + handed over memory allocations.
> > +
> > + When this file contains "1", the system is in the transition
> > + state. When it contains "0", it is not. To switch between the
> > + two states, echo the respective number into this file.
>
> I don't think this is a great interface for the actual state machine..
In our next proposal we are going to remove this "activate" phase.
>
> > +What: /sys/kernel/kho/dt_max
> > +Date: December 2023
> > +Contact: Alexander Graf <graf@...zon.com>
> > +Description:
> > + KHO needs to allocate a buffer for the DT that gets
> > + generated before it knows the final size. By default, it
> > + will allocate 10 MiB for it. You can write to this file
> > + to modify the size of that allocation.
>
> Seems gross, why can't it use a non-contiguous page list to generate
> the FDT? :\
We will consider some of these ideas in a future version. I like the
idea of using preserved memory to carry a sparse KHO tree, i.e. an FDT
spread over sparse memory. Maybe the anchor page could describe how it
should be vmapped into a virtually contiguous tree in the next kernel?
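
To make the anchor-page idea a bit more concrete, here is a rough
userspace sketch. It is only an illustration under assumptions: every
name in it (sketch_anchor, sketch_scatter, sketch_gather,
sketch_release) is hypothetical and nothing here is an existing KHO
interface. The tree is spread over separately allocated pages, the
anchor records them, and the "next kernel" side rebuilds one contiguous
view. A real implementation would presumably map the preserved pages
(for example with vmap()) instead of copying them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_PAGES 8u

/* Hypothetical anchor: lists the scattered pages that hold one tree. */
struct sketch_anchor {
	size_t nr_pages;                /* entries used in pages[] */
	size_t total_len;               /* bytes of tree data across all pages */
	void *pages[SKETCH_MAX_PAGES];  /* stand-in for preserved physical pages */
};

/* Free every page recorded in the anchor. */
static void sketch_release(struct sketch_anchor *a)
{
	for (size_t i = 0; i < a->nr_pages; i++)
		free(a->pages[i]);
	a->nr_pages = 0;
	a->total_len = 0;
}

/* Spread a flat blob over individually allocated, non-contiguous pages. */
static int sketch_scatter(struct sketch_anchor *a, const void *blob, size_t len)
{
	const unsigned char *src = blob;
	size_t off = 0;

	a->nr_pages = 0;
	a->total_len = 0;
	while (off < len) {
		size_t chunk = len - off;
		void *page;

		if (chunk > SKETCH_PAGE_SIZE)
			chunk = SKETCH_PAGE_SIZE;
		if (a->nr_pages == SKETCH_MAX_PAGES)
			goto fail;
		page = calloc(1, SKETCH_PAGE_SIZE);
		if (!page)
			goto fail;
		memcpy(page, src + off, chunk);
		a->pages[a->nr_pages++] = page;
		off += chunk;
		a->total_len = off;
	}
	return 0;
fail:
	sketch_release(a);
	return -1;
}

/* Rebuild one contiguous view; the kernel would map rather than copy. */
static void *sketch_gather(const struct sketch_anchor *a)
{
	unsigned char *out = malloc(a->total_len ? a->total_len : 1);
	size_t off = 0;

	if (!out)
		return NULL;
	for (size_t i = 0; i < a->nr_pages; i++) {
		size_t chunk = a->total_len - off;

		if (chunk > SKETCH_PAGE_SIZE)
			chunk = SKETCH_PAGE_SIZE;
		memcpy(out + off, a->pages[i], chunk);
		off += chunk;
	}
	/* Sanity check: the anchor must describe exactly total_len bytes. */
	assert(off == a->total_len);
	return out;
}
```

The copy in sketch_gather() is only there so the sketch runs in
userspace; the point is that the anchor alone is enough to find every
piece of the tree, so no large contiguous buffer has to be sized up
front.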
>
> See below for a suggestion..
>
> > +static int kho_serialize(void)
> > +{
> > + void *fdt = NULL;
> > + int err = -ENOMEM;
> > +
> > + fdt = kvmalloc(kho_out.dt_max, GFP_KERNEL);
> > + if (!fdt)
> > + goto out;
> > +
> > + if (fdt_create(fdt, kho_out.dt_max)) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + err = fdt_finish_reservemap(fdt);
> > + if (err)
> > + goto out;
> > +
> > + err = fdt_begin_node(fdt, "");
> > + if (err)
> > + goto out;
> > +
> > + err = fdt_property_string(fdt, "compatible", "kho-v1");
> > + if (err)
> > + goto out;
> > +
> > + /* Loop through all kho dump functions */
> > + err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_DUMP, fdt);
> > + err = notifier_to_errno(err);
>
> I don't see this really working long term. I think we'd like each
> component to be able to serialize at its own pace under userspace
> control.
>
> This design requires that the whole thing be wrapped in a notifier
> callback just so we can make use of the fdt APIs.
>
> It seems like a poor fit to me.
>
> IMHO if you want to keep using FDT I suggest that each serializing
> component (ie driver, ftrace whatever) allocate its own FDT fragment
> from scratch and the main KHO one just link to the memory that holds
> those fragments.
>
> Ie the driver experience would be more like
>
> kho = kho_start_storage("my_compatible_string,v1", some_kind_of_instance_key);
>
> fdt...(kho->fdt..)
>
> kho_finish_storage(kho);
>
> Where this ends up creating a stand alone FDT fragment:
>
> /dts-v1/;
> / {
> compatible = "linux-kho,my_compatible_string,v1";
> instance = some_kind_of_instance_key;
> key-value-1 = <..>;
> key-value-2 = <..>;
> };
>
> And then kho_finish_storage() would remember the phys/length until the
> kexec fdt is produced as the very last step.
>
> This way we could do things like fdbox an iommufd and create the above
> FDT fragment completely separately from any notifier chain and,
> crucially, disconnected from the fdt_create() for the kexec payload.
>
> Further, if you split things like this (it will waste some small
> amount of memory) you can probably get to a point where no single FDT
> is more than 4k. That looks like it would simplify/robustify a lot of
> stuff?
>
> Jason
>
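
For reference, the per-component flow suggested above could look
roughly like the userspace sketch below. This is an assumption-laden
illustration, not a proposed implementation: the sketch_* names are
hypothetical stand-ins for the kho_start_storage() and
kho_finish_storage() calls named in the quoted text (which do not exist
in the posted series), and a plain "key=value;" string stands in for
the real FDT encoding. Each component fills its own page-sized
fragment, and finishing only records the address and length so the
final kexec FDT can link to it later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_FRAG_SIZE 4096u  /* one page per fragment, as suggested above */
#define SKETCH_MAX_FRAGS 16u

/* Hypothetical per-component handle; buf stands in for a small FDT blob. */
struct sketch_kho {
	char *buf;
	size_t used;
};

/* Hypothetical record kept until the final kexec FDT is produced. */
struct sketch_frag_ref {
	const void *addr;  /* the kernel would record a physical address */
	size_t len;
};

static struct sketch_frag_ref sketch_frags[SKETCH_MAX_FRAGS];
static size_t sketch_nr_frags;

/* Allocate a fresh fragment and write its compatible string. */
static struct sketch_kho *sketch_start_storage(const char *compatible)
{
	struct sketch_kho *kho = calloc(1, sizeof(*kho));
	int n;

	if (!kho)
		return NULL;
	kho->buf = calloc(1, SKETCH_FRAG_SIZE);
	if (!kho->buf) {
		free(kho);
		return NULL;
	}
	n = snprintf(kho->buf, SKETCH_FRAG_SIZE, "compatible=linux-kho,%s;",
		     compatible);
	if (n < 0 || (size_t)n >= SKETCH_FRAG_SIZE) {
		free(kho->buf);
		free(kho);
		return NULL;
	}
	kho->used = (size_t)n;
	return kho;
}

/* Append one property; fails without side effects if the page is full. */
static int sketch_add_property(struct sketch_kho *kho, const char *key,
			       const char *val)
{
	size_t room = SKETCH_FRAG_SIZE - kho->used;
	int n = snprintf(kho->buf + kho->used, room, "%s=%s;", key, val);

	if (n < 0 || (size_t)n >= room) {
		kho->buf[kho->used] = '\0';  /* drop the truncated property */
		return -1;
	}
	kho->used += (size_t)n;
	return 0;
}

/* Remember where the fragment lives; no notifier chain is involved. */
static int sketch_finish_storage(struct sketch_kho *kho)
{
	assert(kho && kho->buf);
	if (sketch_nr_frags == SKETCH_MAX_FRAGS)
		return -1;
	sketch_frags[sketch_nr_frags].addr = kho->buf;
	sketch_frags[sketch_nr_frags].len = kho->used;
	sketch_nr_frags++;
	free(kho);  /* the handle goes away, the fragment buffer stays */
	return 0;
}
```

A component would call sketch_start_storage(), add its properties at
its own pace, and call sketch_finish_storage() when done. The final
step before kexec would then walk sketch_frags[] and emit one link per
entry.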