Message-ID: <CA+CK2bAmPN+v7SYsdHA+RL4kFfnoQvKqTqZ2YQ4wdq6dnFkotg@mail.gmail.com>
Date: Fri, 24 Oct 2025 10:36:45 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Pratyush Yadav <pratyush@...nel.org>, akpm@...ux-foundation.org, brauner@...nel.org,
corbet@....net, graf@...zon.com, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-mm@...ck.org, masahiroy@...nel.org,
ojeda@...nel.org, rdunlap@...radead.org, rppt@...nel.org, tj@...nel.org,
jasonmiu@...gle.com, dmatlack@...gle.com, skhawaja@...gle.com,
glider@...gle.com, elver@...gle.com
Subject: Re: [PATCH 2/2] liveupdate: kho: allocate metadata directly from the
buddy allocator
On Fri, Oct 24, 2025 at 10:20 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Fri, Oct 24, 2025 at 09:57:24AM -0400, Pasha Tatashin wrote:
>
> > You're right the new kernel will eventually zero memory, but KHO
> > preserves at page granularity. If we preserve a single slab object,
> > the entire page is handed off. When the new kernel maps that page
> > (e.g., to userspace) to access the preserved object, it also exposes
> > the unpreserved portions of that same page. Those portions contain
> > stale data from the old kernel and won't have been zeroed yet,
> > creating an easy-to-miss data leak vector.
>
> Do we zero any of the memory on KHO? Honestly, I wouldn't worry about
> the point it zeros, slab guarantees it will be zero when it should be
> zero.
We do not zero memory on kexec/KHO/LU; instead, the next kernel zeroes
memory on demand during allocation. My point is that the KHO interface
retrieves a full page in the next kernel, not an individual slab
object. Consequently, a caller might retrieve data that was preserved
as a slab object in the previous kernel, expose that data to
userspace, and unintentionally leak the rest of the page along with it.
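
To make this concrete, here is a minimal sketch of the pattern I mean;
the kho_preserve_folio()/kho_restore_folio() signatures and the
saved_phys/saved_offset bookkeeping are assumptions for illustration,
not a description of a finished interface:

    /* Old kernel: preserve one small object that lives in a slab page. */
    struct my_state *obj = kmalloc(sizeof(*obj), GFP_KERNEL);
    struct folio *folio = virt_to_folio(obj);

    kho_preserve_folio(folio);  /* page granularity: the whole page goes */
    /* ... serialize the folio's physaddr and obj's offset into KHO state ... */

    /* New kernel: the caller gets the entire page back. */
    struct folio *restored = kho_restore_folio(saved_phys);
    struct my_state *obj2 = folio_address(restored) + saved_offset;

    /*
     * If this page is later mapped (e.g. to userspace via remap_pfn_range()),
     * every other slab slot in it (stale, never-preserved, never-zeroed data
     * from the old kernel) is exposed together with obj2.
     */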
> > There's also the inefficiency. The unpreserved parts of that page are
> > unusable by the new kernel until the preserved object is freed.
>
> That's not how I see slab preservation working. When the slab page
> is unpreserved, all the free space in that page should be immediately
> available to the successor kernel.
This ties into the same problem. The scenario I'm worried about is:
1. A caller preserves one small slab object.
2. In the new kernel, the caller retrieves the entire page that
contains this object.
3. The caller uses the data from that slab object without freeing it.
In this case, the rest of the page (all the other slab slots) is
effectively wasted. The page can't be returned to the system or reused
by the slab allocator until that one preserved object is freed, which
might be never. The free space isn't "immediately available" because
the page is being held by the caller, even though the caller is using
only a single slab object in that page.
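
As a rough worked example (object and page sizes picked for
illustration): preserving a single 64-byte object from a kmalloc-64
slab pins the whole 4096-byte page, so 4096 - 64 = 4032 bytes, roughly
98% of the page, stay unusable in the new kernel for as long as the
caller holds on to that one object.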
> > As I see it, the only robust solution is to use a special GFP flag.
> > This would force these allocations to come from a dedicated pool of
> > pages that are fully preserved, with no partial/mixed-use pages and
> > also retrieved as slabs.
>
> It is certainly more efficient to preserve fewer slab pages in total
> and pooling would help get there.
>
> > That said, I'm not sure preserving individual slab objects is a high
> > priority right now. It might be simpler to avoid it altogether.
>
> I think we will need something, a lot of the structs I'm seeing in
> other patches are small and allocating a whole page is pretty wasteful
> too.
If we're going to support this, it would have to be specifically
engineered as full slab support for KHO preservation, where the
interface retrieves slab objects directly rather than the pages they
live on, and I think that would require a special GFP_PRESERVED flag.
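
Roughly the shape I have in mind, with the caveat that every name here
(__GFP_PRESERVED, kho_preserve_slab_object(), kho_restore_slab_object(),
the handle) is hypothetical:

    /* Old kernel: the allocation comes from a dedicated pool whose pages
     * contain only preservable objects, so there is no mixed-use page.
     */
    struct my_state *obj = kmalloc(sizeof(*obj),
                                   GFP_KERNEL | __GFP_PRESERVED);
    kho_preserve_slab_object(obj);  /* hypothetical: records the object */

    /* New kernel: the object itself comes back (handle taken from the
     * serialized KHO state); the rest of the pool page stays under the
     * preserved slab allocator's control, so its free slots are usable
     * immediately.
     */
    struct my_state *obj2 = kho_restore_slab_object(handle);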
> Jason