[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <mafs0tt2buy0m.fsf@kernel.org>
Date: Wed, 13 Aug 2025 15:55:21 +0200
From: Pratyush Yadav <pratyush@...nel.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Pratyush Yadav <pratyush@...nel.org>, Vipin Sharma
<vipinsh@...gle.com>, jasonmiu@...gle.com, graf@...zon.com,
changyuanl@...gle.com, rppt@...nel.org, dmatlack@...gle.com,
rientjes@...gle.com, corbet@....net, rdunlap@...radead.org,
ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com,
ojeda@...nel.org, aliceryhl@...gle.com, masahiroy@...nel.org,
akpm@...ux-foundation.org, tj@...nel.org, yoann.congal@...le.fr,
mmaurer@...gle.com, roman.gushchin@...ux.dev, chenridong@...wei.com,
axboe@...nel.dk, mark.rutland@....com, jannh@...gle.com,
vincent.guittot@...aro.org, hannes@...xchg.org,
dan.j.williams@...el.com, david@...hat.com, joel.granados@...nel.org,
rostedt@...dmis.org, anna.schumaker@...cle.com, song@...nel.org,
zhangguopeng@...inos.cn, linux@...ssschuh.net,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, gregkh@...uxfoundation.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, rafael@...nel.org, dakr@...nel.org,
bartosz.golaszewski@...aro.org, cw00.choi@...sung.com,
myungjoo.ham@...sung.com, yesanishhere@...il.com,
Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com,
aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com, lennart@...ttering.net, brauner@...nel.org,
linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org,
saeedm@...dia.com, ajayachandra@...dia.com, jgg@...dia.com,
parav@...dia.com, leonro@...dia.com, witu@...dia.com
Subject: Re: [PATCH v3 29/30] luo: allow preserving memfd
On Wed, Aug 13 2025, Pasha Tatashin wrote:
> On Wed, Aug 13, 2025 at 12:29 PM Pratyush Yadav <pratyush@...nel.org> wrote:
>>
>> Hi Vipin,
>>
>> Thanks for the review.
>>
>> On Tue, Aug 12 2025, Vipin Sharma wrote:
>>
>> > On 2025-08-07 01:44:35, Pasha Tatashin wrote:
>> >> From: Pratyush Yadav <ptyadav@...zon.de>
>> >> +static void memfd_luo_unpreserve_folios(const struct memfd_luo_preserved_folio *pfolios,
>> >> + unsigned int nr_folios)
>> >> +{
>> >> + unsigned int i;
>> >> +
>> >> + for (i = 0; i < nr_folios; i++) {
>> >> + const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
>> >> + struct folio *folio;
>> >> +
>> >> + if (!pfolio->foliodesc)
>> >> + continue;
>> >> +
>> >> + folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
>> >> +
>> >> + kho_unpreserve_folio(folio);
>> >
>> > This one is missing WARN_ON_ONCE() similar to the one in
>> > memfd_luo_preserve_folios().
>>
>> Right, will add.
>>
>> >
>> >> + unpin_folio(folio);
>>
>> Looking at this code caught my eye. This can also be called from LUO's
>> finish callback if no one claimed the memfd after live update. In that
>> case, unpin_folio() is going to underflow the pincount or refcount on
>> the folio since after the kexec, the folio is no longer pinned. We
>> should only be doing folio_put().
>>
>> I think this function should take a argument to specify which of these
>> cases it is dealing with.
>>
>> >> + }
>> >> +}
>> >> +
>> >> +static void *memfd_luo_create_fdt(unsigned long size)
>> >> +{
>> >> + unsigned int order = get_order(size);
>> >> + struct folio *fdt_folio;
>> >> + int err = 0;
>> >> + void *fdt;
>> >> +
>> >> + if (order > MAX_PAGE_ORDER)
>> >> + return NULL;
>> >> +
>> >> + fdt_folio = folio_alloc(GFP_KERNEL, order);
>> >
>> > __GFP_ZERO should also be used here. Otherwise this can lead to
>> > unintentional passing of old kernel memory.
>>
>> fdt_create() zeroes out the buffer so this should not be a problem.
>
> You are right, fdt_create() zeroes the whole buffer, however, I wonder
> if it could be `optimized` to only clear only the header part of FDT,
> not the rest and this could potentially lead us to send an FDT buffer
> that contains both a valid FDT and the trailing bits contain data from
> old kernel.
Fair enough. At least the API documentation does not say anything about
the state of the buffer. My main concern was around performance since
the FDT can be multiple megabytes long for big memfds. Anyway, this
isn't in the blackout window so perhaps we can live with it. Will add
the GFP_ZERO.
--
Regards,
Pratyush Yadav
Powered by blists - more mailing lists