Message-ID: <CA+CK2bCm=AGDSLTbs6etbqveFUHU80okE-bCT2zg20nrHXgHRQ@mail.gmail.com>
Date: Tue, 30 Dec 2025 13:21:31 -0500
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Pratyush Yadav <pratyush@...nel.org>, Evangelos Petrongonas <epetron@...zon.de>,
Alexander Graf <graf@...zon.com>, Andrew Morton <akpm@...ux-foundation.org>,
Jason Miu <jasonmiu@...gle.com>, linux-kernel@...r.kernel.org,
kexec@...ts.infradead.org, linux-mm@...ck.org, nh-open-source@...zon.com
Subject: Re: [PATCH] kho: add support for deferred struct page init
On Tue, Dec 30, 2025 at 12:18 PM Mike Rapoport <rppt@...nel.org> wrote:
>
> On Tue, Dec 30, 2025 at 11:18:12AM -0500, Pasha Tatashin wrote:
> > On Tue, Dec 30, 2025 at 11:16 AM Mike Rapoport <rppt@...nel.org> wrote:
> > >
> > > On Tue, Dec 30, 2025 at 11:05:05AM -0500, Pasha Tatashin wrote:
> > > > On Mon, Dec 29, 2025 at 4:03 PM Pratyush Yadav <pratyush@...nel.org> wrote:
> > > > >
> > > > > The magic is purely sanity checking. It is not used to decide anything
> > > > > other than to make sure this is actually a KHO page. I don't intend to
> > > > > change that. My point is, if we make sure the KHO pages are properly
> > > > > initialized during MM init, then restoring can actually be a very cheap
> > > > > operation, where you only do the sanity checking. You can even put the
> > > > > magic check behind CONFIG_KEXEC_HANDOVER_DEBUG if you want, but I think
> > > > > it is useful enough to keep in production systems too.
> > > >
> > > > It is part of a critical hotpath during blackout, should really be
> > > > behind CONFIG_KEXEC_HANDOVER_DEBUG
> > >
> > > Do you have the numbers? ;-)
> >
> > The fastest reboot we can achieve is ~0.4s on ARM
>
> I meant the difference between assigning info.magic and skipping it.
It is proportional to the amount of preserved memory: one extra
assignment for each page. In our fleet we have observed IOMMU page
tables to be 20G in size, so let's just assume 20G. That is 20 * 1024^3
/ 4096 = 5.24 million pages. If we access "struct page" only for the
magic, we fetch a full 64-byte cacheline per page, which is 5.24 million
* 64 bytes = 335 MB, or ~13ms at ~25 GB/s of DRAM bandwidth. Each TLB
miss also adds some latency: 5.2M * 10ns = ~50ms. In total we can get a
15ms ~ 50ms regression compared to 400ms, that is 4-12%. It will be
less if we also access "struct page" for another reason at the same
time, but it still adds up.
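To make it easy to redo this estimate for other sizes, here is a rough
userspace sketch of the same arithmetic. The helper names are made up
for illustration and are not kernel functions. The 64-byte cacheline,
~25 GB/s bandwidth (taken here as 25e9 bytes/s) and 10ns per TLB miss
are the assumptions from above, not measurements.

```c
#include <stdint.h>

/* Hypothetical helper: number of base pages covering the preserved memory. */
static uint64_t kho_est_pages(uint64_t preserved_bytes, uint64_t page_size)
{
	return preserved_bytes / page_size;
}

/*
 * Hypothetical helper: milliseconds needed to pull one cacheline per
 * page from DRAM at the given bandwidth (bytes per second).
 */
static double kho_est_stream_ms(uint64_t pages, uint64_t cacheline_bytes,
				double dram_bytes_per_sec)
{
	return (double)pages * (double)cacheline_bytes /
	       dram_bytes_per_sec * 1e3;
}

/*
 * Hypothetical helper: milliseconds lost if every page access takes
 * one TLB miss costing miss_ns nanoseconds.
 */
static double kho_est_tlb_ms(uint64_t pages, double miss_ns)
{
	return (double)pages * miss_ns / 1e6;
}
```

With the inputs from this thread, kho_est_pages(20ULL << 30, 4096)
gives 5242880 pages, and the two estimates come out at roughly 13.4ms
and 52.4ms.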
>
> > (shutdown+purgatory+boot), let's not add anything to regress, as every
> > microsecond counts during blackout.
>
> Any added functionality adds cycles, this is inevitable. And neither KHO
> nor LUO are near the completion, so we'll have to add functionality to both
> of them. And the added functionality should be correct first and foremost.
> And magic sanity check seems pretty useful and presumably cheap enough to
> always keep it unless you see a real slowdown because of it.
The cost of the magic check is proportional to the amount of preserved
memory. It is not required functionality, only a sanity check. I really
do not see a reason to enable it in production. All other struct page
and pg_flags related sanity checks are usually enabled only with
CONFIG_DEBUG_VM, so enabling this one only with
CONFIG_KEXEC_HANDOVER_DEBUG is better.
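Roughly what I have in mind, as a userspace mock rather than an actual
patch. The struct, the magic value and the function name are all
placeholders for illustration. Only the config symbol is the real one
from this discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder magic value, for illustration only. */
#define DEMO_KHO_MAGIC 0x4b484f50u

/* Placeholder stand-in for the per-page info that carries the magic. */
struct demo_page_info {
	uint32_t order;
	uint32_t magic;
};

/*
 * With CONFIG_KEXEC_HANDOVER_DEBUG defined, the magic is read and
 * compared. Without it, the check compiles down to "true" and the
 * page info is never touched on the restore path.
 */
static inline bool demo_kho_magic_ok(const struct demo_page_info *info)
{
#ifdef CONFIG_KEXEC_HANDOVER_DEBUG
	return info->magic == DEMO_KHO_MAGIC;
#else
	(void)info;
	return true;
#endif
}
```

In a production build the compiler can drop the load entirely, so the
per-page cost estimated earlier goes away. A debug build keeps the
check.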
Pasha
>
> > Pasha
>
> --
> Sincerely yours,
> Mike.