Message-ID: <86wm206qjx.fsf@kernel.org>
Date: Fri, 02 Jan 2026 15:24:18 +0100
From: Pratyush Yadav <pratyush@...nel.org>
To: Mike Rapoport <rppt@...nel.org>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>,  Pratyush Yadav
 <pratyush@...nel.org>,  Evangelos Petrongonas <epetron@...zon.de>,
  Alexander Graf <graf@...zon.com>,  Andrew Morton
 <akpm@...ux-foundation.org>,  Jason Miu <jasonmiu@...gle.com>,
  linux-kernel@...r.kernel.org,  kexec@...ts.infradead.org,
  linux-mm@...ck.org,  nh-open-source@...zon.com
Subject: Re: [PATCH] kho: add support for deferred struct page init

On Wed, Dec 31 2025, Mike Rapoport wrote:

> On Tue, Dec 30, 2025 at 01:21:31PM -0500, Pasha Tatashin wrote:
>> On Tue, Dec 30, 2025 at 12:18 PM Mike Rapoport <rppt@...nel.org> wrote:
>> >
>> > On Tue, Dec 30, 2025 at 11:18:12AM -0500, Pasha Tatashin wrote:
>> > > On Tue, Dec 30, 2025 at 11:16 AM Mike Rapoport <rppt@...nel.org> wrote:
>> > > >
>> > > > On Tue, Dec 30, 2025 at 11:05:05AM -0500, Pasha Tatashin wrote:
>> > > > > On Mon, Dec 29, 2025 at 4:03 PM Pratyush Yadav <pratyush@...nel.org> wrote:
>> > > > > >
>> > > > > > The magic is purely sanity checking. It is not used to decide anything
>> > > > > > other than to make sure this is actually a KHO page. I don't intend to
>> > > > > > change that. My point is, if we make sure the KHO pages are properly
>> > > > > > initialized during MM init, then restoring can actually be a very cheap
>> > > > > > operation, where you only do the sanity checking. You can even put the
>> > > > > > magic check behind CONFIG_KEXEC_HANDOVER_DEBUG if you want, but I think
>> > > > > > it is useful enough to keep in production systems too.
>> > > > >
>> > > > > It is part of a critical hotpath during blackout, should really be
>> > > > > behind CONFIG_KEXEC_HANDOVER_DEBUG
>> > > >
>> > > > Do you have the numbers? ;-)
>> > >
>> > > The fastest reboot we can achieve is ~0.4s on ARM
>> >
>> > I meant the difference between assigning info.magic and skipping it.
>> 
>> It is proportional to the amount of preserved memory. Extra assignment
>> for each page. In our fleet we have observed IOMMU page tables to be
>> 20G in size. So, let's just assume it is 20G. That is: 20 * 1024^3 /

The magic check is done for each preservation, not for each page. So if
the 20G of preserved memory is 1G huge pages, then you only need to
check the magic 20 times.

>
> Do you see 400ms reboot times on machines that have 20G of IOMMU page
> tables? That's impressive given the overall size of those machines.
>
>> 4096 = 5.24 million pages. If we access "struct page" only for the
>> magic purpose, we fetch full 64-byte cacheline, which is 5.24 million
>> * 64 bytes = 335 M, that is ~13ms with ~25G/s DRAM; and also each TLB
>> miss will add some latency, 5.2M * 10ns = ~50ms. In total we can get
>> 15ms ~ 50ms regression compared to 400ms, that is 4-12%. It will be
>> less if we also access "struct page" for another reason at the same
>> time, but still it adds up.
>
> Your overhead calculations are based on the assumption that we don't
> access struct page, but we do. We assign page->private during
> deserialization and then initialize struct page during restore.
> We get the hit of cache fetches and TLB misses anyway.

Exactly. The cache line will be fetched anyway. So I think the real
overhead is a fetch and compare.

>
> It would be interesting to see the difference *measured* on those large
> systems.
>
[...]

-- 
Regards,
Pratyush Yadav
