linux-kernel - Re: [RFC v1 1/4] kho: Introduce KHO page table data structures


Message-ID: <CA+CK2bBbSSyCDAAgThDSSwH0WdOeHz-eVgB-1bdiwsDtTSE5pg@mail.gmail.com>
Date: Wed, 17 Sep 2025 12:18:39 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Jason Miu <jasonmiu@...gle.com>, Alexander Graf <graf@...zon.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Baoquan He <bhe@...hat.com>, 
	Changyuan Lyu <changyuanl@...gle.com>, David Matlack <dmatlack@...gle.com>, 
	David Rientjes <rientjes@...gle.com>, Joel Granados <joel.granados@...nel.org>, 
	Marcos Paulo de Souza <mpdesouza@...e.com>, Mario Limonciello <mario.limonciello@....com>, 
	Mike Rapoport <rppt@...nel.org>, Petr Mladek <pmladek@...e.com>, 
	"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>, Steven Chen <chenste@...ux.microsoft.com>, 
	Yan Zhao <yan.y.zhao@...el.com>, kexec@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC v1 1/4] kho: Introduce KHO page table data structures

On Wed, Sep 17, 2025 at 8:22 AM Jason Gunthorpe <jgg@...dia.com> wrote:
>
> On Tue, Sep 16, 2025 at 07:50:16PM -0700, Jason Miu wrote:
> > + * kho_order_table
> > + * +-------------------------------+--------------------+
> > + * | 0 order| 1 order| 2 order ... | HUGETLB_PAGE_ORDER |
> > + * ++------------------------------+--------------------+
> > + *  |
> > + *  |
> > + *  v
> > + * ++------+
> > + * |  Lv6  | kho_page_table
> > + * ++------+
>
> I seem to remember suggesting this could be simplified without the
> special case 7th level table for order.
>
> Encode the phys address as:
>
> (order << 51) | (phys >> (PAGE_SHIFT + order))

Why 51 and not 52? This limits us to a 63-bit address space, does it not?

>
> Then you don't need another table for order, the 64 bits encode
> everything consistently. Order can't be > 52 so it is
> only 6 bits, meaning the result fits into at most 57 bits.
>

Hi Jason,

Nice packing. That's a really clever bit-packing scheme to create a
unified address space.
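
For reference, the packing quoted above can be sketched roughly as
follows. This is only an illustration, not code from the patch: the
helper names are hypothetical, PAGE_SHIFT is assumed to be 12, and the
shift of 51 is taken verbatim from the quoted formula.

```c
#include <stdint.h>

#define PAGE_SHIFT      12  /* assumption: 4 KiB base pages */
#define KHO_ORDER_SHIFT 51  /* shift used in the quoted formula */

/* Hypothetical helper: pack an order and a physical address into one key. */
static uint64_t kho_pack(uint64_t phys, unsigned int order)
{
	return ((uint64_t)order << KHO_ORDER_SHIFT) |
	       (phys >> (PAGE_SHIFT + order));
}

/* Hypothetical helper: recover the order from the top bits of the key. */
static unsigned int kho_unpack_order(uint64_t key)
{
	return (unsigned int)(key >> KHO_ORDER_SHIFT);
}

/* Hypothetical helper: recover the (order-aligned) physical address. */
static uint64_t kho_unpack_phys(uint64_t key)
{
	unsigned int order = kho_unpack_order(key);
	uint64_t index = key & ((1ULL << KHO_ORDER_SHIFT) - 1);

	return index << (PAGE_SHIFT + order);
}
```

For example, an order-9 page at physical address 0x40000000 packs to
(9 << 51) | 0x200 and unpacks back to the same order and address.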

I like the idea, but I'm trying to find the benefits compared to the
current per-order tree approach.

1. Packing adds a slight performance overhead for higher orders. With
the current approach, preserving higher-order pages only requires a
3- or 4-level page table. With the bit-packing proposal, we will always
have extra loads during preserve/unpreserve operations.

2. It also adds an insignificant amount of memory overhead, as the
extra levels will need a couple of extra pages.

3. It slightly complicates the logic in the new kernel. Instead of
simply iterating a known tree for a specific order, the boot-time
walker would need to reconstruct the per-order subtrees, and walk
them.
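
To make the depth argument in point 1 concrete, here is a
back-of-the-envelope sketch. It is not taken from the patch and rests on
several assumptions: 4 KiB pages, 52-bit physical addresses, 9 index
bits per table level, a fixed-depth unified tree, and no separate bitmap
leaf level. The function names are hypothetical.

```c
#define PAGE_SHIFT   12  /* assumption: 4 KiB base pages */
#define PHYS_BITS    52  /* assumption: 52-bit physical addresses */
#define BITS_PER_LVL 9   /* assumption: 512 entries per table page */
#define PACKED_BITS  57  /* key width estimated in the quoted text */

/* Levels needed to index a key of key_bits bits, rounded up. */
static unsigned int levels_for_bits(unsigned int key_bits)
{
	return (key_bits + BITS_PER_LVL - 1) / BITS_PER_LVL;
}

/*
 * Per-order tree: the key is phys >> (PAGE_SHIFT + order), so it gets
 * narrower as the order grows. Valid for order <= PHYS_BITS - PAGE_SHIFT.
 */
static unsigned int per_order_levels(unsigned int order)
{
	return levels_for_bits(PHYS_BITS - PAGE_SHIFT - order);
}

/* Unified tree: every lookup walks the full packed key width. */
static unsigned int unified_levels(void)
{
	return levels_for_bits(PACKED_BITS);
}
```

Under these assumptions, per_order_levels() gives 5 levels for order 0,
4 for order 9 and 3 for order 18, while unified_levels() gives 7 for
every order.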

Perhaps I'm missing a key benefit of the unified tree? The current
approach might not be as elegant as having everything packed into the
same page table but it seems to be OK to me, and easy to understand.

Pasha