Message-ID: <CAHN2nPK+Z5cvQ_waTWyPZiEoeSc9o7e3YnQLLjRzNzrb7VhAqQ@mail.gmail.com>
Date: Thu, 18 Sep 2025 23:49:06 -0700
From: Jason Miu <jasonmiu@...gle.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, Alexander Graf <graf@...zon.com>,
Andrew Morton <akpm@...ux-foundation.org>, Baoquan He <bhe@...hat.com>,
Changyuan Lyu <changyuanl@...gle.com>, David Matlack <dmatlack@...gle.com>,
David Rientjes <rientjes@...gle.com>, Joel Granados <joel.granados@...nel.org>,
Marcos Paulo de Souza <mpdesouza@...e.com>, Mario Limonciello <mario.limonciello@....com>,
Mike Rapoport <rppt@...nel.org>, Petr Mladek <pmladek@...e.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>, Steven Chen <chenste@...ux.microsoft.com>,
Yan Zhao <yan.y.zhao@...el.com>, kexec@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC v1 1/4] kho: Introduce KHO page table data structures
Hi Jason,
On Wed, Sep 17, 2025 at 9:32 AM Jason Gunthorpe <jgg@...dia.com> wrote:
>
> On Wed, Sep 17, 2025 at 12:18:39PM -0400, Pasha Tatashin wrote:
> > On Wed, Sep 17, 2025 at 8:22 AM Jason Gunthorpe <jgg@...dia.com> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 07:50:16PM -0700, Jason Miu wrote:
> > > > + * kho_order_table
> > > > + * +-------------------------------+--------------------+
> > > > + * | 0 order| 1 order| 2 order ... | HUGETLB_PAGE_ORDER |
> > > > + * ++------------------------------+--------------------+
> > > > + * |
> > > > + * |
> > > > + * v
> > > > + * ++------+
> > > > + * | Lv6 | kho_page_table
> > > > + * ++------+
> > >
> > > I seem to remember suggesting this could be simplified without the
> > > special case 7th level table for order.
> > >
> > > Encode the phys address as:
> > >
> > > (order << 51) | (phys >> (PAGE_SHIFT + order))
> >
> > Why 51 and not 52, this limits to 63bit address space, is it not?
>
> Yeah, might have got the math off
>
> > I like the idea, but I'm trying to find the benefits compared to the
> > current per-order tree approach.
>
> It is probably about half the code compared to what I see here because
> everything is aggressively simplified.
Thank you very much for the feedback. I think this is a very smart
idea.
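
To make the bit layout concrete, here is a minimal sketch of the encoding as quoted above. The helper names are hypothetical, PAGE_SHIFT = 12 is an assumption, and the shift of 51 is taken literally from the quoted expression even though 51 vs 52 is still open in this thread:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12  /* assumption: 4 KiB pages */
#define ORDER_SHIFT 51  /* as quoted; whether this should be 52 is unresolved */

/* Hypothetical helper: pack (phys, order) into one 64-bit key. */
static uint64_t key_encode(uint64_t phys, unsigned int order)
{
	/* The order must fit in the bits above ORDER_SHIFT. */
	assert(order < (1u << (64 - ORDER_SHIFT)));
	/* phys must be aligned to the size of an order-sized block. */
	assert((phys & ((1ULL << (PAGE_SHIFT + order)) - 1)) == 0);
	/* The shifted address must not spill into the order field. */
	assert((phys >> (PAGE_SHIFT + order)) < (1ULL << ORDER_SHIFT));

	return ((uint64_t)order << ORDER_SHIFT) | (phys >> (PAGE_SHIFT + order));
}

/* Hypothetical helper: recover the order from a key. */
static unsigned int key_order(uint64_t key)
{
	return (unsigned int)(key >> ORDER_SHIFT);
}

/* Hypothetical helper: recover the physical address from a key. */
static uint64_t key_phys(uint64_t key)
{
	uint64_t idx = key & ((1ULL << ORDER_SHIFT) - 1);

	return idx << (PAGE_SHIFT + key_order(key));
}
```

With these definitions, a 2 MiB block at physical address 0x200000 (order 9) encodes to (9 << 51) | 1, and key_order() and key_phys() return 9 and 0x200000 again.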
> > 3. It slightly complicates the logic in the new kernel. Instead of
> > simply iterating a known tree for a specific order, the boot-time
> > walker would need to reconstruct the per-order subtrees, and walk
> > them.
>
> The core walker just runs over a range, it is easy to compute the
> range.
I believe the "range" here refers to the portion of the tree that holds
the `target_order` being restored, where `target_order` runs from 0 to
MAX_PAGE_ORDER during the tree walk in the new kernel.
My current understanding of the walker for a given `target_order`:
1. Find the `start_level` from the `target_order`. (for example,
target_order = 10, start_level = 4)
2. The path from the root down to the level above `start_level` is
fixed (index 0 at each of these levels).
3. At `start_level`, the index is also fixed, by (1 << (63 -
PAGE_SHIFT - target_order)) in a 9 bit slice.
4. Then, for all levels *below* `start_level`, the walker iterates
through all 512 table entries, until the bitmap level.
So the "range" is the set of subtrees under `start_level`. Is my
understanding correct?
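
One way to check steps 1 to 3 is to compute the key range for a `target_order` directly from the marker bit in step 3. The sketch below is only that, with several assumptions that are not stated in this thread: PAGE_SHIFT = 12, a bitmap level (level 1) covering 15 key bits (PAGE_SIZE * 8 entries), and 9-bit table levels above it. All function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12  /* assumption: 4 KiB pages */
#define BITMAP_BITS 15  /* assumption: one bitmap page covers PAGE_SIZE * 8 keys */
#define TABLE_BITS  9   /* 512 entries per table, as in step 4 */

/* Bit position of the per-order marker bit from step 3. */
static unsigned int marker_bit(unsigned int target_order)
{
	assert(target_order <= 63 - PAGE_SHIFT);
	return 63 - PAGE_SHIFT - target_order;
}

/* Smallest key that carries the marker bit of target_order. */
static uint64_t first_key(unsigned int target_order)
{
	return 1ULL << marker_bit(target_order);
}

/* Largest key that carries the marker bit of target_order. */
static uint64_t last_key(unsigned int target_order)
{
	return (first_key(target_order) << 1) - 1;
}

/* Level whose index slice contains the marker bit; level 1 is the bitmap. */
static unsigned int start_level(unsigned int target_order)
{
	unsigned int b = marker_bit(target_order);

	if (b < BITMAP_BITS)
		return 1;
	return 2 + (b - BITMAP_BITS) / TABLE_BITS;
}

/* Table index selected by key at a table level (level >= 2). */
static unsigned int level_index(uint64_t key, unsigned int level)
{
	unsigned int shift;

	assert(level >= 2);
	shift = BITMAP_BITS + (level - 2) * TABLE_BITS;
	return (unsigned int)((key >> shift) & ((1u << TABLE_BITS) - 1));
}
```

Under these assumptions start_level(10) evaluates to 4, which matches the example in step 1, and level_index() is 0 for both ends of the range at every level above `start_level`, as in step 2. Comparing level_index(first_key(o), start_level(o)) with level_index(last_key(o), start_level(o)) shows whether the slot at `start_level` is a single index or a span of indices for a given order.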
--
Jason Miu