[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250917122158.GC1086830@nvidia.com>
Date: Wed, 17 Sep 2025 09:21:58 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Jason Miu <jasonmiu@...gle.com>
Cc: Alexander Graf <graf@...zon.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Baoquan He <bhe@...hat.com>, Changyuan Lyu <changyuanl@...gle.com>,
David Matlack <dmatlack@...gle.com>,
David Rientjes <rientjes@...gle.com>,
Joel Granados <joel.granados@...nel.org>,
Marcos Paulo de Souza <mpdesouza@...e.com>,
Mario Limonciello <mario.limonciello@....com>,
Mike Rapoport <rppt@...nel.org>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Petr Mladek <pmladek@...e.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
Steven Chen <chenste@...ux.microsoft.com>,
Yan Zhao <yan.y.zhao@...el.com>, kexec@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC v1 1/4] kho: Introduce KHO page table data structures
On Tue, Sep 16, 2025 at 07:50:16PM -0700, Jason Miu wrote:
> + * kho_order_table
> + * +-------------------------------+--------------------+
> + * | 0 order| 1 order| 2 order ... | HUGETLB_PAGE_ORDER |
> + * ++------------------------------+--------------------+
> + * |
> + * |
> + * v
> + * ++------+
> + * | Lv6 | kho_page_table
> + * ++------+
I seem to remember suggesting this could be simplified without the
special case 7h level table table for order.
Encode the phys address as:
(order << 51) | (phys >> (PAGE_SHIFT + order))
Then you don't need another table for order, the 64 bits encode
everything consistently. Order can't be > 52 so it is
only 6 bits, meaning the result fits into at most 57 bits.
> + * 63 62:54 53:45 44:36 35:27 26:0
> + * +--------+--------+--------+--------+--------+-----------------+
> + * | Lv 6 | Lv 5 | Lv 4 | Lv 3 | Lv 2 | Lv 1 (bitmap) |
> + * +--------+--------+--------+--------+--------+-----------------+
This isn't quite right, the 11:0 bits are must be zero and not used to
index anything.
Adjusting to reflect the above math, it would be like this:
63:60 59:51 50:42 41:33 32:34 23:15 14:0
+-----+--------+--------+--------+--------+--------+-----------------+
| 0 | Lv 6 | Lv 5 | Lv 4 | Lv 3 | Lv 2 | Lv 1 (bitmap) |
+-----+--------+--------+--------+--------+--------+-----------------+
The order level is just folded into lv 6
> + * For higher order pages, the bit fields for each level shift to the left by
> + * the page order.
This is probably an unncessary complexity. The table levels cost only
64 bytes, it isn't so valuable to eliminate them. So with the above
math it shifts right not left. Level 1 is always the bitmap and it
doesn't move around. I'd label this 0 in the code.
If you also fix the sizes to be 64 bytes and 4096 bytes regardless of
PAGE_SIZE then everything is easy and fixed, while still efficient on
higher PAGE_SIZE architectures.
Fruther, changing the formula to this:
(1 << (63 - PAGE_SHIFT - order)) | (phys >> (PAGE_SHIFT + order))
Will shift the overhead levels to the top of the radix tree and share
them across all orders, higher PAGE_SIZE arches will just get a single
lvl 5 and an unecessary lvl 6 - cost 64 extra bytes who cares.
> +#define BITMAP_TABLE_SHIFT(_order) (PAGE_SHIFT + PAGE_SHIFT + 3 + (_order))
> +#define BITMAP_TABLE_MASK(_order) ((1ULL << BITMAP_TABLE_SHIFT(_order)) - 1)
> +#define PRESERVED_PAGE_OFFSET_SHIFT(_order) (PAGE_SHIFT + (_order))
> +#define PAGE_TABLE_SHIFT_PER_LEVEL (ilog2(PAGE_SIZE / sizeof(unsigned long)))
> +#define PAGE_TABLE_LEVEL_MASK ((1ULL << PAGE_TABLE_SHIFT_PER_LEVEL) - 1)
> +#define PTR_PER_LEVEL (PAGE_SIZE / sizeof(unsigned long))
please use inlines and enums :(
It looks like if you make the above algorithm changes most of the this
code is deleted.
Jason
Powered by blists - more mailing lists