lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251009175205.GB3899236@nvidia.com>
Date: Thu, 9 Oct 2025 14:52:05 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Jason Miu <jasonmiu@...gle.com>
Cc: Alexander Graf <graf@...zon.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Baoquan He <bhe@...hat.com>, Changyuan Lyu <changyuanl@...gle.com>,
	David Matlack <dmatlack@...gle.com>,
	David Rientjes <rientjes@...gle.com>,
	Mike Rapoport <rppt@...nel.org>,
	Pasha Tatashin <pasha.tatashin@...een.com>,
	Pratyush Yadav <pratyush@...nel.org>, kexec@...ts.infradead.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures

> > > -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map"
> > > +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree"
> > >  #define PROP_SUB_FDT "fdt"
> >
> > I'de really like to see all of these sorts of definitions in some
> > structured ABI header not open coded all over the place..
> 
> Do you think `include/linux/kexec_handover.h` is the appropriate
> place, or would you prefer a new, dedicated ABI header (e.g., in
> `include/uapi/linux/`) for all KHO-related FDT constants?

I would avoid uapi, but maybe Pasha has some
idea.

 include/linux/live_update/abi/ ?

> Agreed. Will change `u64` according to Pasha's comment. And we use
> explicit casts like `(u64)virt_to_phys(new_tree)` and `(struct
> kho_radix_tree *)phys_to_virt(table_entry)` in the current series. I
> believe this, along with the `u64` type, makes it clear that the table
> stores physical addresses.

Well, the macros were intended to automate this and avoid mistakes
from opencoding.. Just keep using them?

> > > + */
> > > +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order)
> >
> > pa is phys_addr_t in the kernel, never unsigned long.
> >
> > If you want to make it all dynamic then this should be phys_addr_t
> 
> Should this also be `u64`, or we stay with `phys_addr_t` for all page
> addresses?

you should use phys_addr_t for everything that originates from a
phys_addr_t, and u64 for all the ABI

> > > +{
> > > +     unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order);
> >
> > And this BITS_PER_LONG is confused, it is BITS_PER_PHYS_ADDR_T which
> > may not exist.
> >
> > Use an enum ORDER_0_LG2 maybe
> 
> I prefer `KHO_RADIX_ORDER_0_BIT_POS` (defined as `BITS_PER_LONG -
> PAGE_SHIFT`) over `ORDER_0_LG2`, as I think the latter is a bit hard
> to understand, what do you think? This constant, along with others,
> will be placed in the enum.

Sure, though I prefer LG2 to BIT_POS

BIT_POS to me implies it is being used as  bit wise operation, while
log2 is a mathematical concept

  X_lg2 = ilog2(X)  &&  X == 1 << X_lg2

> > > +                             kho_radix_tree_walk_callback_t cb)
> > > +{
> > > +     int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long));
> > > +     struct kho_radix_tree *next_tree;
> > > +     unsigned long encoded, i;
> > > +     int err = 0;
> > >
> > > +     if (level == 1) {
> > > +             encoded = offset;
> > > +             return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root,
> > > +                                           encoded, cb);
> >
> > Better to do this in the caller  a few lines below
> 
> But the caller is in a different tree level? Should we only walk the
> bitmaps at the lowest level?

I mean just have the caller do

if (level-1 ==0)
   kho_radix_walk_bitmaps()
else
   ..

Avoids a function call

> > > +     for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) {
> > > +             if (root->table[i]) {
> > > +                     encoded = offset << level_shift | i;
> >
> > This doesn't seem right..
> >
> > The argument to the walker should be the starting encoded of the table
> > it is about to walk.
> >
> > Since everything always starts at 0 it should always be
> >   start | (i << level_shift)
> >
> > ?
> 
> You're right that this line might not be immediately intuitive. The
> var `level_shift` (which is constant value 9 here) is applied to the
> *accumulated* `offset` from the parent level. Let's consider an
> example of a preserved page at physical address `0x1000`, which
> encodes to `0x10000000000001` (bit 52 is set for order 0, bit 0 is set
> for page 1).

Oh, weird, too weird maybe. I'd just keep all the values as fully
shifted, level_shift should be adjusted to have the full shift for
this level. Easier to understand.

Also, I think the order bits might have become a bit confused, I think
I explained it wrong.

My idea was to try to share the radix levels to save space eg if we
have like this patch does:

  Order phys
  00001 abcd
  00010 0bcd
  00100 00cd
  01000 000d

Then we don't get too much page level sharing, the middle ends up with
0 indexes in tables that cannot be shared.

What I was going for was to push all the shared pages to the left

  00001 abcd
  00000 1bcd
  00000 01cd
  00000 001d

Here the first radix level has index 0 or 1 and is fully shared. So eg
Order 4 and 5 will have all the same 0 index table levels. This also
reduces the max height of the tree because only +1 bit is needed to
store order.

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ