Message-ID: <CAHS8izNMHYuRk9w0BUEbXBob38NVkMOVMmvvcq30TstGFpob6A@mail.gmail.com>
Date: Wed, 17 Sep 2025 17:28:49 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Helge Deller <deller@....de>, Helge Deller <deller@...nel.org>, 
	David Hildenbrand <david@...hat.com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	Ilias Apalodimas <ilias.apalodimas@...aro.org>, "David S. Miller" <davem@...emloft.net>, 
	Linux Memory Management List <linux-mm@...ck.org>, netdev@...r.kernel.org, 
	Linux parisc List <linux-parisc@...r.kernel.org>, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()

On Wed, Sep 17, 2025 at 3:09 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Mina Almasry <almasrymina@...gle.com> writes:
>
> > On Tue, Sep 16, 2025 at 2:27 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
> >>
> >> Mina Almasry <almasrymina@...gle.com> writes:
> >>
> >> > On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@....de> wrote:
> >> >>
> >> >> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
> >> >> > Helge Deller <deller@...nel.org> writes:
> >> >> >
> >> >> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> >> >> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> >> >> >> 32-bit platforms.
> >> >> >>
> >> >> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> >> >> >> pages, but the remaining bits are not sufficient to unambiguously identify
> >> >> >> such pages any longer.
> >> >> >
> >> >> > Why not? What values end up in pp_magic that are mistaken for the
> >> >> > pp_signature?
> >> >>
> >> >> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
> >> >> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
> >> >> That means, that before page_pool_page_is_pp() could clearly identify such pages,
> >> >> as the (value & 0xFFFFFFFC) == 0x40.
> >> >> So, basically only the 0x40 value indicated a PP page.
> >> >>
> >> >> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
> >> >> just showing a few examples:
> >> >> 0x01111040
> >> >> 0x082330C0
> >> >> 0x03264040
> >> >> 0x0ad686c0 ....
> >> >>
> >> >> For me it crashes immediately at bootup when memblocked pages are handed
> >> >> over to become normal pages.
> >> >>
> >> >
> >> > I tried to take a look to double check here and AFAICT Helge is correct.
> >> >
> >> > Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
> >> > 0x40 is the only pointer that may be mistaken as a valid pp_magic.
> >> > AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
> >> > significant bits), doubles the number of pointers that can be mistaken
> >> > for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
> >> > mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
> >> > be mistaken as pp_magic?
> >> >
> >> > I don't know that there is any bits we can take away from
> >> > PP_MAGIC_MASK I think? As each bit doubles the probability :(
> >> >
> >> > I would usually say we can check the 3 least significant bits to tell
> >> > if pp_magic is a pointer or not, but pp_magic is unioned with
> >> > page->lru I believe which will use those bits.
> >>
> >> So if the pointers stored in the same field can be any arbitrary value,
> >> you are quite right, there is no safe value. The critical assumption in
> >> the bit stuffing scheme is that the pointers stored in the field will
> >> always be above PAGE_OFFSET, and that PAGE_OFFSET has one (or both) of
> >> the two top-most bits set (that is what the VMSPLIT reference in the
> >> comment above the PP_DMA_INDEX_SHIFT definition is alluding to).
> >>
> >
> > I see... but where does the 'PAGE_OFFSET has one (or both) of the two
> > top-most bits set)' assumption come from? Is it from this code?
>
> Well, from me grepping through the code and trying to make sense of all
> the different cases of the preprocessor and config directives across
> architectures. Seems I did not quite succeed :/
>
> > /*
> >  * PAGE_OFFSET -- the first address of the first page of memory.
> >  * When not using MMU this corresponds to the first free page in
> >  * physical memory (aligned on a page boundary).
> >  */
> > #ifdef CONFIG_MMU
> > #ifdef CONFIG_64BIT
> > ....
> > #else
> > #define PAGE_OFFSET _AC(0xc0000000, UL)
> > #endif /* CONFIG_64BIT */
> > #else
> > #define PAGE_OFFSET ((unsigned long)phys_ram_base)
> > #endif /* CONFIG_MMU */
> >
> > It looks like with !CONFIG_MMU we use phys_ram_base and I'm unable to
> > confirm that all the values of this have the first 2 bits set. I
> > wonder if his setup is !CONFIG_MMU indeed.
>
> Right, that's certainly one thing I missed. As was the parisc arch
> thing, as Helge followed up with. Ugh :/
>
> > It also looks like pp_magic is also union'd with __folio_index in
> > struct page, and it looks like the data there is sometimes used as a
> > pointer and sometimes not.
>
> Not according to my pahole:
>
> [...]
>                         union {
>                                 long unsigned int __folio_index; /*    32     8 */
> [...]
>         struct {
>                         long unsigned int pp_magic;      /*     8     8 */
>
> So I think we're good with this, no?
>
> So given the above, we could do something equivalent to this, I think?
>
> diff --git i/include/linux/mm.h w/include/linux/mm.h
> index 1ae97a0b8ec7..615aaa19c60c 100644
> --- i/include/linux/mm.h
> +++ w/include/linux/mm.h
> @@ -4175,8 +4175,12 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>   */
>  #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>  #else
> +#if PAGE_OFFSET > PP_SIGNATURE
>  /* Always leave out the topmost two; see above. */
> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
> +#define PP_DMA_INDEX_BITS MIN(32, __fls(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT - 1)

Shouldn't this have been:

#define PP_DMA_INDEX_BITS MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT)

I.e. you're trying to use the space between the least significant bit
set in PAGE_OFFSET and the most significant bit set in PP_SIGNATURE.
Hmm, I'm not sure I understand this; I may be reading it wrong.
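
To make the comparison concrete, here is a minimal userspace sketch of
the two formulas. It assumes PP_DMA_INDEX_SHIFT is 7 on 32-bit (which is
what the 0xc000007c mask quoted above implies). The helper names are
hypothetical, and the PAGE_OFFSET values other than 0xc0000000 are only
illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: index bits start right above the 0x40 signature bit. */
#define PP_DMA_INDEX_SHIFT 7

/* Index of the most significant set bit (v must be non-zero). */
int fls32(uint32_t v)
{
	int i = -1;

	while (v) {
		v >>= 1;
		i++;
	}
	return i;
}

/* Index of the least significant set bit (v must be non-zero). */
int ffs32(uint32_t v)
{
	int i = 0;

	while (!(v & 1)) {
		v >>= 1;
		i++;
	}
	return i;
}

int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Variant from the quoted diff: based on the topmost set bit. */
int pp_dma_index_bits_fls(uint32_t page_offset)
{
	return min_int(32, fls32(page_offset) - PP_DMA_INDEX_SHIFT - 1);
}

/* Variant suggested in this reply: based on the lowest set bit. */
int pp_dma_index_bits_ffs(uint32_t page_offset)
{
	return min_int(32, ffs32(page_offset) - PP_DMA_INDEX_SHIFT);
}

/* Illustrative check of the values discussed in the note below. */
void compare_variants(void)
{
	assert(pp_dma_index_bits_fls(0xc0000000u) == 23);
	assert(pp_dma_index_bits_ffs(0xc0000000u) == 23);
	assert(pp_dma_index_bits_fls(0x80000000u) == 23);
	assert(pp_dma_index_bits_ffs(0x80000000u) == 24);
	assert(pp_dma_index_bits_fls(0xb0000000u) == 23);
	assert(pp_dma_index_bits_ffs(0xb0000000u) == 21);
}
```

Under those assumptions both variants give 23 bits for 0xc0000000, so
they only differ for other values: 23 vs 24 for 0x80000000 and 23 vs 21
for 0xb0000000.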

> +#else
> +#define PP_DMA_INDEX_BITS 0
> +#endif /* PAGE_OFFSET > PP_SIGNATURE */
>  #endif
>
>  #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS +  PP_DMA_INDEX_SHIFT - 1, \
>
>
> Except that it won't work in this form as-is because PAGE_OFFSET is not
> always a constant (see the #define PAGE_OFFSET ((unsigned
> long)phys_ram_base) that you quoted above), so we'll have to turn it
> into an inline function or something.
>
> I'm not sure adding this extra complexity is really worth it, or if we
> should just go with the '#define PP_DMA_INDEX_BITS 0' when
> POISON_POINTER_DELTA is unset and leave it at that for the temporary
> workaround. WDYT?
>

I think this would work. It still wouldn't handle cases where the data
in pp_magic is not a pointer at all, or is a pointer to some static
variable in the code like `.mp_ops = &dmabuf_devmem_ops,`, right?
Those were never allocated from memory, so they are unrelated to
PAGE_OFFSET.

But I guess things like that would have been a problem with the old
code anyway, so they should be of no concern?
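
As a rough illustration of the aliasing concern, here is a userspace
sketch of the check described earlier in the thread, (value & mask) ==
0x40, run with both the old and new 32-bit masks. The function names are
hypothetical; the sample values are the ones Helge listed, plus one
made-up value with the top two bits set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PP_SIGNATURE      0x40u       /* POISON_POINTER_DELTA == 0 */
#define PP_MAGIC_MASK_OLD 0xFFFFFFFCu /* before ee62ce7a1d90 */
#define PP_MAGIC_MASK_NEW 0xc000007cu /* after ee62ce7a1d90 */

/* Mirrors the check from the thread: masked value equals the signature. */
bool looks_like_pp(uint32_t value, uint32_t mask)
{
	return (value & mask) == PP_SIGNATURE;
}

/* Illustrative check using the values quoted in the thread. */
void check_examples(void)
{
	/* With the old mask, only 0x40 itself (ignoring the low 2 bits) matches. */
	assert(looks_like_pp(0x00000040u, PP_MAGIC_MASK_OLD));
	assert(!looks_like_pp(0x01111040u, PP_MAGIC_MASK_OLD));

	/* With the new mask, all of Helge's examples match. */
	assert(looks_like_pp(0x01111040u, PP_MAGIC_MASK_NEW));
	assert(looks_like_pp(0x082330C0u, PP_MAGIC_MASK_NEW));
	assert(looks_like_pp(0x03264040u, PP_MAGIC_MASK_NEW));
	assert(looks_like_pp(0x0ad686c0u, PP_MAGIC_MASK_NEW));

	/* A made-up value with both top bits set does not match. */
	assert(!looks_like_pp(0xc0001040u, PP_MAGIC_MASK_NEW));
}
```

So with the new mask, any value that has the two topmost bits clear can
pass the check if its low bits happen to line up, whether or not it is a
pointer.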

-- 
Thanks,
Mina
