[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f64372ec-c127-457f-b8e2-0f48223bd147@gmx.de>
Date: Mon, 15 Sep 2025 15:08:40 +0200
From: Helge Deller <deller@....de>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
Helge Deller <deller@...nel.org>, David Hildenbrand <david@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
"David S. Miller" <davem@...emloft.net>,
Linux Memory Management List <linux-mm@...ck.org>, netdev@...r.kernel.org,
Linux parisc List <linux-parisc@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate
page_pool_page_is_pp()
On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
> Helge Deller <deller@...nel.org> writes:
>
>> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
>> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
>> 32-bit platforms.
>>
>> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
>> pages, but the remaining bits are not sufficient to unambiguously identify
>> such pages any longer.
>
> Why not? What values end up in pp_magic that are mistaken for the
> pp_signature?
As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
And we have PP_SIGNATURE == 0x40 (since POISON_POINTER_DELTA is zero on 32-bit platforms).
That means, that before page_pool_page_is_pp() could clearly identify such pages,
as the (value & 0xFFFFFFFC) == 0x40.
So, basically only the 0x40 value indicated a PP page.
Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
just showing a few examples:
0x01111040
0x082330C0
0x03264040
0x0ad686c0 ....
For me it crashes immediately at bootup when memblocked pages are handed
over to become normal pages.
> As indicated by the comment above the definition of the PP_DMA_INDEX_*
> definitions, I did put some care into ensuring that the values would not
> get mistaken, see specifically this:
>
> (...) On arches where POISON_POINTER_DELTA is
> * 0, we make sure that we leave the six topmost bits empty, as that guarantees
> * we won't mistake a valid kernel pointer for a value we set, regardless of the
> * VMSPLIT setting.
>
> So if that does not hold, I'd like to understand why not?
Because on 32-bit arches POISON_POINTER_DELTA is zero, and as such
you basically can't take away any of the remaining low 32 (30) bits.
>> So page_pool_page_is_pp() now sometimes wrongly reports pages as page pool
>> pages and as such triggers a kernel BUG as it believes it found a page pool
>> leak.
>>
>> There are patches upcoming where page_pool_page_is_pp() will not depend on
>> PP_MAGIC_MASK and instead use page flags to identify page pool pages. Until
>> those patches are merged, the easiest temporary fix is to disable the check
>> on 32-bit platforms.
>
> As Jesper pointed out, we also use this check internally in the network
> stack, and the patch as proposed will at least trigger the
> DEBUG_NET_WARN_ON_ONCE() in include/net/netmem.h.
Interestingly it did not triggered this warning for me.
Need to look into this.
> I think a better
> solution would be, as Jesper also alludes to, simply adding more bits to
> the mask. For instance, the patch below reserves (somewhat arbitrarily)
> six bits instead of two, changing the mask to 0xfc00007c; would that
> work?
That just shifting the main problem and you may be lucky for short time.
page_pool_page_is_pp() need to *reliably* detect pp pages, all other is guessing.
I don't believe there is any way to really fix it for 32-bit other than
reverting your change, or to disable the check (on 32-bit).
Helge
>
> -Toke
>
> diff --git i/include/linux/mm.h w/include/linux/mm.h
> index 1ae97a0b8ec7..17cb8157ba08 100644
> --- i/include/linux/mm.h
> +++ w/include/linux/mm.h
> @@ -4159,12 +4159,12 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
> + * 0, we make sure that we leave the six topmost bits empty, as that guarantees
> * we won't mistake a valid kernel pointer for a value we set, regardless of the
> * VMSPLIT setting.
> *
> * Altogether, this means that the number of bits available is constrained by
> - * the size of an unsigned long (at the upper end, subtracting two bits per the
> + * the size of an unsigned long (at the upper end, subtracting six bits per the
> * above), and the definition of PP_SIGNATURE (with or without
> * POISON_POINTER_DELTA).
> */
> @@ -4175,8 +4175,8 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> */
> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
> #else
> -/* Always leave out the topmost two; see above. */
> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
> +/* Always leave out the topmost six; see above. */
> +#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 6)
> #endif
>
> #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
>
Powered by blists - more mailing lists