[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878qhvm62t.fsf@toke.dk>
Date: Wed, 01 Oct 2025 09:31:06 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, David Hildenbrand
<david@...hat.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam R.
Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Mike
Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, Michal
Hocko <mhocko@...e.com>, Jesper Dangaard Brouer <hawk@...nel.org>, Ilias
Apalodimas <ilias.apalodimas@...aro.org>, Jakub Kicinski
<kuba@...nel.org>, stable@...r.kernel.org, Helge Deller <deller@....de>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
<horms@...nel.org>, linux-mm@...ck.org, netdev@...r.kernel.org
Subject: Re: [PATCH net v2] page_pool: Fix PP_MAGIC_MASK to avoid crashing
on some 32-bit arches
Mina Almasry <almasrymina@...gle.com> writes:
> On Tue, Sep 30, 2025 at 4:43 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>
>> Helge reported that the introduction of PP_MAGIC_MASK let to crashes on
>> boot on his 32-bit parisc machine. The cause of this is the mask is set
>> too wide, so the page_pool_page_is_pp() incurs false positives which
>> crashes the machine.
>>
>> Just disabling the check in page_pool_is_pp() will lead to the page_pool
>> code itself malfunctioning; so instead of doing this, this patch changes
>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>> pointers for page_pool-tagged pages.
>>
>> The fix relies on the kernel pointers that alias with the pp_magic field
>> always being above PAGE_OFFSET. With this assumption, we can use the
>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>
>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>> constant, nor on it always being >0, we fall back to disabling the
>> dma_index storage when there are not enough bits available. This leaves
>> us in the situation we were in before the patch in the Fixes tag, but
>> only on a subset of architecture configurations. This seems to be the
>> best we can do until the transition to page types in complete for
>> page_pool pages.
>>
>> v2:
>> - Make sure there's at least 8 bits available and that the PAGE_OFFSET
>> bit calculation doesn't wrap
>>
>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>> Cc: stable@...r.kernel.org # 6.15+
>> Tested-by: Helge Deller <deller@....de>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
>> ---
>> include/linux/mm.h | 22 +++++++------
>> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
>> 2 files changed, 66 insertions(+), 32 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 1ae97a0b8ec7..0905eb6b55ec 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
>> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
>> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
>> - * VMSPLIT setting.
>> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
>> + * known at compile-time.
>> *
>> - * Altogether, this means that the number of bits available is constrained by
>> - * the size of an unsigned long (at the upper end, subtracting two bits per the
>> - * above), and the definition of PP_SIGNATURE (with or without
>> - * POISON_POINTER_DELTA).
>> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
>> + * small to leave at least 8 bits available above PP_SIGNATURE, we define the
>> + * number of bits to be 0, which turns off the DMA index tracking altogether
>> + * (see page_pool_register_dma_index()).
>> */
>> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
>> #if POISON_POINTER_DELTA > 0
>> @@ -4175,8 +4174,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> */
>> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>> #else
>> -/* Always leave out the topmost two; see above. */
>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>> +/* Use the lowest bit of PAGE_OFFSET if there's at least 8 bits available; see above */
>> +#define PP_DMA_INDEX_MIN_OFFSET (1 << (PP_DMA_INDEX_SHIFT + 8))
>> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && \
>> + PAGE_OFFSET >= PP_DMA_INDEX_MIN_OFFSET && \
>> + !(PAGE_OFFSET & (PP_DMA_INDEX_MIN_OFFSET - 1))) ? \
>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
>> +
>> #endif
>
> It took some staring at, but I think I understand this code and it is
> correct. This is the critical check, it's making sure that the bits
> used by PAGE_OFFSET are not shared with the bits used for the
> dma-index:
>
>> + !(PAGE_OFFSET & (PP_DMA_INDEX_MIN_OFFSET - 1))) ? \
>
> The following check confused me for a while, but I think I figured it
> out. It's checking that the bits used for PAGE_OFFSET are 'higher'
> than the bits used for PP_DMA_INDEX:
>
>> + PAGE_OFFSET >= PP_DMA_INDEX_MIN_OFFSET && \
>
> And finally this calculation should indeed be the bits we can use (the
> empty space between the lsb set by PAGE_OFFSET and the msb set by the
> pp magic:
>
>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
Yup, exactly! Thanks for walking through it and confirming that the
logic is sound :)
> AFAIU we should not need the MIN anymore, since that subtraction is
> guaranteed to be positive, but that's a nit.
The MIN was originally there to limit how many bits we use for 64-bit
systems that don't set POISON_POINTER_DELTA, since xarray uses a u32 for
the size of the limits. Not sure if such a combination exists in the
real world, but I figure that having it there doesn't hurt in any case.
> Reviewed-by: Mina Almasry <almasrymina@...gle.com>
Thanks!
-Toke
Powered by blists - more mailing lists