[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87ms6cmb31.fsf@toke.dk>
Date: Tue, 30 Sep 2025 13:30:42 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Paolo Abeni <pabeni@...hat.com>, Mina Almasry <almasrymina@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, David Hildenbrand
<david@...hat.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam R.
Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Mike
Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, Michal
Hocko <mhocko@...e.com>, Jesper Dangaard Brouer <hawk@...nel.org>, Ilias
Apalodimas <ilias.apalodimas@...aro.org>, Jakub Kicinski
<kuba@...nel.org>, Helge Deller <deller@...nel.org>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Simon Horman
<horms@...nel.org>, linux-mm@...ck.org, netdev@...r.kernel.org
Subject: Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on
some 32-bit arches
Paolo Abeni <pabeni@...hat.com> writes:
> On 9/30/25 9:45 AM, Toke Høiland-Jørgensen wrote:
>> Mina Almasry <almasrymina@...gle.com> writes:
>>> On Fri, Sep 26, 2025 at 4:40 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>>>
>>>> Helge reported that the introduction of PP_MAGIC_MASK let to crashes on
>>>> boot on his 32-bit parisc machine. The cause of this is the mask is set
>>>> too wide, so the page_pool_page_is_pp() incurs false positives which
>>>> crashes the machine.
>>>>
>>>> Just disabling the check in page_pool_is_pp() will lead to the page_pool
>>>> code itself malfunctioning; so instead of doing this, this patch changes
>>>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>>>> pointers for page_pool-tagged pages.
>>>>
>>>> The fix relies on the kernel pointers that alias with the pp_magic field
>>>> always being above PAGE_OFFSET. With this assumption, we can use the
>>>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>>>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>>>
>>>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>>>> constant, nor on it always being >0, we fall back to disabling the
>>>> dma_index storage when there are no bits available. This leaves us in
>>>> the situation we were in before the patch in the Fixes tag, but only on
>>>> a subset of architecture configurations. This seems to be the best we
>>>> can do until the transition to page types in complete for page_pool
>>>> pages.
>>>>
>>>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>>>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
>>>> ---
>>>> Sorry for the delay on getting this out. I have only compile-tested it,
>>>> since I don't have any hardware that triggers the original bug. Helge, I'm
>>>> hoping you can take it for a spin?
>>>>
>>>> include/linux/mm.h | 18 +++++------
>>>> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
>>>> 2 files changed, 62 insertions(+), 32 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 1ae97a0b8ec7..28541cb40f69 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>>>> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>>>> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
>>>> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
>>>> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
>>>> - * VMSPLIT setting.
>>>> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
>>>> + * known at compile-time.
>>>> *
>>>> - * Altogether, this means that the number of bits available is constrained by
>>>> - * the size of an unsigned long (at the upper end, subtracting two bits per the
>>>> - * above), and the definition of PP_SIGNATURE (with or without
>>>> - * POISON_POINTER_DELTA).
>>>> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
>>>> + * small to leave some bits available above PP_SIGNATURE, we define the number
>>>> + * of bits to be 0, which turns off the DMA index tracking altogether (see
>>>> + * page_pool_register_dma_index()).
>>>> */
>>>> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
>>>> #if POISON_POINTER_DELTA > 0
>>>> @@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>> */
>>>> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>>>> #else
>>>> -/* Always leave out the topmost two; see above. */
>>>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>>>> +/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
>>>> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
>>>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
>>>
>>> Do you have to watch out for an underflow of __ffs(PAGE_OFFSET) -
>>> PP_DMA_INDEX_SHIFT (at which point we'll presumably use 32 here
>>> instead of the expected 0)? Or is that guaranteed to be positive for
>>> some reason I'm not immediately grasping.
>>
>> That's what the 'PAGE_OFFSET > PP_SIGNATURE' in the ternary operator is
>> for. I'm assuming that PAGE_OFFSET is always a "round" number (e.g.,
>> 0xc0000000), in which case that condition should be sufficient, no?
>
> IDK if such assumption is obviously true. I think it would be safer to
> somehow express it with a build time constraint.
Alright, I'll respin and constrain it further.
-Toke
Powered by blists - more mailing lists