Message-ID: <4185FF99-160F-46A9-A5A4-4CA48CC086D1@nvidia.com>
Date: Mon, 07 Apr 2025 11:50:57 -0400
From: Zi Yan <ziy@...dia.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
Tariq Toukan <tariqt@...dia.com>, Andrew Lunn <andrew+netdev@...n.ch>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Simon Horman <horms@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Mina Almasry <almasrymina@...gle.com>, Yonglong Liu <liuyonglong@...wei.com>,
Yunsheng Lin <linyunsheng@...wei.com>,
Pavel Begunkov <asml.silence@...il.com>,
Matthew Wilcox <willy@...radead.org>, netdev@...r.kernel.org,
bpf@...r.kernel.org, linux-rdma@...r.kernel.org, linux-mm@...ck.org,
kernel-team <kernel-team@...udflare.com>
Subject: Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into
helper functions
On 7 Apr 2025, at 10:43, Jesper Dangaard Brouer wrote:
> On 07/04/2025 16.15, Zi Yan wrote:
>> On 7 Apr 2025, at 9:36, Zi Yan wrote:
>>
>>> On 7 Apr 2025, at 9:14, Toke Høiland-Jørgensen wrote:
>>>
>>>> Zi Yan<ziy@...dia.com> writes:
>>>>
>>>>> Resend to fix my signature.
>>>>>
>>>>> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>>>>>
>>>>>> "Zi Yan"<ziy@...dia.com> writes:
>>>>>>
>>>>>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>>>>>> Since we are about to stash some more information into the pp_magic
>>>>>>>> field, let's move the magic signature checks into a pair of helper
>>>>>>>> functions so it can be changed in one place.
>>>>>>>>
>>>>>>>> Reviewed-by: Mina Almasry<almasrymina@...gle.com>
>>>>>>>> Tested-by: Yonglong Liu<liuyonglong@...wei.com>
>>>>>>>> Acked-by: Jesper Dangaard Brouer<hawk@...nel.org>
>>>>>>>> Reviewed-by: Ilias Apalodimas<ilias.apalodimas@...aro.org>
>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen<toke@...hat.com>
>>>>>>>> ---
>>>>>>>> drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 ++--
>>>>>>>> include/net/page_pool/types.h | 18 ++++++++++++++++++
>>>>>>>> mm/page_alloc.c | 9 +++------
>>>>>>>> net/core/netmem_priv.h | 5 +++++
>>>>>>>> net/core/skbuff.c | 16 ++--------------
>>>>>>>> net/core/xdp.c | 4 ++--
>>>>>>>> 6 files changed, 32 insertions(+), 24 deletions(-)
>>>>>>>>
>>>>>>> <snip>
> [...]
>
>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>> @@ -55,6 +55,7 @@
>>>>>>>> #include <linux/delayacct.h>
>>>>>>>> #include <linux/cacheinfo.h>
>>>>>>>> #include <linux/pgalloc_tag.h>
>>>>>>>> +#include <net/page_pool/types.h>
>>>>>>>> #include <asm/div64.h>
>>>>>>>> #include "internal.h"
>>>>>>>> #include "shuffle.h"
>>>>>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>>>>> #ifdef CONFIG_MEMCG
>>>>>>>> page->memcg_data |
>>>>>>>> #endif
>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>> - ((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>>>>>> -#endif
>>>>>>>> + page_pool_page_is_pp(page) |
>>>>>>>> (page->flags & check_flags)))
>>>>>>>> return false;
>>>>>>>>
>>>>>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>>>> if (unlikely(page->memcg_data))
>>>>>>>> bad_reason = "page still charged to cgroup";
>>>>>>>> #endif
>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>> - if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>>>>>> + if (unlikely(page_pool_page_is_pp(page)))
>>>>>>>> bad_reason = "page_pool leak";
>>>>>>>> -#endif
>>>>>>>> return bad_reason;
>>>>>>>> }
>>>>>>>>
>>>>>>> I wonder if it is OK to make page allocation depend on page_pool from
>>>>>>> net/page_pool.
>>>>>> Why? It's not really a dependency, just a header include with a static
>>>>>> inline function...
>>>>> The function checks, without even modifying, a core mm data structure,
>>>>> struct page, which is used by almost all subsystems. I do not see
>>>>> why the function belongs in the net subsystem.
>>>> Well, because it's using details of the PP definitions, so keeping it
>>>> there nicely encapsulates things. I mean, that's the whole point of
>>>> defining a wrapper function - encapsulating the logic 🙂
>>>>
>>>>>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>>>>> That would require moving all the definitions introduced in patch 2,
>>>>>> which I don't think is appropriate.
The patch at the bottom moves page_pool_page_is_pp() to mm.h and compiles.
The macros and the function operate on mm’s page->pp_magic, so I am not sure
why moving them there would be inappropriate, especially since the user of the
macros, net/core/page_pool.c, already includes mm.h.
>>>>> Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
>>>>> in patch 2.
>>>> Look again. Patch 2 redefines PP_MAGIC_MASK in terms of all the other
>>>> definitions.
>>> OK. Just checked. Yes, the function depends on PP_MAGIC_MASK.
>>>
>>> But include/net/page_pool/types.h has a lot of unrelated page_pool functions and data
>>> structures that mm/page_alloc.c does not care about. Is there a way of moving
>>> page_pool_page_is_pp() and its dependencies to a separate header and including that in
>>> mm/page_alloc.c?
>>>
>>> Looking at the use of page_pool_page_is_pp() in mm/page_alloc.c, it seems to be
>>> just error checking. Why can't page_pool do the error checking itself?
>>
>> Or we could just remove the page_pool_page_is_pp() check in mm/page_alloc.c. Has it actually been used?
>
> We have actually used this at Cloudflare to catch some page_pool bugs.
> It has been backported to our 6.1 and 6.6 kernels, and we have enabled
> the required config, CONFIG_DEBUG_VM (which we measured to have low
> enough overhead to enable in production). AFAIK this is also enabled
> for our 6.12 kernels.
Got it. Thank you for the information.
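For illustration, the signature check being discussed can be modeled in
plain C. This is a sketch under assumptions: POISON_POINTER_DELTA == 0, so
PP_SIGNATURE reduces to its 0x40 base; the model_ names are hypothetical
stand-ins, not the kernel's actual symbols.

```c
#include <stdbool.h>

/*
 * Standalone model of the pp_magic check, assuming
 * POISON_POINTER_DELTA == 0 so PP_SIGNATURE is just 0x40.
 */
#define MODEL_PP_SIGNATURE 0x40UL

static bool model_page_is_pp(unsigned long pp_magic)
{
	/* Mask off bit 0 (compound head) and bit 1 (pfmemalloc)
	 * before comparing against the signature. */
	return (pp_magic & ~0x3UL) == MODEL_PP_SIGNATURE;
}
```

A non-PP page may have a list pointer (page->lru.next) aliased into the
same field, which is why an arbitrary pointer-like value must not match.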
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b7f13f087954..a5c4dafcaa0f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4248,4 +4248,63 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
#define VM_SEALED_SYSMAP VM_NONE
#endif
+/*
+ * DMA mapping IDs
+ *
+ * When DMA-mapping a page, we allocate an ID (from an xarray) and stash this in
+ * the upper bits of page->pp_magic. We always want to be able to unambiguously
+ * identify page pool pages (using page_pool_page_is_pp()). Non-PP pages can
+ * have arbitrary kernel pointers stored in the same field as pp_magic (since it
+ * overlaps with page->lru.next), so we must ensure that we cannot mistake a
+ * valid kernel pointer with any of the values we write into this field.
+ *
+ * On architectures that set POISON_POINTER_DELTA, this is already ensured,
+ * since this value becomes part of PP_SIGNATURE; meaning we can just use the
+ * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
+ * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
+ * 0, we make sure that we leave the two topmost bits empty, as that guarantees
+ * we won't mistake a valid kernel pointer for a value we set, regardless of the
+ * VMSPLIT setting.
+ *
+ * Altogether, this means that the number of bits available is constrained by
+ * the size of an unsigned long (at the upper end, subtracting two bits per the
+ * above), and the definition of PP_SIGNATURE (with or without
+ * POISON_POINTER_DELTA).
+ */
+#define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
+#if POISON_POINTER_DELTA > 0
+/* PP_SIGNATURE includes POISON_POINTER_DELTA, so limit the size of the DMA
+ * index to not overlap with that if set
+ */
+#define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
+#else
+/* Always leave out the topmost two; see above. */
+#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
+#endif
+
+#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
+ PP_DMA_INDEX_SHIFT)
+#define PP_DMA_INDEX_LIMIT XA_LIMIT(1, BIT(PP_DMA_INDEX_BITS) - 1)
+
+/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
+ * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
+ * the head page of compound page and bit 1 for pfmemalloc page, as well as the
+ * bits used for the DMA index. page_is_pfmemalloc() is checked in
+ * __page_pool_put_page() to avoid recycling the pfmemalloc page.
+ */
+#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
+
+#ifdef CONFIG_PAGE_POOL
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+ return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
+}
+#else
+
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+ return false;
+}
+#endif
+
#endif /* _LINUX_MM_H */
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 5835d359ecd0..38ca7ac567cf 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -55,52 +55,6 @@ struct pp_alloc_cache {
netmem_ref cache[PP_ALLOC_CACHE_SIZE];
};
-/*
- * DMA mapping IDs
- *
- * When DMA-mapping a page, we allocate an ID (from an xarray) and stash this in
- * the upper bits of page->pp_magic. We always want to be able to unambiguously
- * identify page pool pages (using page_pool_page_is_pp()). Non-PP pages can
- * have arbitrary kernel pointers stored in the same field as pp_magic (since it
- * overlaps with page->lru.next), so we must ensure that we cannot mistake a
- * valid kernel pointer with any of the values we write into this field.
- *
- * On architectures that set POISON_POINTER_DELTA, this is already ensured,
- * since this value becomes part of PP_SIGNATURE; meaning we can just use the
- * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
- * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
- * 0, we make sure that we leave the two topmost bits empty, as that guarantees
- * we won't mistake a valid kernel pointer for a value we set, regardless of the
- * VMSPLIT setting.
- *
- * Altogether, this means that the number of bits available is constrained by
- * the size of an unsigned long (at the upper end, subtracting two bits per the
- * above), and the definition of PP_SIGNATURE (with or without
- * POISON_POINTER_DELTA).
- */
-#define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
-#if POISON_POINTER_DELTA > 0
-/* PP_SIGNATURE includes POISON_POINTER_DELTA, so limit the size of the DMA
- * index to not overlap with that if set
- */
-#define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
-#else
-/* Always leave out the topmost two; see above. */
-#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
-#endif
-
-#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
- PP_DMA_INDEX_SHIFT)
-#define PP_DMA_INDEX_LIMIT XA_LIMIT(1, BIT(PP_DMA_INDEX_BITS) - 1)
-
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
-
/**
* struct page_pool_params - page pool parameters
* @fast: params accessed frequently on hotpath
@@ -314,10 +268,6 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
const struct xdp_mem_info *mem);
void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
-static inline bool page_pool_page_is_pp(struct page *page)
-{
- return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
#else
static inline void page_pool_destroy(struct page_pool *pool)
{
@@ -333,10 +283,6 @@ static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
{
}
-static inline bool page_pool_page_is_pp(struct page *page)
-{
- return false;
-}
#endif
void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b14f292da3db..a18340b32218 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,7 +55,6 @@
#include <linux/delayacct.h>
#include <linux/cacheinfo.h>
#include <linux/pgalloc_tag.h>
-#include <net/page_pool/types.h>
#include <asm/div64.h>
#include "internal.h"
#include "shuffle.h"
Best Regards,
Yan, Zi