Message-ID: <874iqrod4x.fsf@toke.dk>
Date: Tue, 18 Nov 2025 11:20:46 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, Byungchul Park
<byungchul@...com>, "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: linux-mm@...ck.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel_team@...ynix.com,
harry.yoo@...cle.com, ast@...nel.org, daniel@...earbox.net,
davem@...emloft.net, kuba@...nel.org, john.fastabend@...il.com,
sdf@...ichev.me, saeedm@...dia.com, leon@...nel.org, tariqt@...dia.com,
mbloch@...dia.com, andrew+netdev@...n.ch, edumazet@...gle.com,
pabeni@...hat.com, akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org,
surenb@...gle.com, mhocko@...e.com, horms@...nel.org, jackmanb@...gle.com,
hannes@...xchg.org, ziy@...dia.com, ilias.apalodimas@...aro.org,
willy@...radead.org, brauner@...nel.org, kas@...nel.org,
yuzhao@...gle.com, usamaarif642@...il.com, baolin.wang@...ux.alibaba.com,
almasrymina@...gle.com, asml.silence@...il.com, bpf@...r.kernel.org,
linux-rdma@...r.kernel.org, sfr@...b.auug.org.au, dw@...idwei.uk,
ap420073@...il.com, dtatulea@...dia.com
Subject: Re: [RFC mm v6] mm: introduce a new page type for page pool in page type
Jesper Dangaard Brouer <hawk@...nel.org> writes:
> On 18/11/2025 02.18, Byungchul Park wrote:
>> On Tue, Nov 18, 2025 at 10:07:35AM +0900, Byungchul Park wrote:
>>> On Mon, Nov 17, 2025 at 05:47:05PM +0100, David Hildenbrand (Red Hat) wrote:
>>>> On 17.11.25 17:02, Jesper Dangaard Brouer wrote:
>>>>>
>>>>> On 17/11/2025 06.20, Byungchul Park wrote:
>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>> index 600d9e981c23..01dd14123065 100644
>>>>>> --- a/mm/page_alloc.c
>>>>>> +++ b/mm/page_alloc.c
>>>>>> @@ -1041,7 +1041,6 @@ static inline bool page_expected_state(struct page *page,
>>>>>> #ifdef CONFIG_MEMCG
>>>>>> page->memcg_data |
>>>>>> #endif
>>>>>> - page_pool_page_is_pp(page) |
>>>>>> (page->flags.f & check_flags)))
>>>>>> return false;
>>>>>>
>>>>>> @@ -1068,8 +1067,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>> if (unlikely(page->memcg_data))
>>>>>> bad_reason = "page still charged to cgroup";
>>>>>> #endif
>>>>>> - if (unlikely(page_pool_page_is_pp(page)))
>>>>>> - bad_reason = "page_pool leak";
>>>>>> return bad_reason;
>>>>>> }
>>>>>
>>>>> This code has helped us catch leaks in the past.
>>>>> When this happens the result is that the page is marked as a bad page.
>>>>>
>>>>>>
>>>>>> @@ -1378,9 +1375,12 @@ __always_inline bool free_pages_prepare(struct page *page,
>>>>>> mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>>>>>> folio->mapping = NULL;
>>>>>> }
>>>>>> - if (unlikely(page_has_type(page)))
>>>>>> + if (unlikely(page_has_type(page))) {
>>>>>> + /* networking expects to clear its page type before releasing */
>>>>>> + WARN_ON_ONCE(PageNetpp(page));
>>>>>> /* Reset the page_type (which overlays _mapcount) */
>>>>>> page->page_type = UINT_MAX;
>>>>>> + }
>>>>>>
>>>>>> if (is_check_pages_enabled()) {
>>>>>> if (free_page_is_bad(page))
>>>>>
>>>>> What happens to the page? ... when it gets marked with:
>>>>> page->page_type = UINT_MAX
>>>>>
>>>>> Will it get freed and allowed to be used by others?
>>>>> - if so it can result in other hard-to-catch bugs
>>>>
>>>> Yes, just like most other use-after-free from any other subsystem in the
>>>> kernel :)
>>>>
>>>> The expectation is that such BUGs are found early during testing
>>>> (triggering a WARN) such that they can be fixed early.
>>>>
>>>> But we could also report a bad page here and just stop (return false).
>
> I agree that we want to catch these bugs early by triggering a WARN.
> The bad_page() call also triggers dump_stack() and has a burst limiter,
> which I like. We are running with CONFIG_DEBUG_VM=y in production (as
> the measured overhead was minimal) to monitor these kinds of leaks.
>
> For the case with page_pool, we *could* recover more gracefully by
> returning the page to the page_pool (page->pp) instance. But I'm
> reluctant to take this path, as it puts less pressure on fixing the
> leak: once we have "recovered", it becomes a warning and not a bug.
> Opinions are welcome: should we recover or do bad_page()?
I think we should do bad_page() to get the bugs fixed :)
-Toke