linux-kernel - Re: [RFC mm v6] mm: introduce a new page type for page pool in page type

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <874iqrod4x.fsf@toke.dk>
Date: Tue, 18 Nov 2025 11:20:46 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, Byungchul Park
 <byungchul@...com>, "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: linux-mm@...ck.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, kernel_team@...ynix.com,
 harry.yoo@...cle.com, ast@...nel.org, daniel@...earbox.net,
 davem@...emloft.net, kuba@...nel.org, john.fastabend@...il.com,
 sdf@...ichev.me, saeedm@...dia.com, leon@...nel.org, tariqt@...dia.com,
 mbloch@...dia.com, andrew+netdev@...n.ch, edumazet@...gle.com,
 pabeni@...hat.com, akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com,
 Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org,
 surenb@...gle.com, mhocko@...e.com, horms@...nel.org, jackmanb@...gle.com,
 hannes@...xchg.org, ziy@...dia.com, ilias.apalodimas@...aro.org,
 willy@...radead.org, brauner@...nel.org, kas@...nel.org,
 yuzhao@...gle.com, usamaarif642@...il.com, baolin.wang@...ux.alibaba.com,
 almasrymina@...gle.com, asml.silence@...il.com, bpf@...r.kernel.org,
 linux-rdma@...r.kernel.org, sfr@...b.auug.org.au, dw@...idwei.uk,
 ap420073@...il.com, dtatulea@...dia.com
Subject: Re: [RFC mm v6] mm: introduce a new page type for page pool in page
 type

Jesper Dangaard Brouer <hawk@...nel.org> writes:

> On 18/11/2025 02.18, Byungchul Park wrote:
>> On Tue, Nov 18, 2025 at 10:07:35AM +0900, Byungchul Park wrote:
>>> On Mon, Nov 17, 2025 at 05:47:05PM +0100, David Hildenbrand (Red Hat) wrote:
>>>> On 17.11.25 17:02, Jesper Dangaard Brouer wrote:
>>>>>
>>>>> On 17/11/2025 06.20, Byungchul Park wrote:
>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>> index 600d9e981c23..01dd14123065 100644
>>>>>> --- a/mm/page_alloc.c
>>>>>> +++ b/mm/page_alloc.c
>>>>>> @@ -1041,7 +1041,6 @@ static inline bool page_expected_state(struct page *page,
>>>>>>     #ifdef CONFIG_MEMCG
>>>>>>                       page->memcg_data |
>>>>>>     #endif
>>>>>> -                    page_pool_page_is_pp(page) |
>>>>>>                       (page->flags.f & check_flags)))
>>>>>>               return false;
>>>>>>
>>>>>> @@ -1068,8 +1067,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>>       if (unlikely(page->memcg_data))
>>>>>>               bad_reason = "page still charged to cgroup";
>>>>>>     #endif
>>>>>> -    if (unlikely(page_pool_page_is_pp(page)))
>>>>>> -            bad_reason = "page_pool leak";
>>>>>>       return bad_reason;
>>>>>>     }
>>>>>
>>>>> This code have helped us catch leaks in the past.
>>>>> When this happens the result is that the page is marked as a bad page.
>>>>>
>>>>>>
>>>>>> @@ -1378,9 +1375,12 @@ __always_inline bool free_pages_prepare(struct page *page,
>>>>>>               mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>>>>>>               folio->mapping = NULL;
>>>>>>       }
>>>>>> -    if (unlikely(page_has_type(page)))
>>>>>> +    if (unlikely(page_has_type(page))) {
>>>>>> +            /* networking expects to clear its page type before releasing */
>>>>>> +            WARN_ON_ONCE(PageNetpp(page));
>>>>>>               /* Reset the page_type (which overlays _mapcount) */
>>>>>>               page->page_type = UINT_MAX;
>>>>>> +    }
>>>>>>
>>>>>>       if (is_check_pages_enabled()) {
>>>>>>               if (free_page_is_bad(page))
>>>>>
>>>>> What happens to the page? ... when it gets marked with:
>>>>>      page->page_type = UINT_MAX
>>>>>
>>>>> Will it get freed and allowed to be used by others?
>>>>> - if so it can result in other hard-to-catch bugs
>>>>
>>>> Yes, just like most other use-after-free from any other subsystem in the
>>>> kernel :)
>>>>
>>>> The expectation is that such BUGs are found early during testing
>>>> (triggering a WARN) such that they can be fixed early.
>>>>
>>>> But we could also report a bad page here and just stop (return false).
>
> I agree, that we want to catch these bugs early by triggering a WARN.
> The bad_page() call also triggers dump_stack() and have a burst limiter,
> which I like.  We are running with CONFIG_DEBUG_VM=y in production (as
> the measured overhead was minimal) to monitor these kind of leaks.
>
> For the case with page_pool, we *could* recover more gracefully, by
> returning the page to the page_pool (page->pp) instance.  But I'm
> reluctant to taking this path, as that puts less pressure on fixing the
> leak as we "recovered", as this becomes are warning and not a bug.
> Opinions are welcomed, should we recover or do bad_page() ?

I think we should do bad_page() to get the bugs fixed :)

-Toke