Message-ID: <d5acfea4-10e6-4843-b67b-b23bdeb38dad@gmx.de>
Date: Fri, 12 Sep 2025 16:04:13 +0200
From: Helge Deller <deller@....de>
To: David Hildenbrand <david@...hat.com>, Helge Deller <deller@...nel.org>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Linux Kernel Development <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
linux-parisc <linux-parisc@...r.kernel.org>
Cc: Christoph Biedl <linux-kernel.bfrz@...chmal.in-ulm.de>,
Byungchul Park <byungchul@...com>
Subject: Re: boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels

On 9/12/25 09:57, David Hildenbrand wrote:
> On 12.09.25 00:12, Helge Deller wrote:
>> As reported earlier in this mail thread, all 32-bit Linux kernels since v6.16
>> fail to boot on the parisc architecture like this:
>>
>> BUG: Bad page state in process swapper pfn:000f7
>> page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
>> flags: 0x0(zone=0)
>> raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
>> raw: 00000000
>> page dumped because: page_pool leak
>> Modules linked in:
>> CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
>> Hardware name: 9000/778/B160L
>> Backtrace:
>> [<106ece88>] bad_page+0x14c/0x17c
>> [<10406c50>] free_page_is_bad.part.0+0xd4/0xec
>> [<106ed180>] free_page_is_bad+0x80/0x88
>> [<106ef05c>] __free_pages_ok+0x374/0x508
>> [<1011d34c>] __free_pages_core+0x1f0/0x218
>> [<1011a2f0>] memblock_free_pages+0x68/0x94
>> [<10120324>] memblock_free_all+0x26c/0x310
>> [<1011a4d8>] mm_core_init+0x18c/0x208
>> [<10100e88>] start_kernel+0x4ec/0x7a0
>> [<101054d0>] start_parisc+0xb4/0xc4
>>
>> git bisecting leads to this patch which triggers the crash:
>>
>> commit ee62ce7a1d909ccba0399680a03c2dee83bcae95
>> Author: Toke Høiland-Jørgensen <toke@...hat.com>
>> Date: Wed Apr 9 12:41:37 2025 +0200
>> page_pool: Track DMA-mapped pages and unmap them when destroying the pool
>>
>> It turns out that the patch itself isn't wrong.
>>
>> But it is what triggers the kernel bug, since it changes PP_MAGIC_MASK
>> for 32-bit kernels as follows:
>>
>> -#define PP_MAGIC_MASK ~0x3UL
>> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>
>> Function page_pool_page_is_pp() needs to unambiguously identify page pool
>> pages (using PP_MAGIC_MASK), but since the patch now reduced the valid bits to
>> check in PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c, the remaining bits are
>> not sufficient to unambiguously identify such pages any longer.
>>
>> Because of that, page_pool_page_is_pp() sometimes wrongly reports pages as
>> page pool pages and as such triggers the kernel BUG as it believes it found a
>> page pool leak.
>>
>> IMHO this is a generic 32-bit kernel issue, not just affecting parisc.
>>
>> Do you see any options other than:
>> a) revert the patch (ee62ce7a1d90), or
>> b) return false in page_pool_page_is_pp() when !defined(CONFIG_64BIT),
>> which effectively disables the page pool page test on 32-bit
>> machines?
>
> I think we have a change coming soon that uses a page type, which would fix this as well.
>
> https://lkml.kernel.org/r/20250728052742.81294-1-byungchul@sk.com
>
> Until then, the easiest fix would indeed be to go with b).
Ok, I'll send a patch for b).
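
To illustrate the false positive, here is a minimal userspace sketch (not kernel code) that applies both masks to the second raw word of the page dump quoted above, 0x118022c0. It assumes that this word is where pp_magic lives in struct page, that PP_SIGNATURE is 0x40 on 32-bit kernels (POISON_POINTER_DELTA being 0 there), and that PP_DMA_INDEX_MASK is 0x3fffff80, a value inferred from the 0xc000007c mask quoted above. The names with a _32 suffix and the two helper functions are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PP_SIGNATURE is 0x40 + POISON_POINTER_DELTA, delta 0 on 32-bit. */
#define PP_SIGNATURE_32       UINT32_C(0x40)

/* Assumption: inferred from the 0xc000007c value quoted in this thread. */
#define PP_DMA_INDEX_MASK_32  UINT32_C(0x3fffff80)

/* Mask before and after commit ee62ce7a1d90, modelled with 32-bit longs. */
#define PP_MAGIC_MASK_OLD_32  ((uint32_t)~UINT32_C(0x3))
#define PP_MAGIC_MASK_NEW_32  ((uint32_t)~(PP_DMA_INDEX_MASK_32 | UINT32_C(0x3)))

/* Compile-time check that the inferred index mask reproduces the quoted value. */
static_assert(PP_MAGIC_MASK_NEW_32 == UINT32_C(0xc000007c),
              "new mask should match the value quoted in the thread");

/* Hypothetical model of page_pool_page_is_pp() with the old mask. */
static inline bool is_pp_old(uint32_t pp_magic)
{
    return (pp_magic & PP_MAGIC_MASK_OLD_32) == PP_SIGNATURE_32;
}

/* Hypothetical model of page_pool_page_is_pp() with the new mask. */
static inline bool is_pp_new(uint32_t pp_magic)
{
    return (pp_magic & PP_MAGIC_MASK_NEW_32) == PP_SIGNATURE_32;
}
```

Under these assumptions, is_pp_new(0x118022c0) is true while is_pp_old(0x118022c0) is false: the new mask compares only bits 2-6 and 30-31, and for this value those bits happen to equal the signature.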
Thanks!
Helge