Message-ID: <03eb13ad-03a2-4982-9545-0a5506e043d0@suse.cz>
Date: Fri, 7 Feb 2025 11:25:46 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Matthew Wilcox <willy@...radead.org>, Miklos Szeredi <miklos@...redi.hu>
Cc: Christian Heusel <christian@...sel.eu>, Josef Bacik
<josef@...icpanda.com>, Miklos Szeredi <mszeredi@...hat.com>,
regressions@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Joanne Koong <joannelkoong@...il.com>,
linux-mm <linux-mm@...ck.org>
Subject: Re: [REGRESSION][BISECTED] Crash with Bad page state for FUSE/Flatpak
related applications since v6.13
On 2/7/25 10:45, Matthew Wilcox wrote:
> On Fri, Feb 07, 2025 at 10:34:52AM +0100, Miklos Szeredi wrote:
>> Seems like page allocation gets an inconsistent page (mapcount != -1)
>> in the report below.
>
> I think you're misreading the report. _mapcount is -1. Which means
> mapcount is 0.
>
>> > Feb 06 08:54:47 archvm kernel: BUG: Bad page state in process rnote pfn:67587
>> > Feb 06 08:54:47 archvm kernel: page: refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x67587
A refcount of -1 doesn't look healthy either; shouldn't it be 0 at this point?
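For reference, a minimal sketch of how the mapcount:refcount word in the raw dump (ffffffffffffffff) decodes. It assumes a little-endian 64-bit layout with _mapcount in the low 32 bits and _refcount in the high 32 bits. The helper names are made up for illustration and are not kernel functions. The only kernel behaviour relied on is that the reported mapcount is the stored _mapcount plus one.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel API. Assumed layout of the raw word:
 * low 32 bits = _mapcount, high 32 bits = _refcount (little-endian x86-64).
 */
static int32_t raw_mapcount(uint64_t word)
{
    /* Reinterpret the low half as a signed 32-bit counter. */
    return (int32_t)(uint32_t)(word & 0xffffffffu);
}

static int32_t raw_refcount(uint64_t word)
{
    /* Reinterpret the high half as a signed 32-bit counter. */
    return (int32_t)(uint32_t)(word >> 32);
}

static int reported_mapcount(uint64_t word)
{
    /* The stored _mapcount starts at -1, so the reported value is stored + 1. */
    return raw_mapcount(word) + 1;
}
```

With the word from the dump, reported_mapcount(0xffffffffffffffffULL) is 0 and raw_refcount(0xffffffffffffffffULL) is -1, which matches the "refcount:-1 mapcount:0" line.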
>> > Feb 06 08:54:47 archvm kernel: flags: 0xfffffc8000020(lru|node=0|zone=1|lastcpupid=0x1fffff)
>> > Feb 06 08:54:47 archvm kernel: raw: 000fffffc8000020 dead000000000100 dead000000000122 0000000000000000
>
> flags lru.next lru.prev mapping
>
>> > Feb 06 08:54:47 archvm kernel: raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
>
> index private mapcount:refcount memcg_data
>
>> > Feb 06 08:54:47 archvm kernel: page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
>
> So the problem is the lru flag is set.
>
>> > Feb 06 08:54:47 archvm kernel: dump_stack_lvl+0x5d/0x80
>> > Feb 06 08:54:47 archvm kernel: bad_page.cold+0x7a/0x91
>> > Feb 06 08:54:47 archvm kernel: __rmqueue_pcplist+0x200/0xc50
>> > Feb 06 08:54:47 archvm kernel: get_page_from_freelist+0x2ae/0x1740
>> > Feb 06 08:54:47 archvm kernel: __alloc_frozen_pages_noprof+0x184/0x330
>> > Feb 06 08:54:47 archvm kernel: alloc_pages_mpol+0x7d/0x160
>> > Feb 06 08:54:47 archvm kernel: folio_alloc_mpol_noprof+0x14/0x40
>> > Feb 06 08:54:47 archvm kernel: vma_alloc_folio_noprof+0x69/0xb0
>> > Feb 06 08:54:47 archvm kernel: do_anonymous_page+0x32a/0x8b0
>
> It's very weird, because PG_lru is also in PAGE_FLAGS_CHECK_AT_FREE.
> So it should already have been checked and not be set. I'm on holiday
Could be a use-after-free of the page, which sets PG_lru again. The list
corruptions in __rmqueue_pcplist also suggest some page manipulation after
free. The -1 refcount suggests somebody was still using the page when it was
freed because the refcount dropped to 0, and then did a put_page()?
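To illustrate the sequence I have in mind, here is a toy userspace model, not kernel code. The struct and function names are invented, and it only mimics the counting: one legitimate reference, the owner's put frees the page, and a stale user's put then takes the count to -1.

```c
#include <stdbool.h>

/* Toy stand-in for struct page; not the kernel's definition. */
struct toy_page {
    int refcount;
    bool on_free_list;
};

/*
 * Drop one reference. The page is "freed" when the count reaches zero.
 * Returns true if this call freed the page.
 */
static bool toy_put_page(struct toy_page *p)
{
    p->refcount--;
    if (p->refcount == 0) {
        p->on_free_list = true;
        return true;
    }
    return false;
}

/* The suspected sequence: one reference held, but two puts issued. */
static int toy_double_put(void)
{
    struct toy_page p = { .refcount = 1, .on_free_list = false };

    toy_put_page(&p);   /* legitimate owner: 1 -> 0, page is freed */
    toy_put_page(&p);   /* stale user: 0 -> -1 on an already-free page */
    return p.refcount;
}
```

In this model toy_double_put() returns -1, the same value the allocator later sees when it hands the page out again. It says nothing about which code path does the extra put.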
> until Monday, so I'm not going to dive into this any further.