linux-kernel - Re: 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <e8c134ce-3253-414d-855a-bb1714402df2@gmx.com>
Date: Thu, 30 May 2024 14:56:19 +0930
From: Qu Wenruo <quwenruo.btrfs@....com>
To: David Hildenbrand <david@...hat.com>,
 Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>, Chris Mason <clm@...com>,
 Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>
Cc: Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
 Linux Memory Management List <linux-mm@...ck.org>,
 Matthew Wilcox <willy@...radead.org>,
 linux-btrfs <linux-btrfs@...r.kernel.org>
Subject: Re: 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840



在 2024/5/30 08:07, Qu Wenruo 写道:
>
>
> 在 2024/5/29 16:27, David Hildenbrand 写道:
>>
>> A little bird just told me that I missed an important piece in the dmesg
>> output: "aops:btree_aops ino:1" from dump_mapping():
>>
>> This is btrfs, i_ino is 1, and we don't have a dentry. Is that
>> BTRFS_BTREE_INODE_OBJECTID?
>>
>> Summarizing what we know so far:
>> (1) Freeing an order-0 btrfs folio where folio->mapping
>>      is still set
>> (2) Triggered by kswapd and kcompactd; not triggered by other means of
>>      page freeing so far
>
>  From the implementation of filemap_migrate_folio() (and previous
> migrate_page_moving_mapping()), it looks like the migration only involves:
>
> - Migrate the mapping
> - Copy the page private value
> - Copy the contents (if needed)
> - Copy all the page flags
>
> The most recent touch on migration is from v6.0, which I do not believe
> is the cause at all.
>
>>
>> Possible theories:
>> (A) folio->mapping not cleared when freeing the folio. But shouldn't
>>      this also happen on other freeing paths? Or are we simply lucky to
>>      never trigger that for that folio?
>
> Yeah, in fact we never manually clean folio->mapping inside btrfs, thus
> I'm not sure if it's the case.
>
>> (B) Messed-up refcounting: freeing a folio that is still in use (and
>>      therefore has folio-> mapping still set)
>>
>> I was briefly wondering if large folio splitting could be involved.
>
> Although we have all the metadata support for larger folios, we do not
> yet enable it.

After some extra code digging and tons of trace_printk(), it indeed
looks like btrfs is underflowing the folio ref count.

During the lifespan of an extent buffer (btrfs' metadata), it should at
least has 3 refs after attached to the address space:

1) folio_alloc() inside btrfs_alloc_folio_array()
2) folio_ref_add() inside __filemap_add_folio()
3) folio_add_lru() inside filemap_add_folio()

Even if btrfs wants to release the folio of an eb, we only do:

- Detach the folio::private
- put_folio()

So even if an eb got released, as long as it is not yet detached from
filemap, its refcount should still be >= 2.

Thus the warning is indeed correct, by somehow btrfs called extra
put_folio() on the eb page which is already attached to the btree inode.

I'll continue digging around the eb folio refs inside btrfs, meanwhile I
will also test some extra checks for eb folios on their refcount.

Thanks,
Qu