linux-kernel - Re: [GIT PULL] Btrfs updates for 6.10

Message-ID: <32730052-b40e-4262-a1c4-0d45a9b6de53@gmx.com>
Date: Thu, 16 May 2024 18:31:57 +0930
From: Qu Wenruo <quwenruo.btrfs@....com>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
 David Sterba <dsterba@...e.com>
Cc: linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] Btrfs updates for 6.10



On 2024/5/16 10:01, Linus Torvalds wrote:
> On Mon, 13 May 2024 at 09:28, David Sterba <dsterba@...e.com> wrote:
>>
>>    git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag
>
> So I initially blamed a GPU driver for the following problem, but Dave
> Airlie seems to think it's unlikely that problem would cause this kind
> of corruption, so now it looks like it might just be btrfs itself:
>
>    BUG: Bad page state in process kworker/u261:13  pfn:31fb9a
>    page: refcount:0 mapcount:0 mapping:00000000ff0b239e index:0x37ce8 pfn:0x31fb9a
>    aops:btree_aops ino:1
>    flags: 0x2fffc600000020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
>    page_type: 0xffffffff()
>    raw: 02fffc600000020c dead000000000100 dead000000000122 ffff9b191efb0338
>    raw: 0000000000037ce8 0000000000000000 00000000ffffffff 0000000000000000
>    page dumped because: non-NULL mapping
>    CPU: 18 PID: 141351 Comm: kworker/u261:13 Tainted: G        W 6.9.0-07381-g3860ca371740 #60
>    Workqueue: btrfs-delayed-meta btrfs_work_helper
>    Call Trace:
>     bad_page+0xe0/0xf0
>     free_unref_page_prepare+0x363/0x380
>     ? __count_memcg_events+0x63/0xd0
>     free_unref_page+0x33/0x1f0
>     ? __mem_cgroup_uncharge+0x80/0xb0
>     __folio_put+0x62/0x80
>     release_extent_buffer+0xad/0x110
>     btrfs_force_cow_block+0x68f/0x890
>     btrfs_cow_block+0xe5/0x240
>     btrfs_search_slot+0x30e/0x9f0
>     btrfs_lookup_inode+0x31/0xb0
>     __btrfs_update_delayed_inode+0x5c/0x350
>     ? kfree+0x80/0x250
>     __btrfs_commit_inode_delayed_items+0x7a1/0x7d0
>     btrfs_async_run_delayed_root+0xf7/0x1b0
>     btrfs_work_helper+0xc0/0x320
>     process_scheduled_works+0x196/0x360
>     worker_thread+0x2b8/0x370
>     ? pr_cont_work+0x190/0x190
>     kthread+0x111/0x120
>     ? kthread_blkcg+0x30/0x30
>     ret_from_fork+0x30/0x40
>     ? kthread_blkcg+0x30/0x30
>     ret_from_fork_asm+0x11/0x20
>
> Note the line
>
>      page dumped because: non-NULL mapping
>
> but the actual mapping pointer isn't a valid kernel pointer. I suspect
> that may be due to pointer hashing, though. I'm not convinced that's a
> great idea for this case, but hey, here we are. Sometimes those "don't
> leak kernel pointers" things cause problems for debugging.
>
> Anyway, it looks like the btrfs_cow_block -> btrfs_force_cow_block ->
> release_extent_buffer -> __folio_put path might be releasing a page
> that is still attached to a mapping. Perhaps some page counting
> imbalance?
>
> This all happened under fairly normal - for me - workstation loads. I
> was (of course) doing an allmodconfig kernel build after a pull, and I
> had a handful of terminals and the web browser open. Nothing
> particularly interesting or odd.
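The refcount imbalance suspected above can be illustrated with a toy model. This is a hypothetical Python sketch, not kernel code: `ToyPage`, `toy_put`, `toy_remove_from_mapping` and `toy_free_checks` are invented names, and the only behaviour assumed is the one the report shows, namely that freeing a page whose `mapping` is still set gets flagged as a bad page.

```python
from dataclasses import dataclass
from typing import Optional


class BadPageState(Exception):
    """Toy stand-in for the kernel's "BUG: Bad page state" report."""


@dataclass
class ToyPage:
    # Hypothetical, simplified page: only the two fields involved in the
    # "non-NULL mapping" report are modelled.
    refcount: int
    mapping: Optional[str] = None


def toy_free_checks(page: ToyPage) -> None:
    # A page handed back to the allocator must no longer belong to a mapping.
    if page.mapping is not None:
        raise BadPageState("non-NULL mapping")


def toy_put(page: ToyPage) -> None:
    # Drop one reference; dropping the last one frees the page.
    page.refcount -= 1
    if page.refcount == 0:
        toy_free_checks(page)


def toy_remove_from_mapping(page: ToyPage) -> None:
    # Detach from the page cache, then drop the reference the cache held.
    page.mapping = None
    toy_put(page)


if __name__ == "__main__":
    # Balanced: one reference held by the page cache, one by the extent buffer.
    ok = ToyPage(refcount=2, mapping="btree_aops")
    toy_remove_from_mapping(ok)  # the cache lets go first
    toy_put(ok)                  # the extent buffer's put frees a detached page

    # Imbalanced: a reference was never taken, so the extent buffer's put
    # is the last one while the page is still attached to the mapping.
    bad = ToyPage(refcount=1, mapping="btree_aops")
    try:
        toy_put(bad)
    except BadPageState as err:
        print("BUG: Bad page state:", err)  # prints "BUG: Bad page state: non-NULL mapping"
```

The point of the model is that the page cache holds its own reference, so a balanced sequence always detaches the mapping before the last reference goes away; a single missing get (or extra put) on the extent buffer side is enough to reach zero while the page is still in the mapping.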

Considering aarch64 is becoming more and more common, is the workstation
also an aarch64 platform (the Ampere one)?
If so, would you mind sharing the page size and the fs sectorsize?
That would at least help us determine whether it's the subpage routine
or the regular routine.
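The question above boils down to comparing the filesystem sector size with the CPU page size. A minimal Python sketch, assuming that `statvfs()` on a btrfs mount reports the sectorsize in `f_bsize`, and using a simplified rule: any sectorsize smaller than the page size counts as subpage (the real btrfs check may consider more, for example the nodesize for metadata). `is_subpage` and `describe` are hypothetical helper names.

```python
import os


def is_subpage(sectorsize: int, pagesize: int) -> bool:
    # Simplified rule: a filesystem sector smaller than a memory page means
    # several sectors share one page, which is what the subpage code handles.
    return sectorsize < pagesize


def describe(path: str = "/") -> str:
    pagesize = os.sysconf("SC_PAGE_SIZE")
    # Assumption: statvfs() on a btrfs mount reports the sectorsize in f_bsize.
    sectorsize = os.statvfs(path).f_bsize
    routine = "subpage" if is_subpage(sectorsize, pagesize) else "regular"
    return f"page size {pagesize}, sectorsize {sectorsize}: {routine} routine"


if __name__ == "__main__":
    # Point this at the btrfs mount in question; "/" is only a default.
    print(describe("/"))
```

Under this rule, a 64 KiB-page aarch64 kernel with a 4 KiB-sectorsize btrfs would take the subpage routine, while an x86-64 machine with 4 KiB pages and 4 KiB sectors would take the regular one.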

Thanks,
Qu

>
> Does the above make any btrfs people go "Ahh, I see how that would be
> a problem"?
>
>              Linus
>