linux-kernel - Re: [GIT PULL] Btrfs updates for 6.10

Message-ID: <CAHk-=wgt362nGfScVOOii8cgKn2LVVHeOvOA7OBwg1OwbuJQcw@mail.gmail.com>
Date: Wed, 15 May 2024 17:31:30 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Sterba <dsterba@...e.com>
Cc: linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] Btrfs updates for 6.10

On Mon, 13 May 2024 at 09:28, David Sterba <dsterba@...e.com> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag

So I initially blamed a GPU driver for the following problem, but Dave
Airlie seems to think it's unlikely that problem would cause this kind
of corruption, so now it looks like it might just be btrfs itself:

  BUG: Bad page state in process kworker/u261:13  pfn:31fb9a
  page: refcount:0 mapcount:0 mapping:00000000ff0b239e index:0x37ce8 pfn:0x31fb9a
  aops:btree_aops ino:1
  flags: 0x2fffc600000020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
  page_type: 0xffffffff()
  raw: 02fffc600000020c dead000000000100 dead000000000122 ffff9b191efb0338
  raw: 0000000000037ce8 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: non-NULL mapping
  CPU: 18 PID: 141351 Comm: kworker/u261:13 Tainted: G        W          6.9.0-07381-g3860ca371740 #60
  Workqueue: btrfs-delayed-meta btrfs_work_helper
  Call Trace:
   bad_page+0xe0/0xf0
   free_unref_page_prepare+0x363/0x380
   ? __count_memcg_events+0x63/0xd0
   free_unref_page+0x33/0x1f0
   ? __mem_cgroup_uncharge+0x80/0xb0
   __folio_put+0x62/0x80
   release_extent_buffer+0xad/0x110
   btrfs_force_cow_block+0x68f/0x890
   btrfs_cow_block+0xe5/0x240
   btrfs_search_slot+0x30e/0x9f0
   btrfs_lookup_inode+0x31/0xb0
   __btrfs_update_delayed_inode+0x5c/0x350
   ? kfree+0x80/0x250
   __btrfs_commit_inode_delayed_items+0x7a1/0x7d0
   btrfs_async_run_delayed_root+0xf7/0x1b0
   btrfs_work_helper+0xc0/0x320
   process_scheduled_works+0x196/0x360
   worker_thread+0x2b8/0x370
   ? pr_cont_work+0x190/0x190
   kthread+0x111/0x120
   ? kthread_blkcg+0x30/0x30
   ret_from_fork+0x30/0x40
   ? kthread_blkcg+0x30/0x30
   ret_from_fork_asm+0x11/0x20

Note the line

    page dumped because: non-NULL mapping

but the actual mapping pointer isn't a valid kernel pointer. I suspect
that may be due to pointer hashing, though. I'm not convinced that's a
great idea for this case, but hey, here we are. Sometimes those "don't
leak kernel pointers" things cause problems for debugging.

Anyway, it looks like the btrfs_cow_block -> btrfs_force_cow_block ->
release_extent_buffer -> __folio_put path might be releasing a page
that is still attached to a mapping. Perhaps some page counting
imbalance?

This all happened under fairly normal - for me - workstation loads. I
was (of course) doing an allmodconfig kernel build after a pull, and I
had a handful of terminals and the web browser open. Nothing
particularly interesting or odd.

Does the above make any btrfs people go "Ahh, I see how that would be
a problem"?

            Linus