linux-kernel - Re: [Regression 6.12] NULL pointer dereference in submit_bio_noacct via backing_file_read

Message-ID: <CAOQ4uxggQekxqavkt+RiJd9s9cdDgXZuVfQrL_qNciBNf=4Lww@mail.gmail.com>
Date: Wed, 14 Jan 2026 12:10:51 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Chenglong Tang <chenglongtang@...gle.com>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, Jan Kara <jack@...e.cz>, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	miklos@...redi.hu
Subject: Re: [Regression 6.12] NULL pointer dereference in submit_bio_noacct
 via backing_file_read_iter

On Wed, Jan 14, 2026 at 1:53 AM Chenglong Tang <chenglongtang@...gle.com> wrote:
>
> Hi OverlayFS Maintainers,
>
> This is from Container Optimized OS in Google Cloud.
>
> We are reporting a reproducible kernel panic on Kernel 6.12 involving
> a NULL pointer dereference in submit_bio_noacct.
>
> The Issue: The panic occurs intermittently (approx. 5 failures in 1000
> runs) during a specific PostgreSQL client test
> (postgres_client_test_postgres15_ctrdncsa) on Google
> Container-Optimized OS. The stack trace shows the crash happens when
> IMA (ima_calc_file_hash) attempts to read a file from OverlayFS via
> the new-in-6.12 backing_file_read_iter helper.
>
> It appears to be a race condition where the underlying block device is
> detached (becoming NULL) while the backing_file wrapper is still
> attempting to submit a read bio during container teardown.
>
> Stack Trace:
> [   75.793015] BUG: kernel NULL pointer dereference,
> address: 0000000000000156
> [   75.822539] #PF: supervisor read access in kernel mode
> [   75.849332] #PF: error_code(0x0000) - not-present page
> [   75.862775] PGD 7d012067 P4D 7d012067 PUD 7d013067 PMD 0
> [   75.884283] Oops: Oops: 0000 [#1] SMP NOPTI
> [   75.902274] CPU: 1 UID: 0 PID: 6476 Comm: helmd Tainted: G
>  O       6.12.55+ #1
> [   75.928903] Tainted: [O]=OOT_MODULE
> [   75.942484] Hardware name: Google Google Compute Engine/Google
> Compute Engine, BIOS Google 01/01/2011
> [   75.965868] RIP: 0010:submit_bio_noacct+0x21d/0x470
> [   75.978340] Code: 8b 73 48 4d 85 f6 74 55 4c 63 25 b6 ad 89 01 49
> 83 fc 06 0f 83 44 02 00 00 4f 8b a4 e6 d0 00 00 00 83 3d 09 c9 7d 01
> 00 7e 3f <43> 80 bc 3c 56 01 00 00 00 0f 84 28 01 00 00 48 89 df e8 4c
> a0 02
> [   76.035847] RSP: 0018:ffffa41183463880 EFLAGS: 00010202
> [   76.050141] RAX: ffff9d4ec1a81a78 RBX: ffff9d4f3811e6c0 RCX: 00000000009410a0
> [   76.065176] RDX: 0000000010300001 RSI: ffff9d4ec1a81a78 RDI: ffff9d4f3811e6c0
> [   76.089292] RBP: ffffa411834638b0 R08: 0000000000001000 R09: ffff9d4f3811e6c0
> [   76.110878] R10: 2000000000000000 R11: ffffffff8a33e700 R12: 0000000000000000
> [   76.139068] R13: ffff9d4ec1422bc0 R14: ffff9d4ec2507000 R15: 0000000000000000
> [   76.168391] FS:  0000000008df7f40(0000) GS:ffff9d4f3dd00000(0000)
> knlGS:0000000000000000
> [   76.179024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   76.184951] CR2: 0000000000000156 CR3: 000000007d01c006 CR4: 0000000000370ef0
> [   76.192352] Call Trace:
> [   76.194981]  <TASK>
> [   76.197257]  ext4_mpage_readpages+0x75c/0x790
> [   76.201794]  read_pages+0xa0/0x250
> [   76.205373]  page_cache_ra_unbounded+0xa2/0x1c0
> [   76.232608]  filemap_get_pages+0x16b/0x7a0
> [   76.254151]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   76.260523]  filemap_read+0xf6/0x440
> [   76.264540]  do_iter_readv_writev+0x17e/0x1c0
> [   76.275427]  vfs_iter_read+0x8a/0x140
> [   76.279272]  backing_file_read_iter+0x155/0x250
> [   76.284425]  ovl_read_iter+0xd7/0x120
> [   76.288270]  ? __pfx_ovl_file_accessed+0x10/0x10
> [   76.293069]  vfs_read+0x2b1/0x300
> [   76.296835]  ksys_read+0x75/0xe0
> [   76.300246]  do_syscall_64+0x61/0x130
> [   76.304173]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Our Findings:
>
> Not an Ext4 regression: We verified that reverting "ext4: reduce stack
> usage in ext4_mpage_readpages()" does not resolve the panic.
>
> Suspected Fix: We suspect upstream commit 18e48d0e2c7b ("ovl: store
> upper real file in ovl_file struct") is the correct fix. It seems to
> address this exact lifetime race by persistently pinning the
> underlying file.

That sounds odd.
Using a persistent upper real file may be more efficient than opening
a temporary file for every read, but the temporary file is a legitimately
opened file, so it looks like you would be avoiding the race rather than
fixing it.

Could you try to analyse the conditions that caused the race?

>
> The Problem: We cannot apply 18e48d0e2c7b to 6.12 stable because it
> depends on the extensive ovl_real_file refactoring series (removing
> ovl_real_fdget family functions) that landed in 6.13.
>
> Is there a recommended way to backport the "persistent real file"
> logic to 6.12 without pulling in the entire refactor chain?
>

These are the commits touching fs/overlayfs/file.c in v6.12..v6.13:

$ git log --oneline  v6.12..v6.13 -- fs/overlayfs/file.c
d66907b51ba07 ovl: convert ovl_real_fdget() callers to ovl_real_file()
4333e42ed4444 ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
18e48d0e2c7b1 ovl: store upper real file in ovl_file struct
87a8a76c34a2a ovl: allocate a container struct ovl_file for ovl private context
c2c54b5f34f63 ovl: do not open non-data lower file for fsync
fc5a1d2287bf2 ovl: use wrapper ovl_revert_creds()
48b50624aec45 backing-file: clean up the API

Your claim that 18e48d0e2c7b depends on the ovl_real_fdget() refactoring is
incorrect. You may safely cherry-pick the 4 commits in the list above that
lead up to 18e48d0e2c7b1 (48b50624aec45 through 87a8a76c34a2a).
They are all self contained changes that would be good to have in 6.12.y,
because they would make cherry-picking future fixes easier.

In particular, for "backing-file: clean up the API", it is better to have
the same API in upstream and stable kernels.
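
As a rough sketch of the cherry-pick order: git log prints newest first, so
the four prerequisite commits need to be applied in the reverse of the order
listed above. The snippet below only builds and prints the command (a dry
run). It assumes the log order matches the order in which the commits were
applied upstream, and the variable names are purely illustrative.

```shell
#!/bin/sh
# The four commits below 18e48d0e2c7b1 in the git log output, newest first.
newest_first="87a8a76c34a2a c2c54b5f34f63 fc5a1d2287bf2 48b50624aec45"

# Reverse the list so the oldest commit is applied first.
oldest_first=""
for c in $newest_first; do
    oldest_first="$c${oldest_first:+ $oldest_first}"
done

# -x records the upstream commit id in each backported commit message.
cmd="git cherry-pick -x $oldest_first"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

To try the suspected fix on top, 18e48d0e2c7b1 could be appended as a fifth
commit. Run the printed command from a 6.12.y checkout that has the upstream
history fetched.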

Thanks,
Amir.