Message-ID: <CAOdxtTZ=SuV2GMPuqQJe6h-h-CDiG5yBW+07f1QYEw+kTA4-2w@mail.gmail.com>
Date: Tue, 13 Jan 2026 16:53:47 -0800
From: Chenglong Tang <chenglongtang@...gle.com>
To: viro@...iv.linux.org.uk, brauner@...nel.org, Jan Kara <jack@...e.cz>, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	amir73il@...il.com, miklos@...redi.hu
Subject: [Regression 6.12] NULL pointer dereference in submit_bio_noacct via backing_file_read_iter

Hi OverlayFS Maintainers,

This report comes from Container-Optimized OS (COS) on Google Cloud.

We are reporting a reproducible kernel panic on kernel 6.12 (observed
on 6.12.55+): a NULL pointer dereference in submit_bio_noacct.

The Issue: The panic occurs intermittently (approx. 5 failures in 1000
runs) during a specific PostgreSQL client test
(postgres_client_test_postgres15_ctrdncsa) on Google
Container-Optimized OS. The stack trace shows the crash happens when
IMA (ima_calc_file_hash) attempts to read a file from OverlayFS via
the new-in-6.12 backing_file_read_iter helper.

This appears to be a race in which the underlying block device is
detached (leaving a NULL pointer) while the backing_file wrapper is
still submitting a read bio during container teardown.

Stack Trace:
[   75.793015] BUG: kernel NULL pointer dereference, address: 0000000000000156
[   75.822539] #PF: supervisor read access in kernel mode
[   75.849332] #PF: error_code(0x0000) - not-present page
[   75.862775] PGD 7d012067 P4D 7d012067 PUD 7d013067 PMD 0
[   75.884283] Oops: Oops: 0000 [#1] SMP NOPTI
[   75.902274] CPU: 1 UID: 0 PID: 6476 Comm: helmd Tainted: G           O       6.12.55+ #1
[   75.928903] Tainted: [O]=OOT_MODULE
[   75.942484] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   75.965868] RIP: 0010:submit_bio_noacct+0x21d/0x470
[   75.978340] Code: 8b 73 48 4d 85 f6 74 55 4c 63 25 b6 ad 89 01 49 83 fc 06 0f 83 44 02 00 00 4f 8b a4 e6 d0 00 00 00 83 3d 09 c9 7d 01 00 7e 3f <43> 80 bc 3c 56 01 00 00 00 0f 84 28 01 00 00 48 89 df e8 4c a0 02
[   76.035847] RSP: 0018:ffffa41183463880 EFLAGS: 00010202
[   76.050141] RAX: ffff9d4ec1a81a78 RBX: ffff9d4f3811e6c0 RCX: 00000000009410a0
[   76.065176] RDX: 0000000010300001 RSI: ffff9d4ec1a81a78 RDI: ffff9d4f3811e6c0
[   76.089292] RBP: ffffa411834638b0 R08: 0000000000001000 R09: ffff9d4f3811e6c0
[   76.110878] R10: 2000000000000000 R11: ffffffff8a33e700 R12: 0000000000000000
[   76.139068] R13: ffff9d4ec1422bc0 R14: ffff9d4ec2507000 R15: 0000000000000000
[   76.168391] FS:  0000000008df7f40(0000) GS:ffff9d4f3dd00000(0000) knlGS:0000000000000000
[   76.179024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   76.184951] CR2: 0000000000000156 CR3: 000000007d01c006 CR4: 0000000000370ef0
[   76.192352] Call Trace:
[   76.194981]  <TASK>
[   76.197257]  ext4_mpage_readpages+0x75c/0x790
[   76.201794]  read_pages+0xa0/0x250
[   76.205373]  page_cache_ra_unbounded+0xa2/0x1c0
[   76.232608]  filemap_get_pages+0x16b/0x7a0
[   76.254151]  ? srso_alias_return_thunk+0x5/0xfbef5
[   76.260523]  filemap_read+0xf6/0x440
[   76.264540]  do_iter_readv_writev+0x17e/0x1c0
[   76.275427]  vfs_iter_read+0x8a/0x140
[   76.279272]  backing_file_read_iter+0x155/0x250
[   76.284425]  ovl_read_iter+0xd7/0x120
[   76.288270]  ? __pfx_ovl_file_accessed+0x10/0x10
[   76.293069]  vfs_read+0x2b1/0x300
[   76.296835]  ksys_read+0x75/0xe0
[   76.300246]  do_syscall_64+0x61/0x130
[   76.304173]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
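
As a cross-check of the trace, the faulting address can be recomputed
from the marked bytes in the Code: line. This is only a minimal sketch.
It assumes a hand decode of <43> 80 bc 3c 56 01 00 00 00 as
cmpb $0x0,0x156(%r12,%r15,1), which has not been confirmed with
scripts/decodecode, and the helper name effective_address is
hypothetical (it is not a kernel function).

```c
#include <stdint.h>

/*
 * Effective address of an x86-64 memory operand in SIB form:
 *     base + index * scale + disp
 * disp is a signed 32-bit displacement and is sign-extended to 64 bits.
 * Hypothetical helper for illustration only.
 */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp)
{
    return base + index * (uint64_t)scale + (uint64_t)(int64_t)disp;
}
```

With R12 (base) and R15 (index) both zero, as in the register dump,
effective_address(0, 0, 1, 0x156) evaluates to 0x156, which matches
CR2. Under the assumed decode, the NULL value is therefore the base
pointer held in R12, and 0x156 is a field offset into the object it
should have pointed to.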

Our Findings:

Not an ext4 regression: We verified that reverting "ext4: reduce stack
usage in ext4_mpage_readpages()" does not resolve the panic.

Suspected Fix: We suspect upstream commit 18e48d0e2c7b ("ovl: store
upper real file in ovl_file struct") is the correct fix. It seems to
address this exact lifetime race by persistently pinning the
underlying file.

The Problem: We cannot apply 18e48d0e2c7b to 6.12 stable because it
depends on the extensive ovl_real_file refactoring series (removing
ovl_real_fdget family functions) that landed in 6.13.

Is there a recommended way to backport the "persistent real file"
logic to 6.12 without pulling in the entire refactor chain?

Best,

Chenglong
