Message-ID: <CAOdxtTZv_B_pE1d1vgaE8+ar58y7pTiw0bL-djB1rhE-5wu2zQ@mail.gmail.com>
Date: Thu, 15 Jan 2026 21:56:06 -0800
From: Chenglong Tang <chenglongtang@...gle.com>
To: Amir Goldstein <amir73il@...il.com>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, Jan Kara <jack@...e.cz>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
miklos@...redi.hu
Subject: Re: [Regression 6.12] NULL pointer dereference in submit_bio_noacct
via backing_file_read_iter
[Follow Up] We have an important update regarding the
submit_bio_noacct panic we reported earlier.

To rule out the Integrity Measurement Architecture (IMA) as the root
cause, we disabled IMA verification in the workload configuration. The
kernel panic persisted with the exact same signature (RIP:
0010:submit_bio_noacct+0x21d), but the trigger path changed.

New stack traces (non-IMA): we are now observing the crash via two
standard filesystem paths.

Most failures still share a similar trace (page-fault readahead path):
[  158.519909] BUG: kernel NULL pointer dereference, address: 0000000000000156
[  158.542610] #PF: supervisor read access in kernel mode
[  158.565011] #PF: error_code(0x0000) - not-present page
[  158.583855] PGD 800000007c7da067 P4D 800000007c7da067 PUD 7c7db067 PMD 0
[  158.590940] Oops: Oops: 0000 [#1] SMP PTI
[  158.598950] CPU: 1 UID: 0 PID: 6717 Comm: agent_launcher Tainted: G O 6.12.55+ #1
[  158.629624] Tainted: [O]=OOT_MODULE
[  158.639965] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  158.684210] RIP: 0010:submit_bio_noacct+0x21d/0x470
[  158.705662] Code: 8b 73 48 4d 85 f6 74 55 4c 63 25 46 af 89 01 49 83 fc 06 0f 83 44 02 00 00 4f 8b a4 e6 d0 00 00 00 83 3d 99 ca 7d 01 00 7e 3f <43> 80 bc 3c 56 01 00 00 00 0f 84 28 01 00 00 48 89 df e8 fc 9f 02
[  158.765443] RSP: 0000:ffffa74c84d53a98 EFLAGS: 00010202
[  158.771022] RAX: ffffa319b3d6b4f0 RBX: ffffa319bdc9a3c0 RCX: 00000000005e1070
[  158.778730] RDX: 0000000010300001 RSI: ffffa319b3d6b4f0 RDI: ffffa319bdc9a3c0
[  158.802189] RBP: ffffa74c84d53ac8 R08: 0000000000001000 R09: ffffa319bdc9a3c0
[  158.846780] R10: 0000000000000000 R11: 0000000069a1b000 R12: 0000000000000000
[  158.877737] R13: ffffa31941421f40 R14: ffffa31955419200 R15: 0000000000000000
[  158.908715] FS:  00000000059efe28(0000) GS:ffffa319bdd00000(0000) knlGS:0000000000000000
[  158.937522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.958522] CR2: 0000000000000156 CR3: 000000006a20a003 CR4: 00000000003726f0
[  158.968648] Call Trace:
[  158.974419]  <TASK>
[  158.978222]  ext4_mpage_readpages+0x75c/0x790
[  158.983568]  read_pages+0x9d/0x250
[  158.987263]  page_cache_ra_unbounded+0xa2/0x1c0
[  158.992179]  filemap_fault+0x218/0x660
[  158.996311]  __do_fault+0x4b/0x140
[  159.000143]  do_pte_missing+0x14f/0x1050
[  159.018505]  handle_mm_fault+0x886/0xb40
[  159.063653]  do_user_addr_fault+0x1eb/0x730
[  159.094465]  exc_page_fault+0x80/0x100
[  159.116472]  asm_exc_page_fault+0x26/0x30
We also see a second, different path (from the jbd2 commit thread):
[  163.902122] BUG: kernel NULL pointer dereference, address: 0000000000000157
[  163.955031] #PF: supervisor read access in kernel mode
[  163.986899] #PF: error_code(0x0000) - not-present page
[  164.075132] PGD 0 P4D 0
[  164.085940] Oops: Oops: 0000 [#1] SMP PTI
[  164.090592] CPU: 0 UID: 0 PID: 399 Comm: jbd2/nvme0n1p1- Tainted: G O 6.12.55+ #1
[  164.146188] Tainted: [O]=OOT_MODULE
[  164.172362] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  164.243113] RIP: 0010:submit_bio_noacct+0x21d/0x470
[  164.276230] Code: 8b 73 48 4d 85 f6 74 55 4c 63 25 46 af 89 01 49 83 fc 06 0f 83 44 02 00 00 4f 8b a4 e6 d0 00 00 00 83 3d 99 ca 7d 01 00 7e 3f <43> 80 bc 3c 56 01 00 00 00 0f 84 28 01 00 00 48 89 df e8 fc 9f 02
[  164.413258] RSP: 0000:ffffa674004ebc80 EFLAGS: 00010202
[  164.420124] RAX: ffff9381c25d4790 RBX: ffff9381d0e5e540 RCX: 00000000000301c8
[  164.464474] RDX: 0000000010300001 RSI: ffff9381c25d4790 RDI: ffff9381d0e5e540
[  164.542751] RBP: ffffa674004ebcb0 R08: 0000000000000000 R09: 0000000000000000
[  164.578174] R10: 0000000000000000 R11: ffffffff8433e7a0 R12: 0000000000000000
[  164.595801] R13: ffff9381c1425780 R14: ffff9381c196d400 R15: 0000000000000001
[  164.626548] FS:  0000000000000000(0000) GS:ffff93823dc00000(0000) knlGS:0000000000000000
[  164.665104] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  164.757565] CR2: 0000000000000157 CR3: 000000007c678003 CR4: 00000000003726f0
[  164.831021] Call Trace:
[  164.851014]  <TASK>
[  164.872000]  jbd2_journal_commit_transaction+0x612/0x17e0
[  164.914012]  ? sched_clock+0xd/0x20
[  164.963930]  ? _raw_spin_unlock_irqrestore+0x12/0x30
[  164.989978]  ? __try_to_del_timer_sync+0x122/0x160
[  165.029451]  kjournald2+0xb1/0x220
[  165.033558]  ? __pfx_autoremove_wake_function+0x10/0x10
[  165.044022]  kthread+0x122/0x140
[  165.048012]  ? __pfx_kjournald2+0x10/0x10
[  165.052944]  ? __pfx_kthread+0x10/0x10
[  165.057597]  ret_from_fork+0x3f/0x50
[  165.062127]  ? __pfx_kthread+0x10/0x10
[  165.079674]  ret_from_fork_asm+0x1a/0x30
[  165.113023]  </TASK>
[  165.131001] Modules linked in: nft_chain_nat xt_MASQUERADE nf_nat xt_addrtype nft_compat nf_tables kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd loadpin_trigger(O) fuse
[  165.269971] CR2: 0000000000000157
[  165.306980] ---[ end trace 0000000000000000 ]---
[  165.361889] RIP: 0010:submit_bio_noacct+0x21d/0x470
[  165.406957] Code: 8b 73 48 4d 85 f6 74 55 4c 63 25 46 af 89 01 49 83 fc 06 0f 83 44 02 00 00 4f 8b a4 e6 d0 00 00 00 83 3d 99 ca 7d 01 00 7e 3f <43> 80 bc 3c 56 01 00 00 00 0f 84 28 01 00 00 48 89 df e8 fc 9f 02
[  165.558880] RSP: 0000:ffffa674004ebc80 EFLAGS: 00010202
[  165.575239] RAX: ffff9381c25d4790 RBX: ffff9381d0e5e540 RCX: 00000000000301c8
[  165.590012] RDX: 0000000010300001 RSI: ffff9381c25d4790 RDI: ffff9381d0e5e540
[  165.597793] RBP: ffffa674004ebcb0 R08: 0000000000000000 R09: 0000000000000000
[  165.608408] R10: 0000000000000000 R11: ffffffff8433e7a0 R12: 0000000000000000
[  165.616602] R13: ffff9381c1425780 R14: ffff9381c196d400 R15: 0000000000000001
[  165.631823] FS:  0000000000000000(0000) GS:ffff93823dc00000(0000) knlGS:0000000000000000
[  165.653088] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  165.668488] CR2: 0000000000000157 CR3: 000000007c678003 CR4: 00000000003726f0
[  165.686744] Kernel panic - not syncing: Fatal exception
This confirms the issue is not specific to IMA: it appears to be a
fundamental race condition in the block I/O layer or the ext4
subsystem under high concurrency.

Since the crash occurs at the exact same instruction offset in
submit_bio_noacct regardless of the caller (IMA, page-fault readahead,
or JBD2 commit), we suspect a bio or request_queue structure is being
corrupted, or contains a NULL pointer by the time it reaches the
underlying block device driver (NVMe) or device mapper.
Best,
Chenglong
On Thu, Jan 15, 2026 at 6:56 PM Chenglong Tang <chenglongtang@...gle.com> wrote:
>
> Hi Amir,
>
> Thanks for the guidance. Using the specific order of the 8 commits
> (applying the ovl_real_fdget refactors before the fix consumers)
> resolved the boot-time NULL pointer panic. The system now boots
> successfully.
>
> However, we are still hitting the original kernel panic during runtime
> tests (specifically a CloudSQL workload).
>
> Current Commit Chain (Applied to 6.12):
>
> 76d83345a056 (HEAD -> main-R125-cos-6.12) ovl: convert
> ovl_real_fdget() callers to ovl_real_file()
> 740bdf920b15 ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
> 100b71ecb237 fs/backing_file: fix wrong argument in callback
> b877bca6858d ovl: store upper real file in ovl_file struct
> 595aac630596 ovl: allocate a container struct ovl_file for ovl private context
> 218ec543008d ovl: do not open non-data lower file for fsync
> 6def078942e2 ovl: use wrapper ovl_revert_creds()
> fe73aad71936 backing-file: clean up the API
>
> So none of these 8 commits fixes the problem. Let me explain what is
> going on here:
>
> We are reporting a rare but persistent kernel panic (~0.02% failure
> rate) occurring during container initialization on Linux 6.12.55+
> (x86_64); kernel 6.6.x is unaffected. The panic is a NULL pointer
> dereference in submit_bio_noacct, triggered specifically when the
> Integrity Measurement Architecture (IMA) calculates a file hash
> during a runc create operation.
>
> We have isolated the crash to a specific container (ncsa) starting up
> during a high-concurrency boot sequence.
>
> Environment
> * Kernel: Linux 6.12.55+ (x86_64) / Container-Optimized OS
> * Workload: Cloud SQL instance initialization (heavy concurrent runc
> operations managed by systemd).
> * Filesystem: Ext4 backed by NVMe.
> * Security: AppArmor enabled, IMA (Integrity Measurement Architecture) active.
>
> The failure pattern (in every crash instance, the sequence is identical):
> * systemd initiates the startup of the ncsainit container.
> * runc executes the create command:
>
>     runc --root /var/lib/cloudsql/runc/root create --bundle \
>         /var/lib/cloudsql/runc/bundles/ncsa ...
>
> Immediately after this command is logged, the kernel panics.
>
> Stacktrace:
> [ 186.938290] BUG: kernel NULL pointer dereference, address: 0000000000000156
> [ 186.952203] #PF: supervisor read access in kernel mode
> [ 186.995248] Oops: Oops: 0000 [#1] SMP PTI
> [ 187.035946] CPU: 1 UID: 0 PID: 6764 Comm: runc:[2:INIT] Tainted: G
> O 6.12.55+ #1
> [ 187.081681] RIP: 0010:submit_bio_noacct+0x21d/0x470
> [ 187.412981] Call Trace:
> [ 187.415751] <TASK>
> [ 187.418141] ext4_mpage_readpages+0x75c/0x790
> [ 187.429011] read_pages+0x9d/0x250
> [ 187.450963] page_cache_ra_unbounded+0xa2/0x1c0
> [ 187.466083] filemap_get_pages+0x231/0x7a0
> [ 187.474687] filemap_read+0xf6/0x440
> [ 187.532345] integrity_kernel_read+0x34/0x60
> [ 187.560740] ima_calc_file_hash+0x1c1/0x9b0
> [ 187.608175] ima_collect_measurement+0x1b6/0x310
> [ 187.613102] process_measurement+0x4ea/0x850
> [ 187.617788] ima_bprm_check+0x5b/0xc0
> [ 187.635403] bprm_execve+0x203/0x560
> [ 187.645058] do_execveat_common+0x2fb/0x360
> [ 187.649730] __x64_sys_execve+0x3e/0x50
>
> Panic Analysis: The stack trace indicates a race condition where
> ima_bprm_check (triggered by executing the container binary) attempts
> to verify the file. This calls ima_calc_file_hash ->
> ext4_mpage_readpages, which submits a bio to the block layer.
>
> The crash occurs in submit_bio_noacct when it attempts to dereference
> a member of the bio structure (likely bio->bi_bdev or the request
> queue), suggesting the underlying device or queue structure is either
> uninitialized or has been torn down while the IMA check was still in
> flight.
>
> Context on Concurrency: This workload involves systemd starting
> multiple sidecar containers (logging, monitoring, coroner, etc.)
> simultaneously. We suspect this high-concurrency startup creates the
> IO/CPU contention required to hit this race window. However, the crash
> consistently happens only on the ncsa container, implying something
> specific about its launch configuration or timing makes it the
> reliable victim.
>
> Best,
>
> Chenglong
>
> On Wed, Jan 14, 2026 at 3:11 AM Amir Goldstein <amir73il@...il.com> wrote:
> >
> > On Wed, Jan 14, 2026 at 1:53 AM Chenglong Tang <chenglongtang@...gle.com> wrote:
> > >
> > > Hi OverlayFS Maintainers,
> > >
> > > This is from Container Optimized OS in Google Cloud.
> > >
> > > We are reporting a reproducible kernel panic on Kernel 6.12 involving
> > > a NULL pointer dereference in submit_bio_noacct.
> > >
> > > The Issue: The panic occurs intermittently (approx. 5 failures in 1000
> > > runs) during a specific PostgreSQL client test
> > > (postgres_client_test_postgres15_ctrdncsa) on Google
> > > Container-Optimized OS. The stack trace shows the crash happens when
> > > IMA (ima_calc_file_hash) attempts to read a file from OverlayFS via
> > > the new-in-6.12 backing_file_read_iter helper.
> > >
> > > It appears to be a race condition where the underlying block device is
> > > detached (becoming NULL) while the backing_file wrapper is still
> > > attempting to submit a read bio during container teardown.
> > >
> > > Stack Trace:
> > > [ OK ] Started 75.793015] BUG: kernel NULL pointer dereference,
> > > address: 0000000000000156
> > > [ 75.822539] #PF: supervisor read access in kernel mode
> > > [ 75.849332] #PF: error_code(0x0000) - not-present page
> > > [ 75.862775] PGD 7d012067 P4D 7d012067 PUD 7d013067 PMD 0
> > > [ 75.884283] Oops: Oops: 0000 [#1] SMP NOPTI
> > > [ 75.902274] CPU: 1 UID: 0 PID: 6476 Comm: helmd Tainted: G
> > > O 6.12.55+ #1
> > > [ 75.928903] Tainted: [O]=OOT_MODULE
> > > [ 75.942484] Hardware name: Google Google Compute Engine/Google
> > > Compute Engine, BIOS Google 01/01/2011
> > > [ 75.965868] RIP: 0010:submit_bio_noacct+0x21d/0x470
> > > [ 75.978340] Code: 8b 73 48 4d 85 f6 74 55 4c 63 25 b6 ad 89 01 49
> > > 83 fc 06 0f 83 44 02 00 00 4f 8b a4 e6 d0 00 00 00 83 3d 09 c9 7d 01
> > > 00 7e 3f <43> 80 bc 3c 56 01 00 00 00 0f 84 28 01 00 00 48 89 df e8 4c
> > > a0 02
> > > [ 76.035847] RSP: 0018:ffffa41183463880 EFLAGS: 00010202
> > > [ 76.050141] RAX: ffff9d4ec1a81a78 RBX: ffff9d4f3811e6c0 RCX: 00000000009410a0
> > > [ 76.065176] RDX: 0000000010300001 RSI: ffff9d4ec1a81a78 RDI: ffff9d4f3811e6c0
> > > [ 76.089292] RBP: ffffa411834638b0 R08: 0000000000001000 R09: ffff9d4f3811e6c0
> > > [ 76.110878] R10: 2000000000000000 R11: ffffffff8a33e700 R12: 0000000000000000
> > > [ 76.139068] R13: ffff9d4ec1422bc0 R14: ffff9d4ec2507000 R15: 0000000000000000
> > > [ 76.168391] FS: 0000000008df7f40(0000) GS:ffff9d4f3dd00000(0000)
> > > knlGS:0000000000000000
> > > [ 76.179024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 76.184951] CR2: 0000000000000156 CR3: 000000007d01c006 CR4: 0000000000370ef0
> > > [ 76.192352] Call Trace:
> > > [ 76.194981] <TASK>
> > > [ 76.197257] ext4_mpage_readpages+0x75c/0x790
> > > [ 76.201794] read_pages+0xa0/0x250
> > > [ 76.205373] page_cache_ra_unbounded+0xa2/0x1c0
> > > [ 76.232608] filemap_get_pages+0x16b/0x7a0
> > > [ 76.254151] ? srso_alias_return_thunk+0x5/0xfbef5
> > > [ 76.260523] filemap_read+0xf6/0x440
> > > [ 76.264540] do_iter_readv_writev+0x17e/0x1c0
> > > [ 76.275427] vfs_iter_read+0x8a/0x140
> > > [ 76.279272] backing_file_read_iter+0x155/0x250
> > > [ 76.284425] ovl_read_iter+0xd7/0x120
> > > [ 76.288270] ? __pfx_ovl_file_accessed+0x10/0x10
> > > [ 76.293069] vfs_read+0x2b1/0x300
> > > [ 76.296835] ksys_read+0x75/0xe0
> > > [ 76.300246] do_syscall_64+0x61/0x130
> > > [ 76.304173] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >
> > > Our Findings:
> > >
> > > Not an Ext4 regression: We verified that reverting "ext4: reduce stack
> > > usage in ext4_mpage_readpages()" does not resolve the panic.
> > >
> > > Suspected Fix: We suspect upstream commit 18e48d0e2c7b ("ovl: store
> > > upper real file in ovl_file struct") is the correct fix. It seems to
> > > address this exact lifetime race by persistently pinning the
> > > underlying file.
> >
> > That sounds odd.
> > Using a persistent upper real file may be more efficient than opening
> > a temporary file for every read, but the temporary file is a legit opened file,
> > so it looks like you would be averting the race rather than fixing it.
> >
> > Could you try to analyse the conditions that caused the race?
> >
> > >
> > > The Problem: We cannot apply 18e48d0e2c7b to 6.12 stable because it
> > > depends on the extensive ovl_real_file refactoring series (removing
> > > ovl_real_fdget family functions) that landed in 6.13.
> > >
> > > Is there a recommended way to backport the "persistent real file"
> > > logic to 6.12 without pulling in the entire refactor chain?
> > >
> >
> > These are the commits in overlayfs/file.c v6.12..v6.13:
> >
> > $ git log --oneline v6.12..v6.13 -- fs/overlayfs/file.c
> > d66907b51ba07 ovl: convert ovl_real_fdget() callers to ovl_real_file()
> > 4333e42ed4444 ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
> > 18e48d0e2c7b1 ovl: store upper real file in ovl_file struct
> > 87a8a76c34a2a ovl: allocate a container struct ovl_file for ovl private context
> > c2c54b5f34f63 ovl: do not open non-data lower file for fsync
> > fc5a1d2287bf2 ovl: use wrapper ovl_revert_creds()
> > 48b50624aec45 backing-file: clean up the API
> >
> > Your claim that 18e48d0e2c7b depends on ovl_real_fdget() is incorrect.
> > You may safely cherry-pick the 4 commits above leading to 18e48d0e2c7b1.
> > They are all self-contained changes that would be good to have in 6.12.y,
> > because they would make cherry-picking future fixes easier.
> >
> > Specifically, backing-file: clean up the API, it is better to have the same
> > API in upstream and stable kernels.
> >
> > Thanks,
> > Amir.