Message-ID: <CAOQ4uxjMSs7c0OQvexFA11r37=VzCHMjpPm+1EFteYWdJGw2Ug@mail.gmail.com>
Date: Fri, 16 Jan 2026 11:30:28 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Chenglong Tang <chenglongtang@...gle.com>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, Jan Kara <jack@...e.cz>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
miklos@...redi.hu, overlayfs <linux-unionfs@...r.kernel.org>
Subject: Re: [Regression 6.12] NULL pointer dereference in submit_bio_noacct
via backing_file_read_iter
On Fri, Jan 16, 2026 at 3:56 AM Chenglong Tang <chenglongtang@...gle.com> wrote:
>
> Hi Amir,
>
> Thanks for the guidance. Using the specific order of the 8 commits
> (applying the ovl_real_fdget refactors before the fix consumers)
> resolved the boot-time NULL pointer panic. The system now boots
> successfully.
>
> However, we are still hitting the original kernel panic during runtime
> tests (specifically a CloudSQL workload).
>
> Current Commit Chain (Applied to 6.12):
>
> 76d83345a056 (HEAD -> main-R125-cos-6.12) ovl: convert ovl_real_fdget() callers to ovl_real_file()
> 740bdf920b15 ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
> 100b71ecb237 fs/backing_file: fix wrong argument in callback
> b877bca6858d ovl: store upper real file in ovl_file struct
> 595aac630596 ovl: allocate a container struct ovl_file for ovl private context
> 218ec543008d ovl: do not open non-data lower file for fsync
> 6def078942e2 ovl: use wrapper ovl_revert_creds()
> fe73aad71936 backing-file: clean up the API
>
> So it means none of these 8 commits were able to fix the problem.

That's actually a good thing, because as I said from the start,
it does not look like storing the upper real file in ovl_file should have
fixed the root cause.

> Let me explain what's going on here:
>
> We are reporting a rare but persistent kernel panic (~0.02% failure
> rate) occurring during container initialization on Linux 6.12.55+
> (x86_64). Kernel 6.6.x is not affected. The panic is a NULL pointer dereference
> in submit_bio_noacct, triggered specifically when the Integrity
> Measurement Architecture (IMA) calculates a file hash during a runc
> create operation.
>
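
With a failure rate this low, it helps to estimate how many clean runs are needed before a candidate fix can be trusted. A rough sketch, assuming each container start is an independent trial with the quoted ~0.02% failure probability (the independence assumption and the function name runs_needed are illustrative, not taken from the report):

```python
import math


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n for which at least one failure would be observed with
    probability >= confidence, i.e. 1 - (1 - p)**n >= confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Solve (1 - p)**n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # ~0.02% failure rate as quoted in the report.
    print(runs_needed(0.0002))
```

Under these assumptions the answer is on the order of 15,000 runs for 95% confidence, so a few hundred clean boots after applying a patch say very little about whether the race is gone.
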
> We have isolated the crash to a specific container (ncsa) starting up
> during a high-concurrency boot sequence.
>
> Environment
> * Kernel: Linux 6.12.55+ (x86_64) / Container-Optimized OS
> * Workload: Cloud SQL instance initialization (heavy concurrent runc
> operations managed by systemd).
> * Filesystem: Ext4 backed by NVMe.
> * Security: AppArmor enabled, IMA (Integrity Measurement Architecture) active.
>
> The Failure Pattern (in every crash instance, the sequence is identical):
> * systemd initiates the startup of the ncsainit container.
> * runc executes the create command:
>   runc --root /var/lib/cloudsql/runc/root create --bundle
>   /var/lib/cloudsql/runc/bundles/ncsa ...
>
> Immediately after this command is logged, the kernel panics.
>
> Stacktrace:
> [ 186.938290] BUG: kernel NULL pointer dereference, address: 0000000000000156
> [ 186.952203] #PF: supervisor read access in kernel mode
> [ 186.995248] Oops: Oops: 0000 [#1] SMP PTI
> [ 187.035946] CPU: 1 UID: 0 PID: 6764 Comm: runc:[2:INIT] Tainted: G O 6.12.55+ #1
> [ 187.081681] RIP: 0010:submit_bio_noacct+0x21d/0x470
> [ 187.412981] Call Trace:
> [ 187.415751] <TASK>
> [ 187.418141] ext4_mpage_readpages+0x75c/0x790
> [ 187.429011] read_pages+0x9d/0x250
> [ 187.450963] page_cache_ra_unbounded+0xa2/0x1c0
> [ 187.466083] filemap_get_pages+0x231/0x7a0
> [ 187.474687] filemap_read+0xf6/0x440
> [ 187.532345] integrity_kernel_read+0x34/0x60
> [ 187.560740] ima_calc_file_hash+0x1c1/0x9b0
> [ 187.608175] ima_collect_measurement+0x1b6/0x310
> [ 187.613102] process_measurement+0x4ea/0x850
> [ 187.617788] ima_bprm_check+0x5b/0xc0
> [ 187.635403] bprm_execve+0x203/0x560
> [ 187.645058] do_execveat_common+0x2fb/0x360
> [ 187.649730] __x64_sys_execve+0x3e/0x50
>
> Panic Analysis: The stack trace indicates a race condition where
> ima_bprm_check (triggered by executing the container binary) attempts
> to verify the file. This calls ima_calc_file_hash ->
> ext4_mpage_readpages, which submits a bio to the block layer.
>
> The crash occurs in submit_bio_noacct when it attempts to dereference
> a member of the bio structure (likely bio->bi_bdev or the request
> queue), suggesting the underlying device or queue structure is either
> uninitialized or has been torn down while the IMA check was still in
> flight.
>
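
The faulting address 0x156 is well below the first page boundary, which is why the kernel labels this a NULL pointer dereference: it is the usual signature of reading a member at a small offset through a NULL base pointer. A minimal sketch of pulling that address and the symbol+offset frames out of the oops text (the helper names fault_address and frames are hypothetical, and only a few lines of the trace above are reproduced):

```python
import re

# A few lines copied from the trace above.
OOPS = """\
[ 186.938290] BUG: kernel NULL pointer dereference, address: 0000000000000156
[ 187.081681] RIP: 0010:submit_bio_noacct+0x21d/0x470
[ 187.418141] ext4_mpage_readpages+0x75c/0x790
[ 187.532345] integrity_kernel_read+0x34/0x60
[ 187.617788] ima_bprm_check+0x5b/0xc0
"""

PAGE_SIZE = 4096  # x86_64 base page size

ADDR_RE = re.compile(r"address: ([0-9a-f]+)")
FRAME_RE = re.compile(r"\b([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def fault_address(log):
    """Return the faulting address as an int, or None if the line is absent."""
    m = ADDR_RE.search(log)
    return int(m.group(1), 16) if m else None


def frames(log):
    """Return (symbol, offset, function_size) for each symbol+off/size entry."""
    return [(name, int(off, 16), int(size, 16))
            for name, off, size in FRAME_RE.findall(log)]


if __name__ == "__main__":
    addr = fault_address(OOPS)
    # An address inside the first page suggests NULL plus a member offset.
    print(hex(addr), "within first page:", addr < PAGE_SIZE)
    for name, off, size in frames(OOPS):
        print(f"{name} +{off:#x} of {size:#x}")
```

This only confirms that the access was at a small offset from a NULL base. It does not tell which pointer was NULL; that needs the disassembly around submit_bio_noacct+0x21d for the exact kernel build.
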
> Context on Concurrency: This workload involves systemd starting
> multiple sidecar containers (logging, monitoring, coroner, etc.)
> simultaneously. We suspect this high-concurrency startup creates the
> IO/CPU contention required to hit this race window. However, the crash
> consistently happens only on the ncsa container, implying something
> specific about its launch configuration or timing makes it the
> reliable victim.
>

Your follow-up email said that the same race can also happen without IMA.
I wonder if it could happen without a backing file, but that is hard
to find out.

My first thought is that it could be related to some black magic
with the backing vm_file, but I have nothing smarter to suggest at
this point.

Thanks,
Amir.