Message-ID: <1946294.tdWV9SEqCh@weasel>
Date: Wed, 04 Feb 2026 12:48:55 +0100
From: Christian Schoenebeck <linux_oss@...debyte.com>
To: Pierre Barre <pierre@...re.sh>, asmadeus <asmadeus@...ewreck.org>
Cc: ericvh@...nel.org, lucho@...kov.net, David Howells <dhowells@...hat.com>,
v9fs@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
On Wednesday, 7 January 2026 01:27:50 CET Pierre Barre wrote:
> > On 5 Jan 2026, at 12:35, Christian Schoenebeck <linux_oss@...debyte.com>
> > wrote:
> > On Saturday, 27 December 2025 19:01:37 CET Pierre Barre wrote:
> >> With writeback caching (cache=mmap), v9fs_stat2inode() and
> >> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
> >> response, even when dirty pages may exist locally. This causes processes
> >> using lseek(SEEK_END) to see incorrect file sizes, leading to data
> >> corruption when extending files while concurrent stat() calls occur.
> >>
> >> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
> >> enabled to preserve the client's authoritative i_size.
> >>
> >> Signed-off-by: Pierre Barre <pierre@...re.sh>
> >> ---
> >
> > Adding David Howells to this discussion.
> >
> > David, I read today that you suggested [1] to go for a wait approach
> > instead? So somewhat similar to what Pierre suggested in v1 [2].
>
> FWIW, I tested the initial approach I described and this seems to result in
> no real performance improvements compared to not using any writeback
> caching at all.
Right, surprisingly I don't measure a performance penalty either.
Dominique, any suggestions on how to proceed with this issue?
/Christian
> > [1] https://lore.kernel.org/all/1696785.1767599663@warthog.procyon.org.uk/
> > [2] https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/
> >
> > I understand the point about cifs and nfs, where you must expect
> > foreign changes on the server side made by another client.
> >
> > For 9pfs though, IMO it would make sense to distinguish:
> >
> > For cache=loose (and cache=fscache?) we do not expect host-side
> > changes, so we could safely go with the approach suggested by this
> > patch: don't wait for dirty data to be flushed, just use the locally
> > cached file size (for performance reasons).
> >
> > And for all other cache modes, go with the cifs/nfs approach and wait
> > for the flush? Does this make sense?
> >
> > I just tested this reported issue with the following reproducer:
> >
> > --------------------------
> >
> > #!/bin/bash
> >
> > rm -f foo.txt
> >
> > for i in {1..50}
> > do
> >     echo $i >> foo.txt &
> >     ls -lha foo.txt &
> > done
> >
> > sync
> > cat foo.txt
> > echo
> > wc -l foo.txt
> >
> > --------------------------
> >
> > For cache=loose I couldn't reproduce it, but with cache=mmap I could
> > reproduce data corruption, a folio writeback hang, and the following
> > error message:
> >
> > [ 60.847671] [append] R=134d: No submit
> >
> > ...
> >
> > [ 243.153292] INFO: task sync:986 blocked for more than 120 seconds.
> > [ 243.161516] Tainted: G E 6.19.0-rc1+ #107
> > [ 243.168847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 243.178702] task:sync state:D stack:0 pid:986 tgid:986 ppid:884 task_flags:0x400000 flags:0x00080002
> > [ 243.190379] Call Trace:
> > [ 243.192285] <TASK>
> > [ 243.193891] __schedule (kernel/sched/core.c:5256 kernel/sched/core.c:6863)
> > [ 243.195999] schedule (kernel/sched/core.c:6946 kernel/sched/core.c:6960)
> > [ 243.197730] io_schedule (kernel/sched/core.c:7764 (discriminator 1) kernel/sched/core.c:7790 (discriminator 1))
> > [ 243.199418] folio_wait_bit_common (mm/filemap.c:1312)
> > [ 243.201451] ? __pfx_wake_page_function (mm/filemap.c:1134)
> > [ 243.203399] folio_wait_writeback (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/page-flags.h:597 mm/page-writeback.c:3087)
> > [ 243.205371] __filemap_fdatawait_range (mm/filemap.c:529 (discriminator 1))
> > [ 243.207377] ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3))
> > [ 243.209305] ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1570 kernel/sched/core.c:4995 kernel/sched/core.c:5112)
> > [ 243.212030] ? __schedule (kernel/sched/core.c:5259)
> > [ 243.214083] ? wb_wait_for_completion (fs/fs-writeback.c:227)
> > [ 243.215925] filemap_fdatawait_keep_errors (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 mm/filemap.c:363 mm/filemap.c:627)
> > [ 243.217628] sync_inodes_sb (./include/linux/sched.h:2062 fs/fs-writeback.c:2777 fs/fs-writeback.c:2897)
> > [ 243.218914] ? __pfx_sync_inodes_one_sb (fs/sync.c:75)
> > [ 243.220186] __iterate_supers (fs/super.c:64 fs/super.c:925)
> > [ 243.221601] ksys_sync (fs/sync.c:103)
> > [ 243.222746] __do_sys_sync (fs/sync.c:115)
> > [ 243.223950] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
> > [ 243.226140] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
> > [ 243.228375] RIP: 0033:0x7f51e4613707
> > [ 243.230260] RSP: 002b:00007ffdb2fccad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
> > [ 243.234092] RAX: ffffffffffffffda RBX: 00007ffdb2fccc38 RCX: 00007f51e4613707
> > [ 243.238153] RDX: 00007f51e46f3501 RSI: 00007ffdb2fcef3e RDI: 00007f51e46ae557
> > [ 243.241063] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> > [ 243.244028] R10: 00007f51e45be0d0 R11: 0000000000000246 R12: 00005568f2d350fb
> > [ 243.247586] R13: 0000000000000000 R14: 0000000000000000 R15: 00005568f2d37b00
> > [ 243.251622] </TASK>
> >
> > BTW, running this script with cache=loose made me realize that the
> > Linux 9p client sends a request to the server for every "ls foo.txt",
> > which makes it unnecessarily slow.
> >
> >> fs/9p/vfs_inode.c | 7 ++++++-
> >> fs/9p/vfs_inode_dotl.c | 7 ++++++-
> >> 2 files changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> >> index 97abe65bf7c1..bfba21f5d8a9 100644
> >> --- a/fs/9p/vfs_inode.c
> >> +++ b/fs/9p/vfs_inode.c
> >> @@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct path *path,
> >> 	if (IS_ERR(st))
> >> 		return PTR_ERR(st);
> >>
> >> - v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >>
> >> p9stat_free(st);
> >> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> >> index 643e759eacb2..67a0ded2e223 100644
> >> --- a/fs/9p/vfs_inode_dotl.c
> >> +++ b/fs/9p/vfs_inode_dotl.c
> >> @@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
> >> if (IS_ERR(st))
> >> return PTR_ERR(st);
> >>
> >> - v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode_dotl(st, d_inode(dentry),
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >> /* Change block size to what the server returned */
> >> stat->blksize = st->st_blksize;