linux-kernel - Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <64008E3D-7EEE-4E6C-9392-986C1D8E8858@barre.sh>
Date: Wed, 7 Jan 2026 01:27:50 +0100
From: Pierre Barre <pierre@...re.sh>
To: Christian Schoenebeck <linux_oss@...debyte.com>
Cc: ericvh@...nel.org,
 lucho@...kov.net,
 asmadeus <asmadeus@...ewreck.org>,
 David Howells <dhowells@...hat.com>,
 v9fs@...ts.linux.dev,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback
 caching



> On 5 Jan 2026, at 12:35, Christian Schoenebeck <linux_oss@...debyte.com> wrote:
> 
> On Saturday, 27 December 2025 19:01:37 CET Pierre Barre wrote:
>> With writeback caching (cache=mmap), v9fs_stat2inode() and
>> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
>> response, even when dirty pages may exist locally. This causes processes
>> using lseek(SEEK_END) to see incorrect file sizes, leading to data
>> corruption when extending files while concurrent stat() calls occur.
>> 
>> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
>> enabled to preserve the client's authoritative i_size.
>> 
>> Signed-off-by: Pierre Barre <pierre@...re.sh>
>> ---
> 
> Adding David Howells to this discussion.
> 
> David, I read today that you suggested [1] to go for a wait approach instead?
> So somewhat similar to what Pierre suggested in v1 [2].

FWIW, I tested the initial approach I described and this seems to result in no real performance improvements compared to not using any writeback caching at all.

Best,
Pierre

> [1] https://lore.kernel.org/all/1696785.1767599663@warthog.procyon.org.uk/
> [2] https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/
> 
> I understand the point about cifs and nfs, where you must expect foreign
> changes on server side by another client.
> 
> For 9pfs though IMO it would make sense to distinguish:
> 
> for cache=loose (and cache=fscache?) we are not expecting host side changes,
> so we could safely go for the approach suggested by this patch here and not
> wait for flushing dirty data and just use the locally cached file size (for
> performance reasons).
> 
> And for all other cache modes going for the cifs/nfs approach and wait for
> flushing? Does this make sense?
> 
> I just tested this reported issue with the following reproducer:
> 
> --------------------------
> 
> #!/bin/bash
> 
> rm -f foo.txt
> 
> for i in {1..50}
> do
>  echo $i >> foo.txt &
>  ls -lha foo.txt &
> done
> 
> sync
> 
> cat foo.txt
> 
> echo
> 
> wc -l foo.txt
> 
> --------------------------
> 
> For cache=loose I couldn't reproduce, but with cache=mmap I could reproduce
> data corruption, folio writeback hangup and the following error message:
> 
> [   60.847671] [append] R=134d: No submit
> 
> ...
> 
> [  243.153292] INFO: task sync:986 blocked for more than 120 seconds.
> [  243.161516]       Tainted: G            E       6.19.0-rc1+ #107
> [  243.168847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.178702] task:sync            state:D stack:0     pid:986   tgid:986   ppid:884    task_flags:0x400000 flags:0x00080002
> [  243.190379] Call Trace:
> [  243.192285]  <TASK>
> 
> [  243.193891]  __schedule (kernel/sched/core.c:5256 kernel/sched/core.c:6863)
> [  243.195999]  schedule (kernel/sched/core.c:6946 kernel/sched/core.c:6960)
> [  243.197730]  io_schedule (kernel/sched/core.c:7764 (discriminator 1) kernel/sched/core.c:7790 (discriminator 1))
> [  243.199418]  folio_wait_bit_common (mm/filemap.c:1312)
> [  243.201451]  ? __pfx_wake_page_function (mm/filemap.c:1134)
> [  243.203399]  folio_wait_writeback (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/page-flags.h:597 mm/page-writeback.c:3087)
> [  243.205371]  __filemap_fdatawait_range (mm/filemap.c:529 (discriminator 1))
> [  243.207377]  ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3))
> [  243.209305]  ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1570 kernel/sched/core.c:4995 kernel/sched/core.c:5112)
> [  243.212030]  ? __schedule (kernel/sched/core.c:5259)
> [  243.214083]  ? wb_wait_for_completion (fs/fs-writeback.c:227)
> [  243.215925]  filemap_fdatawait_keep_errors (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 mm/filemap.c:363 mm/filemap.c:627)
> [  243.217628]  sync_inodes_sb (./include/linux/sched.h:2062 fs/fs-writeback.c:2777 fs/fs-writeback.c:2897)
> [  243.218914]  ? __pfx_sync_inodes_one_sb (fs/sync.c:75)
> [  243.220186]  __iterate_supers (fs/super.c:64 fs/super.c:925)
> [  243.221601]  ksys_sync (fs/sync.c:103)
> [  243.222746]  __do_sys_sync (fs/sync.c:115)
> [  243.223950]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
> [  243.226140]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
> [  243.228375] RIP: 0033:0x7f51e4613707
> [  243.230260] RSP: 002b:00007ffdb2fccad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
> [  243.234092] RAX: ffffffffffffffda RBX: 00007ffdb2fccc38 RCX: 00007f51e4613707
> [  243.238153] RDX: 00007f51e46f3501 RSI: 00007ffdb2fcef3e RDI: 00007f51e46ae557
> [  243.241063] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> [  243.244028] R10: 00007f51e45be0d0 R11: 0000000000000246 R12: 00005568f2d350fb
> [  243.247586] R13: 0000000000000000 R14: 0000000000000000 R15: 00005568f2d37b00
> [  243.251622]  </TASK>
> 
> BTW running this script with cache=loose made me realize that Linux 9p client
> is always requesting server for every "ls foo.txt", which makes it extremely
> slow unnecessarily.
> 
>> fs/9p/vfs_inode.c      | 7 ++++++-
>> fs/9p/vfs_inode_dotl.c | 7 ++++++-
>> 2 files changed, 12 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
>> index 97abe65bf7c1..bfba21f5d8a9 100644
>> --- a/fs/9p/vfs_inode.c
>> +++ b/fs/9p/vfs_inode.c
>> @@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct
>> path *path, if (IS_ERR(st))
>> return PTR_ERR(st);
>> 
>> - v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
>> + /*
>> + * With writeback caching, the client is authoritative for i_size.
>> + * Don't let the server overwrite it with a potentially stale value.
>> + */
>> + v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
>> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
>> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>> 
>> p9stat_free(st);
>> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
>> index 643e759eacb2..67a0ded2e223 100644
>> --- a/fs/9p/vfs_inode_dotl.c
>> +++ b/fs/9p/vfs_inode_dotl.c
>> @@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
>> if (IS_ERR(st))
>> return PTR_ERR(st);
>> 
>> - v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
>> + /*
>> + * With writeback caching, the client is authoritative for i_size.
>> + * Don't let the server overwrite it with a potentially stale value.
>> + */
>> + v9fs_stat2inode_dotl(st, d_inode(dentry),
>> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
>> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>> /* Change block size to what the server returned */
>> stat->blksize = st->st_blksize;
> 
> 
>