linux-kernel - Re: [REGRESSION] 9pfs issues on 6.12-rc1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250810175712.3588005-1-arnout@bzzt.net>
Date: Sun, 10 Aug 2025 19:57:11 +0200
From: Arnout Engelen <arnout@...t.net>
To: ryan@...fa.xyz
Cc: antony.antony@...unet.com,
	antony@...nome.org,
	asmadeus@...ewreck.org,
	brauner@...nel.org,
	dhowells@...hat.com,
	ericvh@...nel.org,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux_oss@...debyte.com,
	lucho@...kov.net,
	maximilian@...sch.me,
	netfs@...ts.linux.dev,
	regressions@...ts.linux.dev,
	sedat.dilek@...il.com,
	v9fs@...ts.linux.dev
Subject: Re: [REGRESSION] 9pfs issues on 6.12-rc1

On Fri, 13 Jun 2025 00:24:13 +0200, Ryan Lahfa wrote:
> Le Wed, Oct 23, 2024 at 09:38:39PM +0200, Antony Antony a écrit :
> > On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
> > > Hi Antony,
> > > 
> > > I think the attached should fix it properly rather than working around it as
> > > the previous patch did.  If you could give it a whirl?
> > 
> > Yes this also fix the crash.
> > 
> > Tested-by: Antony Antony <antony.antony@...unet.com>
> 
> I cannot confirm this fixes the crash for me. My reproducer is slightly
> more complicated than Max's original one, albeit, still on NixOS and
> probably uses 9p more intensively than the automated NixOS testings
> workload.

I'm seeing a problem in the same area - the symptom is slightly different,
but the location seems very similar. I'm also running a NixOS image.
Mounting a 9p filesystem in qemu with `cache=readahead`, reading a
12943-byte file, in the guest I do see a 12943-byte file, but only
the first 12288 bytes are populated: the rest are zero. This also
reproduces (most but not all of the time) on 6.16-rc7, but not on all host
machines I've tried.

After applying a simplified version of [1] (i.e. [2]), the problem does not
reproduce anymore. It seems something in `p9_client_read_once` somehow
leaves the iov_iter in an unhealthy state. It would be good to understand
exactly what, but I haven't been able to figure that out yet.

I have a smallish nix-based reproducer at [3], and a more involved setup
with a lot of logging enabled and a convenient way to attach gdb at [4].
You start the VM and then 'cat /repro/default.json' manually, and see if
it looks 'truncated'.

Interestingly, the file is read in two p9 read calls: one of 12288 bytes and
one of 655 bytes. The first read is a zero-copy one, the second is not
zero-copy (because it is smaller than 1024). I've also tried with a slightly
larger version of the file, that is read as 2 zero-copy reads, and I have not
been able to reproduce the problem with that. From my (admittedly limited)
understanding the non-zerocopy code path looks fine, though.

I hope this is helpful - I'd be happy to keep looking into this further,
but any help pointing me in the right direction would be much appreciated :)

Kind regards,

Arnout

[1] https://lore.kernel.org/all/3327438.1729678025@warthog.procyon.org.uk/T/#mc97a248b0f673dff6dc8613b508ca4fd45c4fefe
[2] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging/kernel-use-copied-iov_iter.patch
[3] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/small-reproducer
[4] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging