Message-ID: <20250810175712.3588005-1-arnout@bzzt.net>
Date: Sun, 10 Aug 2025 19:57:11 +0200
From: Arnout Engelen <arnout@...t.net>
To: ryan@...fa.xyz
Cc: antony.antony@...unet.com,
antony@...nome.org,
asmadeus@...ewreck.org,
brauner@...nel.org,
dhowells@...hat.com,
ericvh@...nel.org,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux_oss@...debyte.com,
lucho@...kov.net,
maximilian@...sch.me,
netfs@...ts.linux.dev,
regressions@...ts.linux.dev,
sedat.dilek@...il.com,
v9fs@...ts.linux.dev
Subject: Re: [REGRESSION] 9pfs issues on 6.12-rc1

On Fri, 13 Jun 2025 00:24:13 +0200, Ryan Lahfa wrote:
> On Wed, Oct 23, 2024 at 09:38:39PM +0200, Antony Antony wrote:
> > On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
> > > Hi Antony,
> > >
> > > I think the attached should fix it properly rather than working around it as
> > > the previous patch did. If you could give it a whirl?
> >
> > Yes this also fix the crash.
> >
> > Tested-by: Antony Antony <antony.antony@...unet.com>
>
> I cannot confirm this fixes the crash for me. My reproducer is slightly
> more complicated than Max's original one, albeit, still on NixOS and
> probably uses 9p more intensively than the automated NixOS testings
> workload.

I'm seeing a problem in the same area: the symptom is slightly different,
but the location seems very similar. I'm also running a NixOS image.

When I mount a 9p filesystem in qemu with `cache=readahead` and read a
12943-byte file, the guest does see a 12943-byte file, but only the
first 12288 bytes are populated: the remaining 655 bytes are zero. This
also reproduces (most but not all of the time) on 6.16-rc7, though not on
every host machine I've tried.
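
For anyone checking a copy of the file by hand, the symptom above can be detected with a few lines of Python. This is only a minimal sketch: `zero_tail_start` is a hypothetical helper name, not part of the reproducer, and it assumes the intact file does not legitimately end in NUL bytes (which should hold for a JSON file).

```python
def zero_tail_start(data: bytes) -> int:
    """Return the offset at which a trailing run of NUL bytes begins.

    If the data does not end in NUL bytes, the result equals len(data).
    """
    return len(data.rstrip(b"\x00"))


def looks_truncated(data: bytes) -> bool:
    """True if the data ends in at least one NUL byte (assumed to signal the bug)."""
    return zero_tail_start(data) < len(data)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python3 check.py /repro/default.json
    with open(sys.argv[1], "rb") as f:
        content = f.read()
    print(f"size={len(content)} populated_until={zero_tail_start(content)}")
```

With the symptom described here, a 12943-byte file would report `populated_until=12288`.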

After applying a simplified version of [1] (i.e. [2]), the problem no
longer reproduces. It seems something in `p9_client_read_once` somehow
leaves the iov_iter in an unhealthy state. It would be good to understand
exactly what, but I haven't been able to figure that out yet.

I have a smallish nix-based reproducer at [3], and a more involved setup
with a lot of logging enabled and a convenient way to attach gdb at [4].
You start the VM, run `cat /repro/default.json` manually, and check whether
the output looks 'truncated'.

Interestingly, the file is read in two p9 read calls: one of 12288 bytes and
one of 655 bytes. The first read is a zero-copy one; the second is not
zero-copy (because it is smaller than 1024 bytes). I've also tried a slightly
larger version of the file, which is read as two zero-copy reads, and I have
not been able to reproduce the problem with that. From my (admittedly limited)
understanding, the non-zero-copy code path looks fine, though.
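
The arithmetic behind those two reads can be sketched as follows. This is an illustration under stated assumptions, not the kernel's logic: `split_reads` is a hypothetical helper, the 12288-byte chunk size is simply the first read size observed in this report, and the zero-copy rule is modelled as "length greater than 1024 bytes".

```python
def split_reads(file_size: int, chunk: int = 12288, zc_threshold: int = 1024):
    """Split a file into consecutive reads of at most `chunk` bytes.

    Returns a list of (offset, length, zero_copy) tuples. `chunk` and
    `zc_threshold` are assumptions taken from the observations in this
    thread, not values read from the kernel at runtime.
    """
    reads = []
    offset = 0
    while offset < file_size:
        length = min(chunk, file_size - offset)
        # Assumed rule: only reads larger than the threshold go zero-copy.
        reads.append((offset, length, length > zc_threshold))
        offset += length
    return reads


if __name__ == "__main__":
    for size in (12943, 13400):
        print(size, split_reads(size))
```

Under these assumptions the 12943-byte file yields a zero-copy read of 12288 bytes followed by a non-zero-copy read of 655 bytes, while a 13400-byte file (an arbitrary example size) yields two zero-copy reads, matching the two cases described above.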

I hope this is helpful. I'd be happy to keep looking into this further,
but any help pointing me in the right direction would be much appreciated :)

Kind regards,

Arnout

[1] https://lore.kernel.org/all/3327438.1729678025@warthog.procyon.org.uk/T/#mc97a248b0f673dff6dc8613b508ca4fd45c4fefe
[2] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging/kernel-use-copied-iov_iter.patch
[3] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/small-reproducer
[4] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging