Message-ID: <aTkjWsOyDzXq_bLv@codewreck.org>
Date: Wed, 10 Dec 2025 16:38:02 +0900
From: asmadeus@...ewreck.org
To: Christoph Hellwig <hch@...radead.org>
Cc: Eric Van Hensbergen <ericvh@...nel.org>,
Latchesar Ionkov <lucho@...kov.net>,
Christian Schoenebeck <linux_oss@...debyte.com>,
v9fs@...ts.linux.dev, linux-kernel@...r.kernel.org,
David Howells <dhowells@...hat.com>,
Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org,
Chris Arges <carges@...udflare.com>
Subject: Re: [PATCH] 9p/virtio: restrict page pinning to user_backed_iter() iovec

Christoph Hellwig wrote on Tue, Dec 09, 2025 at 10:04:30PM -0800:
> On Wed, Dec 10, 2025 at 06:04:23AM +0900, Dominique Martinet via B4 Relay wrote:
> > From: Dominique Martinet <asmadeus@...ewreck.org>
> >
> > When doing a loop mount of a filesystem over 9p, read requests can come
> > from unexpected places and blow up as reported by Chris Arges with this
> > reproducer:
> > ```
> > dd if=/dev/zero of=./xfs.img bs=1M count=300
> > yes | mkfs.xfs -b size=8192 ./xfs.img
> > rm -rf ./mount && mkdir -p ./mount
> > mount -o loop ./xfs.img ./mount
>
> We should really wire this up to xfstests so that all file systems
> see the pattern of kmalloc allocations passed into the block layer
> and then on to the direct I/O code.

Note this doesn't seem to reproduce on my test VM, so I'm not sure what
precondition is needed to hit this code path...

> > The problem is that iov_iter_get_pages_alloc2() apparently cannot be
> > called on folios (as illustrated by the backtrace below), so limit what
> > iov we can pin from !iov_iter_is_kvec() to user_backed_iter()
>
> As willy pointed out this is a kmalloc.

Ok, I got confused by the VM_BUG_ON_FOLIO(), but looking back it's a
folio_get() called directly from __iov_iter_get_pages_alloc(), so that
was likely a bvec...

My points that "there's a case for it in __iov_iter_get_pages_alloc()"
and that "we have no idea what to do" would still stand, but you
answered that below:

> And 9p (just like NFS) really needs to switch away from
> iov_iter_get_pages_alloc2 to iov_iter_extract_pages, which handles not
> just this perfectly fine but also fixes various other issues.

Ok, so we can remove the special branch for kvec and just extract pages
with this.

I understand it pins userspace pages, so there's no risk of them moving
under us during the I/O, and there's nothing else we need to do about it?
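
Thinking about it, I suppose the one extra step is dropping the pins once
the I/O completes; untested sketch, with iter/pages/npages standing in
for whatever state the transport keeps around:

	/*
	 * iov_iter_extract_will_pin() reports whether extraction took
	 * pin references (true for user-backed iterators); if so, each
	 * extracted page must be released with unpin_user_page().
	 */
	if (iov_iter_extract_will_pin(iter)) {
		unsigned int i;

		for (i = 0; i < npages; i++)
			unpin_user_page(pages[i]);
	}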

Looking at the implementation of iov_iter_extract_bvec_pages(), it looks
like it might not process all the way to the end, so we need to loop on
calling iov_iter_extract_pages()? (I see the networking code looping on
"while (iter->count > 0)")
I'll send a v2 with that when I can

While I have your attention: there's some work to move the non-zerocopy
case away from large (>1MB) kmalloc() buffers to kvmalloc() buffers that
might not be physically contiguous (see commit e21d451a82f3 ("9p: Use
kvmalloc for message buffers on supported transports"), which basically
only did that for trans_fd). There's no iov_iter involved, so it's off
topic, but how would one go about "extracting pages" out of that?
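
My naive guess (totally untested) would be to walk the buffer one page
at a time, using is_vmalloc_addr()/vmalloc_to_page() for the vmalloc
case and virt_to_page() otherwise; kvbuf_to_sg() below is just an
illustration, not existing code:

	/* look up the page behind a kvmalloc() address */
	static struct page *kvbuf_page(const void *buf)
	{
		if (is_vmalloc_addr(buf))
			return vmalloc_to_page(buf);
		return virt_to_page(buf);
	}

	/* ...then build the sg list one page segment at a time */
	static void kvbuf_to_sg(struct scatterlist *sg, void *buf, size_t len)
	{
		while (len) {
			size_t off = offset_in_page(buf);
			size_t seg = min_t(size_t, len, PAGE_SIZE - off);

			sg_set_page(sg++, kvbuf_page(buf), seg, off);
			buf += seg;
			len -= seg;
		}
	}
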
> Note that the networking code still wants special treatment for kmalloc
> pages, so you might have more work there.

I *think* we're fine on this end, as we're just passing the buffers into
an sg list for virtio; as long as things don't move under the caller, I
assume it doesn't care...

Thanks,
--
Dominique Martinet | Asmadeus