Message-ID: <aTkwKbnXvUZs4UU9@infradead.org>
Date: Wed, 10 Dec 2025 00:32:41 -0800
From: Christoph Hellwig <hch@...radead.org>
To: asmadeus@...ewreck.org
Cc: Christoph Hellwig <hch@...radead.org>,
Eric Van Hensbergen <ericvh@...nel.org>,
Latchesar Ionkov <lucho@...kov.net>,
Christian Schoenebeck <linux_oss@...debyte.com>,
v9fs@...ts.linux.dev, linux-kernel@...r.kernel.org,
David Howells <dhowells@...hat.com>,
Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org,
Chris Arges <carges@...udflare.com>
Subject: Re: [PATCH] 9p/virtio: restrict page pinning to user_backed_iter() iovec
On Wed, Dec 10, 2025 at 04:38:02PM +0900, asmadeus@...ewreck.org wrote:
> Christoph Hellwig wrote on Tue, Dec 09, 2025 at 10:04:30PM -0800:
> > On Wed, Dec 10, 2025 at 06:04:23AM +0900, Dominique Martinet via B4 Relay wrote:
> > > From: Dominique Martinet <asmadeus@...ewreck.org>
> > >
> > > When doing a loop mount of a filesystem over 9p, read requests can come
> > > from unexpected places and blow up as reported by Chris Arges with this
> > > reproducer:
> > > ```
> > > dd if=/dev/zero of=./xfs.img bs=1M count=300
> > > yes | mkfs.xfs -b size=8192 ./xfs.img
> > > rm -rf ./mount && mkdir -p ./mount
> > > mount -o loop ./xfs.img ./mount
> >
> > We should really wire this up to xfstests so that all file systems
> > see the pattern of kmalloc allocations passed into the block layer
> > and then on to the direct I/O code.
>
> Note this doesn't seem to reproduce on my test VM so I'm not sure what
> kind of precondition there is to going through this code...
In general the best way to get XFS to issue kmalloc I/O is to use a
file system sector size smaller than the page size, i.e. something
like:
mkfs.xfs -s 512 -b 4096 -f
The above would do that job when run on a large page size system
like typical arm64 configs.
> > And 9p (just like NFS) really needs to switch away from
> > iov_iter_get_pages_alloc2 to iov_iter_extract_pages, which handles not
> > just this perfectly fine but also fixes various other issues.
>
> Ok, so we can remove the special branch for kvec and just extract pages
> with this.
Yes.
> I understand it pins user space pages, so there's no risk of them moving
> under us during the IO, and there's nothing else we need to do about it?
Yes, unlike iov_iter_get_pages_alloc2 which gets a reference and doesn't
pin.
> Looking at the implementation for iov_iter_extract_bvec_pages() it looks
> like it might not process all the way to the end, so we need to loop on
> calling iov_iter_extract_pages()? (I see networking code looping on
> "while (iter->count > 0)")
Yes.
> I'll send a v2 with that when I can
You looked into this already, but in case you haven't seen it yet,
don't forget to call unpin_user_folio on the completion side as well.
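For reference, a rough sketch of what that extraction loop plus the
completion-side unpin could look like with the current iov_iter API
(hypothetical and untested, error unwinding and the sg bookkeeping
elided):
```
/* Sketch only: loop because a single iov_iter_extract_pages() call
 * may not consume the whole iterator. */
while (iov_iter_count(iter)) {
	struct page **pages = NULL;
	size_t off;
	ssize_t len;

	len = iov_iter_extract_pages(iter, &pages, iov_iter_count(iter),
				     INT_MAX, 0, &off);
	if (len <= 0)
		break;	/* real code must unwind pins taken so far */
	/* add the extracted pages at offset 'off' to the virtio sg list */
	kvfree(pages);
}

/* Completion side: pins are only taken for user-backed iterators. */
if (iov_iter_extract_will_pin(iter)) {
	for (i = 0; i < npages; i++)
		unpin_user_folio(page_folio(pages[i]), 1);
}
```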
> While I have your attention, there's some work to move away from large
> (>1MB) kmalloc() in the non-zerocopy case into kvmalloc() that might not
> be contiguous (see commit e21d451a82f3 ("9p: Use kvmalloc for message
> buffers on supported transports") that basically only did that for
> trans_fd), there's no iov_iter involved so it's off topic but how would
> one get around "extracting pages" out of that?
vmalloc and I/O is in general problematic, because you need special
calls to flush the caches on architectures with VIVT caches:
flush_kernel_vmap_range / invalidate_kernel_vmap_range.
If you want to stuff that into an iov_iter, the only sane way is to
call vmalloc_to_page and build a bvec iter from that at the moment.
You also need to do the flush_kernel_vmap_range /
invalidate_kernel_vmap_range in the caller for it.
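Something along these lines; the names are real kernel APIs but this
is sketched from memory and untested, assuming a page-aligned
vmalloc/kvmalloc buffer:
```
unsigned int nr_pages = DIV_ROUND_UP(len, PAGE_SIZE);
struct bio_vec *bvec = kcalloc(nr_pages, sizeof(*bvec), GFP_KERNEL);

for (i = 0; i < nr_pages; i++)
	bvec_set_page(&bvec[i], vmalloc_to_page(buf + i * PAGE_SIZE),
		      min_t(size_t, PAGE_SIZE, len - i * PAGE_SIZE), 0);
iov_iter_bvec(&iter, ITER_SOURCE, bvec, nr_pages, len);

flush_kernel_vmap_range(buf, len);	/* before the device reads */
/* ... submit the I/O ... */
invalidate_kernel_vmap_range(buf, len);	/* after the device wrote */
```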
>
> > Note that the networking code still wants special treatment for kmalloc
> > pages, so you might have more work there.
>
> I *think* we're fine on this end, as it's just passing the buffers into
> a sg list for virtio, as long as things don't move under the caller I
> assume they don't care...
Ok, if your backend is virtio that's fine. If it's actual 9p over the
network you might still run into issues, though.