Message-ID: <aTkwKbnXvUZs4UU9@infradead.org>
Date: Wed, 10 Dec 2025 00:32:41 -0800
From: Christoph Hellwig <hch@...radead.org>
To: asmadeus@...ewreck.org
Cc: Christoph Hellwig <hch@...radead.org>,
	Eric Van Hensbergen <ericvh@...nel.org>,
	Latchesar Ionkov <lucho@...kov.net>,
	Christian Schoenebeck <linux_oss@...debyte.com>,
	v9fs@...ts.linux.dev, linux-kernel@...r.kernel.org,
	David Howells <dhowells@...hat.com>,
	Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org,
	Chris Arges <carges@...udflare.com>
Subject: Re: [PATCH] 9p/virtio: restrict page pinning to user_backed_iter()
 iovec

On Wed, Dec 10, 2025 at 04:38:02PM +0900, asmadeus@...ewreck.org wrote:
> Christoph Hellwig wrote on Tue, Dec 09, 2025 at 10:04:30PM -0800:
> > On Wed, Dec 10, 2025 at 06:04:23AM +0900, Dominique Martinet via B4 Relay wrote:
> > > From: Dominique Martinet <asmadeus@...ewreck.org>
> > > 
> > > When doing a loop mount of a filesystem over 9p, read requests can come
> > > from unexpected places and blow up as reported by Chris Arges with this
> > > reproducer:
> > > ```
> > > dd if=/dev/zero of=./xfs.img bs=1M count=300
> > > yes | mkfs.xfs -b size=8192 ./xfs.img
> > > rm -rf ./mount && mkdir -p ./mount
> > > mount -o loop ./xfs.img ./mount
> > 
> > We should really wire this up to xfstests so that all file systems
> > see the pattern of kmalloc allocations passed into the block layer
> > and then on to the direct I/O code.
> 
> Note this doesn't seem to reproduce on my test VM so I'm not sure what
> kind of precondition there is to going through this code...

In general the best way to get XFS to issue kmalloc I/O is to use a
file system sector size smaller than the page size, i.e. something
like:

   mkfs.xfs -s 512 -b 4096 -f

The above does the job when run on a system with a large page size,
like typical arm64 configs.

> > And 9p (just like NFS) really needs to switch away from
> > iov_iter_get_pages_alloc2 to iov_iter_extract_pages, which handles not
> > just this perfectly fine but also fixes various other issues.
> 
> Ok, so we can remove the special branch for kvec and just extract pages
> with this.

Yes.

> I understand it pins user-space pages, so there's no risk of it moving
> under us during the IO, and there's nothing else we need to do about it?

Yes, unlike iov_iter_get_pages_alloc2, which takes a reference but does
not pin.

> Looking at the implementation for iov_iter_extract_bvec_pages() it looks
> like it might not process all the way to the end, so we need to loop on
> calling iov_iter_extract_pages()? (I see networking code looping on
> "while (iter->count > 0)")

Yes.
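
For illustration, a rough and untested sketch of such a loop (the
helper name and surrounding structure are made up here; only
iov_iter_extract_pages() is the real API):

```c
/* Untested sketch: keep calling iov_iter_extract_pages() until the
 * iterator is drained or the page array is full.  Each call advances
 * the iter and may return fewer bytes than requested, hence the loop.
 */
static ssize_t p9_extract_pages(struct iov_iter *iter, struct page **pages,
				unsigned int maxpages, size_t *offset)
{
	unsigned int n = 0;
	size_t total = 0;

	while (iov_iter_count(iter) && n < maxpages) {
		struct page **p = pages + n;
		size_t off;
		ssize_t len;

		len = iov_iter_extract_pages(iter, &p, iov_iter_count(iter),
					     maxpages - n, 0, &off);
		if (len <= 0)
			return total ? total : len;
		if (!n)
			*offset = off;
		n += DIV_ROUND_UP(off + len, PAGE_SIZE);
		total += len;
	}
	return total;
}
```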

> I'll send a v2 with that when I can

You looked into this already, but in case you haven't seen it yet,
don't forget to call unpin_user_folio on the completion side as well.
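
The completion side, again only as an untested sketch
(iov_iter_extract_will_pin() can be used at submission time to
remember whether the pages were actually pinned; kvec-backed iters
are not):

```c
/* Untested sketch: extracted user pages are pinned (FOLL_PIN), not
 * refcounted, so release them with unpin_user_folio() rather than
 * put_page().  Kernel pages from a kvec need no release at all.
 */
static void p9_release_pages(struct page **pages, unsigned int npages,
			     bool pinned)
{
	unsigned int i;

	if (!pinned)
		return;
	for (i = 0; i < npages; i++)
		unpin_user_folio(page_folio(pages[i]), 1);
}
```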

> While I have your attention, there's some work to move away from large
> (>1MB) kmalloc() in the non-zerocopy case into kvmalloc() that might not
> be contiguous (see commit e21d451a82f3 ("9p: Use kvmalloc for message
> buffers on supported transports") that basically only did that for
> trans_fd), there's no iov_iter involved so it's off topic but how would
> one get around "extracting pages" out of that?

vmalloc and I/O is in general problematic, because you need special
calls to flush the caches on VIVT cache architectures:
flush_kernel_vmap_range / invalidate_kernel_vmap_range.

If you want to stuff that into an iov_iter, the only sane way at the
moment is to call vmalloc_to_page and build a bvec iter from the
resulting pages.  You also need to do the flush_kernel_vmap_range /
invalidate_kernel_vmap_range calls in the caller.
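
Something along these lines (untested sketch, helper name made up;
the loop relies on vmalloc'ed buffers being page aligned):

```c
/* Untested sketch: translate each page of a vmalloc'ed buffer with
 * vmalloc_to_page() and describe it in a bvec array, then build a
 * bvec-backed iov_iter over it.  The flush for VIVT caches has to
 * happen before submission; invalidate_kernel_vmap_range() would be
 * called on the read completion path.
 */
static void vmalloc_to_bvec_iter(void *buf, size_t len, unsigned int dir,
				 struct bio_vec *bv, struct iov_iter *iter)
{
	unsigned int npages = DIV_ROUND_UP(len, PAGE_SIZE);
	unsigned int i;

	for (i = 0; i < npages; i++) {
		size_t chunk = min_t(size_t, len - i * PAGE_SIZE, PAGE_SIZE);

		bvec_set_page(&bv[i], vmalloc_to_page(buf + i * PAGE_SIZE),
			      chunk, 0);
	}
	flush_kernel_vmap_range(buf, len);
	iov_iter_bvec(iter, dir, bv, npages, len);
}
```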


> 
> > Note that the networking code still wants special treatment for kmalloc
> > pages, so you might have more work there.
> 
> I *think* we're fine on this end, as it's just passing the buffers into
> a sg list for virtio, as long as things don't move under the caller I
> assume they don't care...

Ok, if your backend is virtio that's fine.  If it's actual 9p over the
network you might still run into issues, though.

