Message-ID: <815008.1766145642@warthog.procyon.org.uk>
Date: Fri, 19 Dec 2025 12:00:42 +0000
From: David Howells <dhowells@...hat.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: dhowells@...hat.com, asmadeus@...ewreck.org,
    Eric Van Hensbergen <ericvh@...nel.org>,
    Latchesar Ionkov <lucho@...kov.net>,
    Christian Schoenebeck <linux_oss@...debyte.com>,
    v9fs@...ts.linux.dev, linux-kernel@...r.kernel.org,
    Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org,
    Chris Arges <carges@...udflare.com>
Subject: Re: [PATCH] 9p/virtio: restrict page pinning to user_backed_iter() iovec

Christoph Hellwig <hch@...radead.org> wrote:

> So right now except for netfs everything is on a kvec.  Dave, what
> kind of iov_iter does netfs send down to the file system?

It depends.  For buffered I/O it's an ITER_FOLIOQ, as you might expect, for
both reading and writing.

For direct (and unbuffered) I/O, it's more complicated.

For direct writes, if the source iter is user-backed, it'll be extracted to an
ITER_BVEC by netfs_extract_user_iter().  This calls iov_iter_extract_pages()
to do the pinning, and netfs will release the pins later.  However, if
MSG_SPLICE_PAGES is set, the network layer will attempt to take refs on those
pages.
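The extraction step described above looks roughly like this.  This is a hedged
sketch, not the actual netfs code: the function name and the caller-supplied
scratch array are made up for illustration, but iov_iter_extract_pages(),
bvec_set_page() and iov_iter_bvec() are the real kernel APIs:

```c
/* Hypothetical sketch of extracting a user-backed iterator into an
 * ITER_BVEC, in the style of netfs_extract_user_iter().  The pages are
 * pinned by iov_iter_extract_pages(); the caller must unpin them once
 * the I/O completes.  Error handling is abbreviated.
 */
static ssize_t extract_user_to_bvec(struct iov_iter *src,
				    struct iov_iter *dst,
				    struct bio_vec *bv,
				    struct page **pages,
				    unsigned int max_pages)
{
	size_t offset, len;
	ssize_t got, remain;
	unsigned int i = 0;

	/* Pin the user pages and advance *src past them. */
	got = iov_iter_extract_pages(src, &pages, iov_iter_count(src),
				     max_pages, 0, &offset);
	if (got <= 0)
		return got;

	/* Transcribe the page list into a bio_vec[]. */
	for (remain = got; remain > 0; i++) {
		len = min_t(size_t, remain, PAGE_SIZE - offset);
		bvec_set_page(&bv[i], pages[i], len, offset);
		remain -= len;
		offset = 0;
	}

	/* Hand the filesystem an ITER_BVEC over the pinned pages. */
	iov_iter_bvec(dst, ITER_SOURCE, bv, i, got);
	return got;
}
```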

For direct writes, if the source iter is ITER_BVEC/KVEC/FOLIOQ (also XARRAY,
but I'm trying to get rid of that), netfs passes a slice of the source iter
down to the filesystem.  Netfs does not take refs on it as there's no
guarantee that the memory it points to has pages with refcounts.
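Passing a slice of the source iter down can be pictured like this (a sketch
under my reading of the description above; the helper name is hypothetical,
but copying a struct iov_iter by value and then using iov_iter_truncate() and
iov_iter_advance() is the standard slicing idiom):

```c
/* Hypothetical sketch: hand the filesystem a window of at most `len`
 * bytes of the caller's iterator without taking refs on the underlying
 * memory (it may be kmalloc'd and have no refcountable pages).
 */
static void send_slice(struct netfs_io_subrequest *subreq,
		       struct iov_iter *source, size_t len)
{
	struct iov_iter part = *source;	/* cheap by-value copy */

	iov_iter_truncate(&part, len);	/* clip the window to this op */
	subreq->io_iter = part;		/* the filesystem consumes this */
	iov_iter_advance(source, len);	/* move the source past the slice */
}
```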

Netfs will at some point soon hopefully acquire the ability to do bounce
buffering, but it's not there yet.

Direct reads work the same as direct writes.

> I had a bit of a hard time reading through it, but I'd expect that any page
> pinning would be done in netfs and not below it?

Page pinning is done by netfs_extract_user_iter() calling
iov_iter_extract_pages() - but only for user-backed iterators.  The network
layer needs a way to be told how to handle these correctly, which it doesn't
currently have.

Kernel-backed pages may not be pinned or ref'd.  They can be buffered instead,
but pinning and ref-taking is not permitted for, say, kmalloc'd buffers.
Instead, we need to use a callback from the network layer to indicate
completion - and the network layer needs to change to not take refs on the
pages.

(Note that "pinning" != "ref-taking", thanks to GUP terminology.)
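The pin/ref distinction matters at cleanup time.  A hedged sketch of the
dichotomy - iov_iter_extract_will_pin(), unpin_user_page() and the FOLL_PIN
machinery are real kernel interfaces; the wrapper itself is illustrative:

```c
/* Hypothetical sketch: releasing pages after I/O completes.  Pages
 * obtained from a user-backed iterator via iov_iter_extract_pages()
 * were pinned (FOLL_PIN) and must be unpinned.  Kernel-backed
 * iterators yield pages that were neither pinned nor ref'd, so there
 * is nothing to release for them.
 */
static void release_extracted_pages(struct iov_iter *iter,
				    struct page **pages, unsigned int n)
{
	unsigned int i;

	if (iov_iter_extract_will_pin(iter)) {
		for (i = 0; i < n; i++)
			unpin_user_page(pages[i]);	/* undo FOLL_PIN */
	}
	/* else: kernel-backed memory - no pin, no ref, nothing to drop;
	 * completion has to be signalled by a callback instead. */
}
```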

> Why are we using iov_iters here and not something like a bio_vec?

Because your idea of vmalloc'ing a big bio_vec[] and copying it repeatedly in
order to stick bits on the ends is not good.  I have a relatively simple
solution I'm working on - and mostly have working - but the act of allocating
and transcribing into a bio_vec[] incurs a noticeable performance
penalty. :-/

This will hopefully allow me to phase out ITER_FOLIOQ, but I will still need a
folio list inside netfs.  Firstly, we may have to split a folio across
multiple RPC ops of different sizes to multiple devices in parallel.
Secondly, as Willy's plans unfold, folio structs will no longer be colocated
with page structs; page structs will be dynamically allocated and looked up in
some sort of tree instead of a flat array, which means going from a physaddr
in a bio_vec[] to a struct folio will suck when it comes to cleaning up the
page flags after I/O.

>  What is the fs / transport supposed to do with these iters?

Pass them to sendmsg() or recvmsg().  That's what they currently do.

David

