lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aT4PZUtvirfq3Lov@codewreck.org>
Date: Sun, 14 Dec 2025 10:14:13 +0900
From: asmadeus@...ewreck.org
To: Christian Schoenebeck <linux_oss@...debyte.com>
Cc: Christoph Hellwig <hch@...radead.org>,
	Eric Van Hensbergen <ericvh@...nel.org>,
	Latchesar Ionkov <lucho@...kov.net>, v9fs@...ts.linux.dev,
	linux-kernel@...r.kernel.org, David Howells <dhowells@...hat.com>,
	Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org,
	Chris Arges <carges@...udflare.com>
Subject: Re: [PATCH RFC v2] 9p/virtio: convert to extract_iter_to_sg()


(Chris, sorry, I had dropped you from Ccs as you were Cc'd from the
patch itself and not the header... See [1] for start of thread if you're
interested)
[1] https://lore.kernel.org/r/20251214-virtio_trans_iter-v2-1-f7f7072e8c15@codewreck.org


Christian Schoenebeck wrote on Sat, Dec 13, 2025 at 07:02:00PM +0100:
> On Saturday, 13 December 2025 16:07:40 CET Dominique Martinet via B4 Relay wrote:
> > This simplifies the code quite a bit and should fix issues with
> > blowing up when iov_iter points at kmalloc data
> > 
> > RFC - Not really tested yet!!
> 
> TBH, that bothers me.

Well, that just means "don't bother spending time testing much (or
debugging if you do test and run into bugs)" and it likely won't be
merged as is -- this is enough to start discussions without wasting more
time if this gets refused.

> Also considering the huge amount of changes; again, what
> was actually wrong with the previously suggested simple patch v1 [1]? All I
> can see is a discussion about the term "folio" being misused in the commit log
> message, but nothing about the patch being wrong per se.
> [1] https://lore.kernel.org/all/20251210-virtio_trans_iter-v1-1-92eee6d8b6db@codewreck.org/

I agree we're close to a "perfect is the enemy of good" case here, but
while it didn't show up in discussions I did bring it up in the patch
comments themselves.
My main problem is I'm pretty sure this will break any non-user non-kvec
iov_iter; at the very least if we go that route we should guard the else
with is_kvec(), figure out what type of iov Chris gets and handle that
properly -- likely bvec? I still couldn't figure how to reproduce :/ --
and error out cleanly in other cases.

That's enough work that I figured switching boat wouldn't be much
different, and if nothing else I've learned -a lot- about the kernel
scatterlist, iov_iter and virtio APIs, so we can always say that time
wasn't wasted even if this patch ends up dropped.

The second problem that I'm reading between the lines of the replies is
that iov_iter_get_pages_alloc2() is more or less broken/not supported
for what we're trying to use it for, and the faster we stop using it the
less bugs we'll get.


(It's also really not such a huge patch, the bulk of the change is
removed stuff we no longer use and massaging the cleanup path, but
extract_iter_to_sg() is doing exactly what we were doing (lookup pages
and sg_set_page() from the iov_iter) in better (for some reason when I
added traces iov_iter_get_pages_alloc2() always stopped at one page for
me?! I thought it was a cache thing but also happens with cache=none, I
didn't spend time looking as this code will likely go away one way or
another)


> > This brings two major changes to how we've always done things with
> > virtio 9p though:
> > - We no longer fill in "chan->sg" with user data, but instead allocate a
> >   scatterlist; this should not be a problem nor a slowdown as previous
> >   code would allocate a page list instead, the main difference is that
> >   this might eventually lead to lifting the 512KB msize limit if
> >   compatible with virtio?
> 
> Remember that I already had a patch set for lifting the msize limit [2], which
> was *heavily* tested and will of course break with these changes BTW, and the
> reason why I used a custom struct virtqueue_sg instead of the shared sg_table
> API was that the latter could not be resized (see commit log of patch 3).
> 
> [2] https://lore.kernel.org/all/cover.1657920926.git.linux_oss@crudebyte.com/

Ugh, I'm sorry, it had completely slipped out of my mind...
And it was waiting for me to debug the RDMA issues wasn't it :/

FWIW I never heard back from former employer, and my lack of hardware
situation hasn't changed, so we can probably mark RDMA as deprecated and
see who complains and get them to look into it if they care...
(I'm really sorry about having forgotten, but while I never have much
time for 9p at any given moment if you don't hear back from me on some
subject you want to push please do send a reminder after a month or
so... It doesn't excuse my behavior and if we had any other maintainer
active that might improve the situation, but at least it might prevent
such useful patches from being forgotten while waiting on me)

(... actually now I'm done re-reading the patches we've already applied
patch 10/11 that could impact RDMA, and the rest is specific to virtio,
so there's nothing else that could go wrong with it as far as this is
concerned?...)



OTOH I've learnt a lot about the scatterlist API meanwhile,
and I don't necessarily agree about why you've had to basically
reimplement sg_table just to chain them (what you described in
patchset 3:
> A custom struct virtqueue_sg is defined instead of using
> shared API struct sg_table, because the latter would not
> allow to resize the table after allocation. sg_append_table
> API OTOH would not fit either, because it requires a list
> of pages beforehand upon allocation. And both APIs only
> support all-or-nothing allocation.
)
Having looked at sg_append_table I agree it doesn't look appropriate,
but I think sg_table would work just fine -- you can call sg_chain on
the last element of the table.
It'll still require some of the helpers you introduced, but the
virtqueue_sg type can go away, and we'd just need to loop over
extract_iter_to_sg() for each of the sg_table "segment".

If I'm not mistaken here, then the patches aren't that incompatible, and
it's something worth doing in one order or the other.



Anyway, what now?
I'm happy to delay the switch to extract_iter_to_sg() to after your
virtio msize rework if you want to send it again, but I don't think it's
as simple as picking up the v1.

I've honestly spent too long on this for this weekend already, so
I'll log off for now but if you have any suggestion I'm all ears.

Thanks,
-- 
Dominique Martinet | Asmadeus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ