[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aUA7aqpEBWIN1lKQ@861G6M3>
Date: Mon, 15 Dec 2025 10:46:34 -0600
From: Chris Arges <carges@...udflare.com>
To: asmadeus@...ewreck.org
Cc: Christian Schoenebeck <linux_oss@...debyte.com>,
Christoph Hellwig <hch@...radead.org>,
Eric Van Hensbergen <ericvh@...nel.org>,
Latchesar Ionkov <lucho@...kov.net>, v9fs@...ts.linux.dev,
linux-kernel@...r.kernel.org, David Howells <dhowells@...hat.com>,
Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH RFC v2] 9p/virtio: convert to extract_iter_to_sg()
On 2025-12-14 10:14:13, asmadeus@...ewreck.org wrote:
>
> (Chris, sorry, I had dropped you from Ccs as you were Cc'd from the
> patch itself and not the header... See [1] for start of thread if you're
> interested)
> [1] https://lore.kernel.org/r/20251214-virtio_trans_iter-v2-1-f7f7072e8c15@codewreck.org
>
>
No problem. Would it help if I tested this v2 patch? I still can easily
reproduce the problem.
> Christian Schoenebeck wrote on Sat, Dec 13, 2025 at 07:02:00PM +0100:
> > On Saturday, 13 December 2025 16:07:40 CET Dominique Martinet via B4 Relay wrote:
> > > This simplifies the code quite a bit and should fix issues with
> > > blowing up when iov_iter points at kmalloc data
> > >
> > > RFC - Not really tested yet!!
> >
> > TBH, that bothers me.
>
> Well, that just means "don't bother spending time testing much (or
> debugging if you do test and run into bugs)" and it likely won't be
> merged as is -- this is enough to start discussions without wasting more
> time if this gets refused.
>
> > Also considering the huge amount of changes; again, what
> > was actually wrong with the previously suggested simple patch v1 [1]? All I
> > can see is a discussion about the term "folio" being misused in the commit log
> > message, but nothing about the patch being wrong per se.
> > [1] https://lore.kernel.org/all/20251210-virtio_trans_iter-v1-1-92eee6d8b6db@codewreck.org/
>
> I agree we're close to a "perfect is the enemy of good" case here, but
> while it didn't show up in discussions I did bring it up in the patch
> comments themselves.
> My main problem is I'm pretty sure this will break any non-user non-kvec
> iov_iter; at the very least if we go that route we should guard the else
> with is_kvec(), figure out what type of iov Chris gets and handle that
> properly -- likely bvec? I still couldn't figure how to reproduce :/ --
> and error out cleanly in other cases.
>
If it helps I can work on a more isolated reproducer. Essentially I found
this when running some of our internal testing tools which spin up qemu VMs
and run kernel tests. I may be able to whittle that down to something
externally consumable.
> That's enough work that I figured switching boat wouldn't be much
> different, and if nothing else I've learned -a lot- about the kernel
> scatterlist, iov_iter and virtio APIs, so we can always say that time
> wasn't wasted even if this patch ends up dropped.
>
> The second problem that I'm reading between the lines of the replies is
> that iov_iter_get_pages_alloc2() is more or less broken/not supported
> for what we're trying to use it for, and the faster we stop using it the
> less bugs we'll get.
>
>
> (It's also really not such a huge patch, the bulk of the change is
> removed stuff we no longer use and massaging the cleanup path, but
> extract_iter_to_sg() is doing exactly what we were doing (lookup pages
> and sg_set_page() from the iov_iter) in better (for some reason when I
> added traces iov_iter_get_pages_alloc2() always stopped at one page for
> me?! I thought it was a cache thing but also happens with cache=none, I
> didn't spend time looking as this code will likely go away one way or
> another)
>
>
> > > This brings two major changes to how we've always done things with
> > > virtio 9p though:
> > > - We no longer fill in "chan->sg" with user data, but instead allocate a
> > > scatterlist; this should not be a problem nor a slowdown as previous
> > > code would allocate a page list instead, the main difference is that
> > > this might eventually lead to lifting the 512KB msize limit if
> > > compatible with virtio?
> >
> > Remember that I already had a patch set for lifting the msize limit [2], which
> > was *heavily* tested and will of course break with these changes BTW, and the
> > reason why I used a custom struct virtqueue_sg instead of the shared sg_table
> > API was that the latter could not be resized (see commit log of patch 3).
> >
> > [2] https://lore.kernel.org/all/cover.1657920926.git.linux_oss@crudebyte.com/
>
> Ugh, I'm sorry, it had completely slipped out of my mind...
> And it was waiting for me to debug the RDMA issues wasn't it :/
>
> FWIW I never heard back from former employer, and my lack of hardware
> situation hasn't changed, so we can probably mark RDMA as deprecated and
> see who complains and get them to look into it if they care...
> (I'm really sorry about having forgotten, but while I never have much
> time for 9p at any given moment if you don't hear back from me on some
> subject you want to push please do send a reminder after a month or
> so... It doesn't excuse my behavior and if we had any other maintainer
> active that might improve the situation, but at least it might prevent
> such useful patches from being forgotten while waiting on me)
>
> (... actually now I'm done re-reading the patches we've already applied
> patch 10/11 that could impact RDMA, and the rest is specific to virtio,
> so there's nothing else that could go wrong with it as far as this is
> concerned?...)
>
>
>
> OTOH I've learnt a lot about the scatterlist API meanwhile,
> and I don't necessarily agree about why you've had to basically
> reimplement sg_table just to chain them (what you described in
> patchset 3:
> > A custom struct virtqueue_sg is defined instead of using
> > shared API struct sg_table, because the latter would not
> > allow to resize the table after allocation. sg_append_table
> > API OTOH would not fit either, because it requires a list
> > of pages beforehand upon allocation. And both APIs only
> > support all-or-nothing allocation.
> )
> Having looked at sg_append_table I agree it doesn't look appropriate,
> but I think sg_table would work just fine -- you can call sg_chain on
> the last element of the table.
> It'll still require some of the helpers you introduced, but the
> virtqueue_sg type can go away, and we'd just need to loop over
> extract_iter_to_sg() for each of the sg_table "segment".
>
> If I'm not mistaken here, then the patches aren't that incompatible, and
> it's something worth doing in one order or the other.
>
>
>
> Anyway, what now?
> I'm happy to delay the switch to extract_iter_to_sg() to after your
> virtio msize rework if you want to send it again, but I don't think it's
> as simple as picking up the v1.
>
> I've honestly spent too long on this for this weekend already, so
> I'll log off for now but if you have any suggestion I'm all ears.
>
> Thanks,
> --
> Dominique Martinet | Asmadeus
Powered by blists - more mailing lists