[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230622132835.3c4e38ea@kernel.org>
Date: Thu, 22 Jun 2023 13:28:35 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: David Howells <dhowells@...hat.com>, Eric Dumazet <edumazet@...gle.com>
Cc: netdev@...r.kernel.org, Alexander Duyck <alexander.duyck@...il.com>,
"David S. Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>, David Ahern
<dsahern@...nel.org>, Matthew Wilcox <willy@...radead.org>, Jens Axboe
<axboe@...nel.dk>, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH net-next v3 01/18] net: Copy slab data for
sendmsg(MSG_SPLICE_PAGES)
On Thu, 22 Jun 2023 20:40:43 +0100 David Howells wrote:
> > How did that happen? I thought MSG_SPLICE_PAGES comes from former
> > sendpage users and sendpage can't operate on slab pages.
>
> Some of my patches, take the siw one for example, now aggregate all the bits
> that make up a message into a single sendmsg() call, including any protocol
> header and trailer in the same bio_vec[] as the payload where before it would
> have to do, say, sendmsg+sendpage+sendpage+...+sendpage+sendmsg.
Maybe it's just me but I'd prefer to keep the clear rule that splice
operates on pages not slab objects. SIW is the software / fake
implementation of RDMA, right? You couldn't have picked a less
important user :(
Paolo indicated that he'll take a look tomorrow, we'll see what he
thinks.
> I'm trying to make it so that I make the minimum number of sendmsg calls
> (ie. 1 where possible) and the loop that processes the data is inside of that.
The in-kernel users can be fixed to not use slab, and user space can't
feed us slab objects.
> This offers the opportunity, at least in the future, to append slab data to an
> already-existing private fragment in the skbuff.
Maybe we can get Eric to comment. The ability to identify "frag type"
seems cool indeed, but I haven't thought about using it to attach
slab objects.
> > The locking is to local_bh_disable(). Does the milliont^w new frag
> > allocator have any additional benefits?
>
> It is shareable because it does locking. Multiple sockets of multiple
> protocols can share the pages it has reserved. It drops the lock around calls
> to the page allocator so that GFP_KERNEL/GFP_NOFS can be used with it.
>
> Without this, the page fragment allocator would need to be per-socket, I
> think, or be done further up the stack where the higher level drivers would
> have to have a fragment bucket per whatever unit they use to deal with the
> lack of locking.
There's also the per task frag which can be used under normal conditions
(sk_use_task_frag).
> Doing it here makes cleanup simpler since I just transfer my ref on the
> fragment to the skbuff frag list and it will automatically be cleaned up with
> the skbuff.
>
> Willy suggested that I just allocate a page for each thing I want to copy, but
> I would rather not do that for, say, an 8-byte bit of protocol data.
TBH my intuition would also be get a full page and let the callers who
care about performance fix themselves. Assuming we want to let slab
objects in in the first place.
Powered by blists - more mailing lists