[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230622191134.54d5cb0b@kernel.org>
Date: Thu, 22 Jun 2023 19:11:34 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: David Howells <dhowells@...hat.com>
Cc: Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org, Alexander
Duyck <alexander.duyck@...il.com>, "David S. Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>, Willem de Bruijn
<willemdebruijn.kernel@...il.com>, David Ahern <dsahern@...nel.org>,
Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, Menglong Dong
<imagedong@...cent.com>
Subject: Re: [PATCH net-next v3 01/18] net: Copy slab data for
sendmsg(MSG_SPLICE_PAGES)
On Thu, 22 Jun 2023 23:54:31 +0100 David Howells wrote:
> > Maybe it's just me but I'd prefer to keep the clear rule that splice
> > operates on pages not slab objects.
>
> sendpage isn't only being used for splice(). Or were you referring to
> splicing pages into socket buffers more generally?
Yes, sorry, any sort of "zero-copy attachment of data onto a socket
queue".
> > SIW is the software / fake implementation of RDMA, right? You couldn't have
> > picked a less important user :(
>
> ISCSI and sunrpc could both make use of this, as could ceph and others. I
> have patches for sunrpc to make it condense into a single bio_vec[] and
> sendmsg() in the server code (ie. nfsd) but for the moment, Chuck wanted me to
> just do the xdr payload.
But to be clear (and I'm not implying that it's not a strong enough
reason) - the only benefit from letting someone pass headers in a slab
object is that the code already uses kmalloc(), right? IOW it could be
changed to use frags without much of a LoC bloat?
> > Maybe we can get Eric to comment. The ability to identify "frag type"
> > seems cool indeed, but I haven't thought about using it to attach
> > slab objects.
>
> Unfortunately, you can't attach slab objects. Their lifetime isn't controlled
> by put_page() or folio_put(). kmalloc()/kfree() doesn't refcount them -
> they're recycled immediately. Hence why I was copying them. (Well, you
> could attach, but then you need a callback mechanism).
Right, right, I thought you were saying that _in the future_ we may try
to attach the slab objects as frags (and presumably copy when someone
tries to ref them). Maybe I over-interpreted.
> What I'm trying to do is make it so that the process of calling sock_sendmsg()
> with MSG_SPLICE_PAGES looks exactly the same as without: You fill in a
> bio_vec[] pointing to your protocol header, the payload and the trailer,
> pointing as appropriate to bits of slab, static, stack data or ref'able pages,
> and call sendmsg and then the data will get copied or spliced as appropriate
> to the page type, whether the MSG_SPLICE_PAGES flag is supplied and whether
> the flag is supported.
>
> There are a couple of things I'd like to avoid: (1) having to call
> sock_sendmsg() more than once per message and (2) having sendmsg allocate more
> space and make a copy of data that you had to copy into a frag before calling
> sendmsg.
If we're not planning to attach the slab objects as frags, then surely
doing kmalloc() + free() in the caller, and then allocating a frag and
copying the data over in the skb / socket code is also inefficient.
Fixing the caller gives all the benefits you want, and then some.
Granted some form of alloc_skb_frag() needs to be added so that callers
don't curse us, I'd start with something based on sk_page_frag().
Or we could pull the coping out into an intermediate helper which
first replaces all slab objects in the iovec with page frags and then
calls sock_sendmsg()? Maybe that's stupid...
Let's hear what others think. If we can't reach instant agreement --
can you strategically separate out the minimal set of changes required
to just kill MSG_SENDPAGE_NOTLAST. IMHO it's worth getting that into
6.5.
Powered by blists - more mailing lists