Message-ID: <1958077.1687474471@warthog.procyon.org.uk>
Date: Thu, 22 Jun 2023 23:54:31 +0100
From: David Howells <dhowells@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: dhowells@...hat.com, Eric Dumazet <edumazet@...gle.com>,
netdev@...r.kernel.org, Alexander Duyck <alexander.duyck@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>,
Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH net-next v3 01/18] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)
Jakub Kicinski <kuba@...nel.org> wrote:
> Maybe it's just me but I'd prefer to keep the clear rule that splice
> operates on pages not slab objects.
sendpage isn't only being used for splice(). Or were you referring to
splicing pages into socket buffers more generally?
> SIW is the software / fake implementation of RDMA, right? You couldn't have
> picked a less important user :(
iSCSI and sunrpc could both make use of this, as could ceph and others. I
have patches for sunrpc to make it condense into a single bio_vec[] and
sendmsg() in the server code (i.e. nfsd), but for the moment Chuck wanted me to
just do the xdr payload.
> > This offers the opportunity, at least in the future, to append slab data
> > to an already-existing private fragment in the skbuff.
>
> Maybe we can get Eric to comment. The ability to identify "frag type"
> seems cool indeed, but I haven't thought about using it to attach
> slab objects.
Unfortunately, you can't attach slab objects. Their lifetime isn't controlled
by put_page() or folio_put(). kmalloc()/kfree() don't refcount them -
they're recycled immediately, which is why I was copying them. (Well, you
could attach, but then you'd need a callback mechanism.)
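
To illustrate the lifetime point, here is a minimal userspace sketch, not kernel code. Every name in it (model_page, model_slab_obj, model_get_page(), model_put_page(), model_kfree(), model_copy_to_frag()) is a hypothetical stand-in invented for this example. It assumes a page is released only when its last reference is dropped, whereas a slab object is gone as soon as its owner frees it, so slab data has to be copied into a fragment that the socket buffer holds its own reference on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF 64

/* Hypothetical stand-in for a refcounted page (not the kernel's struct page). */
struct model_page {
	int refcount;		/* number of current holders */
	bool freed;		/* set once the last reference is dropped */
	char data[MODEL_BUF];
};

/* Hypothetical stand-in for a kmalloc()'d object: it has no refcount. */
struct model_slab_obj {
	bool freed;		/* set as soon as the owner frees it */
	char data[MODEL_BUF];
};

static void model_get_page(struct model_page *p)
{
	p->refcount++;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;	/* only the last put releases the page */
}

static void model_kfree(struct model_slab_obj *obj)
{
	obj->freed = true;	/* other users of the memory are not consulted */
}

/* Copy slab data into a fragment page and take a reference for the "skb". */
static void model_copy_to_frag(struct model_page *frag,
			       const struct model_slab_obj *obj, size_t len)
{
	assert(len <= MODEL_BUF && !obj->freed);
	memcpy(frag->data, obj->data, len);
	model_get_page(frag);
}
```

In this model, a page that both the caller and the "skb" hold survives the caller's model_put_page(), while the data copied by model_copy_to_frag() stays valid after model_kfree() on the source object.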
What I'm trying to do is make it so that the process of calling sock_sendmsg()
with MSG_SPLICE_PAGES looks exactly the same as without: you fill in a
bio_vec[] pointing to your protocol header, the payload and the trailer,
pointing as appropriate to bits of slab, static or stack data or ref'able
pages, and call sendmsg. The data then gets copied or spliced depending on
the page type, whether the MSG_SPLICE_PAGES flag is supplied and whether the
flag is supported.
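
The per-segment decision can be sketched as a small userspace model. This is not the patch's code: model_seg, model_result and model_sendmsg() are hypothetical names, and the rule that only ref'able pages are spliced (everything else being copied) is a simplifying assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one segment of a message. */
struct model_seg {
	size_t len;	/* bytes in this segment */
	bool refable;	/* true if backed by a page that can be ref'd */
};

/* Hypothetical tally of what one send call did with the data. */
struct model_result {
	size_t copied;
	size_t spliced;
};

/*
 * One call handles the whole message. A segment is spliced only if the
 * caller passed the splice flag, the transport supports it and the
 * segment is ref'able (assumption); otherwise its bytes are copied.
 */
static struct model_result model_sendmsg(const struct model_seg *segs,
					 size_t nsegs, bool splice_flag,
					 bool splice_supported)
{
	struct model_result res = { 0, 0 };
	bool may_splice = splice_flag && splice_supported;
	size_t i;

	assert(segs != NULL || nsegs == 0);

	for (i = 0; i < nsegs; i++) {
		if (may_splice && segs[i].refable)
			res.spliced += segs[i].len;
		else
			res.copied += segs[i].len;
	}
	return res;
}
```

For a message made of a 16-byte slab header, a 4096-byte page payload and a 4-byte stack trailer, this model splices only the payload when the flag is both supplied and supported, and copies all 4116 bytes otherwise. The caller builds the same array either way.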
There are a couple of things I'd like to avoid: (1) having to call
sock_sendmsg() more than once per message and (2) having sendmsg allocate more
space and make a copy of data that you had to copy into a frag before calling
sendmsg.
David