Date: Thu, 22 Jun 2023 19:11:34 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: David Howells <dhowells@...hat.com>
Cc: Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org, Alexander
 Duyck <alexander.duyck@...il.com>, "David S. Miller" <davem@...emloft.net>,
 Paolo Abeni <pabeni@...hat.com>, Willem de Bruijn
 <willemdebruijn.kernel@...il.com>, David Ahern <dsahern@...nel.org>,
 Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, Menglong Dong
 <imagedong@...cent.com>
Subject: Re: [PATCH net-next v3 01/18] net: Copy slab data for
 sendmsg(MSG_SPLICE_PAGES)

On Thu, 22 Jun 2023 23:54:31 +0100 David Howells wrote:
> > Maybe it's just me but I'd prefer to keep the clear rule that splice
> > operates on pages not slab objects.  
> 
> sendpage isn't only being used for splice().  Or were you referring to
> splicing pages into socket buffers more generally?

Yes, sorry, any sort of "zero-copy attachment of data onto a socket
queue".

> > SIW is the software / fake implementation of RDMA, right? You couldn't have
> > picked a less important user :(  
> 
> ISCSI and sunrpc could both make use of this, as could ceph and others.  I
> have patches for sunrpc to make it condense into a single bio_vec[] and
> sendmsg() in the server code (ie. nfsd) but for the moment, Chuck wanted me to
> just do the xdr payload.

But to be clear (and I'm not implying that it's not a strong enough
reason) - the only benefit from letting someone pass headers in a slab
object is that the code already uses kmalloc(), right? IOW it could be
changed to use frags without much of a LoC bloat?

> > Maybe we can get Eric to comment. The ability to identify "frag type"
> > seems cool indeed, but I haven't thought about using it to attach
> > slab objects.  
> 
> Unfortunately, you can't attach slab objects.  Their lifetime isn't controlled
> by put_page() or folio_put().  kmalloc()/kfree() doesn't refcount them -
> they're recycled immediately.  Hence why I was copying them.  (Well, you
> could attach, but then you need a callback mechanism).

Right, right, I thought you were saying that _in the future_ we may try
to attach the slab objects as frags (and presumably copy when someone
tries to ref them). Maybe I over-interpreted.

> What I'm trying to do is make it so that the process of calling sock_sendmsg()
> with MSG_SPLICE_PAGES looks exactly the same as without: You fill in a
> bio_vec[] pointing to your protocol header, the payload and the trailer,
> pointing as appropriate to bits of slab, static, stack data or ref'able pages,
> and call sendmsg and then the data will get copied or spliced as appropriate
> to the page type, whether the MSG_SPLICE_PAGES flag is supplied and whether
> the flag is supported.
> 
> There are a couple of things I'd like to avoid: (1) having to call
> sock_sendmsg() more than once per message and (2) having sendmsg allocate more
> space and make a copy of data that you had to copy into a frag before calling
> sendmsg.

If we're not planning to attach the slab objects as frags, then surely
doing kmalloc() + kfree() in the caller, and then allocating a frag and
copying the data over in the skb / socket code is also inefficient.
Fixing the caller gives all the benefits you want, and then some.

Granted some form of alloc_skb_frag() needs to be added so that callers
don't curse us, I'd start with something based on sk_page_frag().

Or we could pull the copying out into an intermediate helper which
first replaces all slab objects in the iovec with page frags and then
calls sock_sendmsg()? Maybe that's stupid...
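
To make the helper idea concrete, here is a rough userspace model of it,
not kernel code. Every name in it is made up for illustration: struct seg
stands in for a bio_vec entry, the is_slab flag stands in for whatever
check would identify slab memory, and struct frag_pool stands in for a
page frag such as the one sk_page_frag() hands out. A real version would
then pass the rewritten vector on to sock_sendmsg().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one bio_vec entry. */
struct seg {
	const void *data;
	size_t len;
	int is_slab;	/* 1 = kmalloc()-like memory that cannot be ref'd */
};

/* Hypothetical stand-in for a per-socket page frag. */
struct frag_pool {
	unsigned char buf[4096];
	size_t used;
};

/* Carve len bytes out of the pool, or return NULL if it does not fit. */
static void *frag_alloc(struct frag_pool *p, size_t len)
{
	void *ptr;

	if (len > sizeof(p->buf) - p->used)
		return NULL;
	ptr = p->buf + p->used;
	p->used += len;
	return ptr;
}

/*
 * Replace every slab-backed segment with a copy in the frag pool, so
 * the send path only ever sees memory it is allowed to take a ref on.
 * Returns the number of segments copied, or -1 if the pool ran out
 * (segments copied before the failure stay replaced).
 */
static int replace_slab_segs(struct seg *segs, size_t n, struct frag_pool *p)
{
	int copied = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		void *dst;

		if (!segs[i].is_slab)
			continue;
		dst = frag_alloc(p, segs[i].len);
		if (!dst)
			return -1;
		memcpy(dst, segs[i].data, segs[i].len);
		segs[i].data = dst;
		segs[i].is_slab = 0;
		copied++;
	}
	return copied;
}
```

The point of the model is only that the copy happens once, in one place,
before the send call, and that segments which are already ref'able are
left untouched.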

Let's hear what others think. If we can't reach instant agreement --
can you strategically separate out the minimal set of changes required
to just kill MSG_SENDPAGE_NOTLAST? IMHO it's worth getting that into
6.5.
