netdev - Re: [PATCH net-next v3 01/18] net: Copy slab data for sendmsg(MSG_SPLICE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230622132835.3c4e38ea@kernel.org>
Date: Thu, 22 Jun 2023 13:28:35 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: David Howells <dhowells@...hat.com>, Eric Dumazet <edumazet@...gle.com>
Cc: netdev@...r.kernel.org, Alexander Duyck <alexander.duyck@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
 Willem de Bruijn <willemdebruijn.kernel@...il.com>, David Ahern
 <dsahern@...nel.org>, Matthew Wilcox <willy@...radead.org>, Jens Axboe
 <axboe@...nel.dk>, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH net-next v3 01/18] net: Copy slab data for
 sendmsg(MSG_SPLICE_PAGES)

On Thu, 22 Jun 2023 20:40:43 +0100 David Howells wrote:
> > How did that happen? I thought MSG_SPLICE_PAGES comes from former
> > sendpage users and sendpage can't operate on slab pages.  
> 
> Some of my patches, take the siw one for example, now aggregate all the bits
> that make up a message into a single sendmsg() call, including any protocol
> header and trailer in the same bio_vec[] as the payload where before it would
> have to do, say, sendmsg+sendpage+sendpage+...+sendpage+sendmsg.

Maybe it's just me but I'd prefer to keep the clear rule that splice
operates on pages not slab objects. SIW is the software / fake
implementation of RDMA, right? You couldn't have picked a less
important user :(

Paolo indicated that he'll take a look tomorrow, we'll see what he
thinks.

> I'm trying to make it so that I make the minimum number of sendmsg calls
> (ie. 1 where possible) and the loop that processes the data is inside of that.

The in-kernel users can be fixed to not use slab, and user space can't
feed us slab objects.

> This offers the opportunity, at least in the future, to append slab data to an
> already-existing private fragment in the skbuff.

Maybe we can get Eric to comment. The ability to identify "frag type"
seems cool indeed, but I haven't thought about using it to attach
slab objects.

> > The locking is to local_bh_disable(). Does the milliont^w new frag
> > allocator have any additional benefits?  
> 
> It is shareable because it does locking.  Multiple sockets of multiple
> protocols can share the pages it has reserved.  It drops the lock around calls
> to the page allocator so that GFP_KERNEL/GFP_NOFS can be used with it.
> 
> Without this, the page fragment allocator would need to be per-socket, I
> think, or be done further up the stack where the higher level drivers would
> have to have a fragment bucket per whatever unit they use to deal with the
> lack of locking.

There's also the per task frag which can be used under normal conditions
(sk_use_task_frag).

> Doing it here makes cleanup simpler since I just transfer my ref on the
> fragment to the skbuff frag list and it will automatically be cleaned up with
> the skbuff.
> 
> Willy suggested that I just allocate a page for each thing I want to copy, but
> I would rather not do that for, say, an 8-byte bit of protocol data.

TBH my intuition would also be get a full page and let the callers who
care about performance fix themselves. Assuming we want to let slab
objects in in the first place.