lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e38090b0-f990-4347-b4bf-6f33f36bb851@gmail.com>
Date: Thu, 20 Feb 2025 14:35:20 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-doc@...r.kernel.org, kvm@...r.kernel.org,
 virtualization@...ts.linux.dev, linux-kselftest@...r.kernel.org,
 Donald Hunter <donald.hunter@...il.com>, Jakub Kicinski <kuba@...nel.org>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
 Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch>,
 Neal Cardwell <ncardwell@...gle.com>, David Ahern <dsahern@...nel.org>,
 "Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
 Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Eugenio Pérez
 <eperezma@...hat.com>, Stefan Hajnoczi <stefanha@...hat.com>,
 Stefano Garzarella <sgarzare@...hat.com>, Shuah Khan <shuah@...nel.org>,
 sdf@...ichev.me, dw@...idwei.uk, Jamal Hadi Salim <jhs@...atatu.com>,
 Victor Nogueira <victor@...atatu.com>, Pedro Tammela
 <pctammela@...atatu.com>, Samiullah Khawaja <skhawaja@...gle.com>,
 Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [PATCH net-next v3 5/6] net: devmem: Implement TX path

On 2/20/25 01:46, Mina Almasry wrote:
> On Wed, Feb 19, 2025 at 2:40 PM Pavel Begunkov <asml.silence@...il.com> wrote:
>>
>> On 2/17/25 23:26, Mina Almasry wrote:
>>> On Thu, Feb 13, 2025 at 5:17 AM Pavel Begunkov <asml.silence@...il.com> wrote:
>> ...
>>>>>>> It's asserting that sizeof(ubuf_info_msgzc) <= sizeof(skb->cb), and
>>>>>>> I'm guessing increasing skb->cb size is not really the way to go.
>>>>>>>
>>>>>>> What I may be able to do here is stash the binding somewhere in
>>>>>>> ubuf_info_msgzc via union with fields we don't need for devmem, and/or
>>>>>>
>>>>>> It doesn't need to account the memory against the user, and you
>>>>>> actually don't want that because dmabuf should take care of that.
>>>>>> So, it should be fine to reuse ->mmp.
>>>>>>
>>>>>> It's also not a real sk_buff, so maybe maintainers wouldn't mind
>>>>>> reusing some more space out of it, if that would even be needed.
>>>>>>
>>>>>
>>>>> netmem skb are real sk_buff, with the modification that frags are not
>>>>
>>>> We were discussing ubuf_info allocation, take a look at
>>>> msg_zerocopy_alloc(), it has nothing to do with netmems and all that.
>>>>
>>>
>>> Yes. My response was regarding the suggestion that we can use space in
>>> devmem skbs however we want though.
>>
>> Well, at least I didn't suggest that, assuming "devmem skbs" are skbs
>> filled with devmem frags. I think the confusion here is thinking
>> that skb->cb you mentioned above is about "devmem skbs", while it's
>> special skbs without data used only to piggy back ubuf allocation.
> 
> Ah, I see. I still don't see how we can just increase the size of
> skb->cb when it's shared between these special skbs and regular skbs.

The approach was not to increase ->cb but rather reuse some other unused
in the path sk_buff fields. Though, looking at __msg_zerocopy_callback(),
maybe it's better not to entertain that, as the skb is queued into the
error queue. But again, not like you need it.

>> Functionally speaking, it'd be perfectly fine to get rid of the
>> warning and allocate it with kmalloc().
>>
> 
> More suggestions to refactor unrelated things to force through a
> msg->sg_from_iter approach.

Mina, you're surprising me. Neither I suggested to do that, just
trying to help with your confusion using analogies, nor I said that
it'd be welcome, nor it's somehow "unrelated". And "forcing"
is a misstatement, so far I've been extending a recommendation
on how to make it better.

...
>> If you've already done, maybe you can post it as a draft? At least
>> it'll be obvious why you say it's more complicated.
>>
> 
> I don't have anything worth sharing. Just went down this rabbit hole
> and saw a bunch of MSG_ZEROCOPY checks (!msg->msg_ubuf checks around
> MSG_ZEROCOPY code) and restrictions (skb->cb size) need to be
> addressed and checks to be added. From this thread you seem to be
> suggesting more changes to force in a msg->sg_from_iter approach
> adding to the complications.

To sum up, you haven't tried it.

>>> The complication could be worth it if there was some upside, but I
>>> don't see one tbh. Passing the binding down to
>>> zerocopy_fill_skb_from_devmem seems like a better approach to my eye
>>> so far
>>
>> The upside is that 1) you currently you add overhead to common
>> path (incl copy),
> 
> You mean the unlikely() check for devmem before delegating to
> skb_zerocopy_fill_from_devmem? Should be minimal.

Like keeping the binding in tcp_sendmsg_locked(). The point is,
as you mentioned overhead ("adds more checks to the fast
MSG_ZEROCOPY paths"), all things included the current approach
will be adding more of it to MSG_ZEROCOPY and also other users.

>> 2) passing it down through all the function also
>> have overhead to the zerocopy and MSG_ZEROCOPY path, which I'd
>> assume is comparable to those extra checks you have.
> 
> Complicating/refactoring existing code for devmem TCP to force in a
> msg->sg_from_iter and save 1 arg passed down a couple of functions
> doesn't seem like a good tradeoff IMO.
> 
>> 3) tcp would
>> need to know about devmem tcp and its bindings, while it all could
>> be in one spot under the MSG_ZEROCOPY check.
> 
> I don't see why this is binding to tcp somehow. If anything it makes

I don't get what you're saying, but it refers to devmem binding,
which you add to TCP path, and so tcp now has to know how to work
with devmem instead of all of it being hidden behind the curtains
of ubuf_info. And it sticks out not only for tcp, but for all
zerocopy users by the virtue of dragging it down through all
helpers.

> the devmem TX implementation follow closely MSG_ZEROCOPY, and existing

Following closely would be passing ubuf just like MSG_ZEROCOPY does,
and not plumbing devmem all the way through all helpers.

> MSG_ZEROCOPY code would be easily extended for devmem TX without
> having to also carry refactors to migrate to msg->sg_from_iter

Don't be afraid of refactoring when it makes things better. We're
talking about minor changes touching only bits in the direct
vicinity of your set.

> approach (just grab the binding and pass it to
> skb_zerocopy_iter_stream).
> 
>> 4) When you'd want
>> another protocol to support that, instead of a simple
>>
>> ubuf = get_devmem_ubuf();
>>
>> You'd need to plumb binding passing through the stack there as
>> well.
>>
> 
> Similar to above, I think this approach will actually extend easier to
> any protocol already using MSG_ZEROCOPY, because we follow that
> closely instead of requiring refactors to force msg->sg_from_iter
> approach.

-- 
Pavel Begunkov


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ