netdev - Re: [PATCH RFC net-next v1 5/5] net: devmem: Implement TX path

Message-ID: <76880ee8-d5ce-458d-b165-c11ce1a23c76@gmail.com>
Date: Wed, 5 Feb 2025 22:22:31 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-doc@...r.kernel.org, virtualization@...ts.linux.dev,
 kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, Donald Hunter <donald.hunter@...il.com>,
 Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch>,
 David Ahern <dsahern@...nel.org>, "Michael S. Tsirkin" <mst@...hat.com>,
 Jason Wang <jasowang@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
 Eugenio Pérez <eperezma@...hat.com>,
 Stefan Hajnoczi <stefanha@...hat.com>,
 Stefano Garzarella <sgarzare@...hat.com>, Shuah Khan <shuah@...nel.org>,
 Kaiyuan Zhang <kaiyuanz@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
 Samiullah Khawaja <skhawaja@...gle.com>, Stanislav Fomichev
 <sdf@...ichev.me>, Joe Damato <jdamato@...tly.com>, dw@...idwei.uk
Subject: Re: [PATCH RFC net-next v1 5/5] net: devmem: Implement TX path

On 2/5/25 22:16, Pavel Begunkov wrote:
> On 2/5/25 20:22, Mina Almasry wrote:
>> On Wed, Feb 5, 2025 at 4:41 AM Pavel Begunkov <asml.silence@...il.com> wrote:
>>>
>>> On 1/28/25 14:49, Willem de Bruijn wrote:
>>>>>>> +struct net_devmem_dmabuf_binding *
>>>>>>> +net_devmem_get_sockc_binding(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>> +{
>>>>>>> +     struct net_devmem_dmabuf_binding *binding;
>>>>>>> +     int err = 0;
>>>>>>> +
>>>>>>> +     binding = net_devmem_lookup_dmabuf(sockc->dmabuf_id);
>>>>>>
>>>>>> This lookup is from global xarray net_devmem_dmabuf_bindings.
>>>>>>
>>>>>> Is there a check that the socket is sending out through the device
>>>>>> to which this dmabuf was bound with netlink? Should there be?
>>>>>> (e.g., SO_BINDTODEVICE).
>>>>>>
>>>>>
>>>>> Yes, I think it may be an issue if the user triggers a send from a
>>>>> different netdevice, because indeed when we bind a dmabuf we bind it
>>>>> to a specific netdevice.
>>>>>
>>>>> One option is as you say to require TX sockets to be bound and to
>>>>> check that we're bound to the correct netdev. I also wonder if I can
>>>>> make this work without SO_BINDTODEVICE, by querying the netdev the
>>>>> sock is currently trying to send out on and doing a check in the
>>>>> tcp_sendmsg. I'm not sure if this is possible but I'll give it a look.
>>>>
>>>> I was a bit quick on mentioning SO_BINDTODEVICE. Agreed that it is
>>>> vastly preferable to not require that, but infer the device from
>>>> the connected TCP sock.
>>>
>>> I wonder why so? I'd imagine something like SO_BINDTODEVICE is a
>>> better way to go. The user has to do it anyway, otherwise packets
>>> might go to a different device and the user would suddenly start
>>> getting errors with no good way to alleviate them (apart from the
>>> likes of SO_BINDTODEVICE). It's even worse if it works for a while
>>> but starts to unpredictably fail as time passes. With binding at
>>> least it'd fail fast if the setup is not done correctly.
>>>
>>
>> I think there may be a misunderstanding. There is nothing preventing
>> the user from using SO_BINDTODEVICE to make sure the socket is bound to the
> 
> Right, not arguing otherwise
> 
>> ifindex, and the test changes in the latest series actually do this
>> binding.
>>
>> It's just that on TX, we check what device we happen to be going out
>> over, and fail if we're going out of a different device.
>>
>> There are setups where the device will always be correct even without
>> SO_BINDTODEVICE. Like if the host has only 1 interface or if the
>> egress IP is only reachable over 1 interface. I don't see much reason
>> to require the user to SO_BINDTODEVICE in these cases.
> 
> That's exactly the problem. People would test their code with one setup
> where it works just fine, but then there will be a rare user of a
> library used by some other framework or a lonely server where it starts
> to fail for no apparent reason while "it worked before and nothing has
> changed". It's more predictable if enforced.
> 
> I don't think we'd care about the setup overhead of one extra setsockopt() here(?),
> but with this option we'd need to be careful about not racing with
> rebinding, if it's allowed.

FWIW, it's surely not a big deal, but it makes for a clearer API.
Hence my curiosity about what the other reasons are.
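
For reference, a minimal user-space sketch of the setup step in question,
assuming a plain socket fd and an interface name supplied by the caller.
SO_BINDTODEVICE is set with setsockopt(2). The helper name
bind_sock_to_dev() is hypothetical and only for illustration; it is not
part of the patch series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <net/if.h>      /* IFNAMSIZ */
#include <string.h>
#include <sys/socket.h>  /* setsockopt(), SOL_SOCKET, SO_BINDTODEVICE */

/*
 * Hypothetical helper (name is illustrative, not from the series):
 * bind a socket to a single interface so that its traffic can only
 * egress through that device.
 * Returns 0 on success or a negative errno value on failure.
 */
static int bind_sock_to_dev(int fd, const char *ifname)
{
	size_t len;

	assert(ifname != NULL); /* caller bug, not a runtime condition */

	/* An empty name would clear an existing binding; reject it here. */
	len = strlen(ifname);
	if (len == 0 || len >= IFNAMSIZ)
		return -EINVAL;

	/* Pass the name including its terminating NUL byte. */
	if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
		       ifname, (socklen_t)(len + 1)) < 0)
		return -errno;

	return 0;
}
```

Calling something like bind_sock_to_dev(fd, "eth0") once, before the
first send, means a wrong or missing device name is reported at setup
time (typically as -ENODEV) rather than on some later send. "eth0" is a
placeholder interface name.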

-- 
Pavel Begunkov