Message-ID: <88cb8f03-7976-4846-a74d-e2d234c5cf8d@gmail.com>
Date: Wed, 5 Feb 2025 22:16:18 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, virtualization@...ts.linux.dev,
kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Donald Hunter <donald.hunter@...il.com>,
Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch>,
David Ahern <dsahern@...nel.org>, "Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Eugenio Pérez <eperezma@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>, Shuah Khan <shuah@...nel.org>,
Kaiyuan Zhang <kaiyuanz@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
Samiullah Khawaja <skhawaja@...gle.com>, Stanislav Fomichev
<sdf@...ichev.me>, Joe Damato <jdamato@...tly.com>, dw@...idwei.uk
Subject: Re: [PATCH RFC net-next v1 5/5] net: devmem: Implement TX path
On 2/5/25 20:22, Mina Almasry wrote:
> On Wed, Feb 5, 2025 at 4:41 AM Pavel Begunkov <asml.silence@...il.com> wrote:
>>
>> On 1/28/25 14:49, Willem de Bruijn wrote:
>>>>>> +struct net_devmem_dmabuf_binding *
>>>>>> +net_devmem_get_sockc_binding(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>> +{
>>>>>> + struct net_devmem_dmabuf_binding *binding;
>>>>>> + int err = 0;
>>>>>> +
>>>>>> + binding = net_devmem_lookup_dmabuf(sockc->dmabuf_id);
>>>>>
>>>>> This lookup is from global xarray net_devmem_dmabuf_bindings.
>>>>>
>>>>> Is there a check that the socket is sending out through the device
>>>>> to which this dmabuf was bound with netlink? Should there be?
>>>>> (e.g., SO_BINDTODEVICE).
>>>>>
>>>>
>>>> Yes, I think it may be an issue if the user triggers a send from a
>>>> different netdevice, because indeed when we bind a dmabuf we bind it
>>>> to a specific netdevice.
>>>>
>>>> One option is as you say to require TX sockets to be bound and to
>>>> check that we're bound to the correct netdev. I also wonder if I can
>>>> make this work without SO_BINDTODEVICE, by querying the netdev the
>>>> sock is currently trying to send out on and doing a check in the
>>>> tcp_sendmsg. I'm not sure if this is possible but I'll give it a look.
>>>
>>> I was a bit quick on mentioning SO_BINDTODEVICE. Agreed that it is
>>> vastly preferable to not require that, but infer the device from
>>> the connected TCP sock.
>>
>> I wonder why so? I'd imagine something like SO_BINDTODEVICE is a
>> better way to go. The user has to do it anyway, otherwise packets
>> might go to a different device and the user would suddenly start
>> getting errors with no good way to alleviate them (apart from
>> likes of SO_BINDTODEVICE). It's even worse if it works for a while
>> but starts to unpredictably fail as time passes. With binding at
>> least it'd fail fast if the setup is not done correctly.
>>
>
> I think there may be a misunderstanding. There is nothing preventing
> the user from SO_BINDTODEVICE to make sure the socket is bound to the

Right, not arguing otherwise.

> ifindex, and the test changes in the latest series actually do this
> binding.
>
> It's just that on TX, we check what device we happen to be going out
> over, and fail if we're going out of a different device.
>
> There are setups where the device will always be correct even without
> SO_BINDTODEVICE. Like if the host has only 1 interface or if the
> egress IP is only reachable over 1 interface. I don't see much reason
> to require the user to SO_BINDTODEVICE in these cases.

That's exactly the problem. People would test their code with one setup
where it works just fine, but then there will be a rare user of a
library used by some other framework, or a lonely server, where it starts
to fail for no apparent reason while "it worked before and nothing has
changed". It's more predictable if enforced.

I don't think we'd care about the setup overhead of one extra ioctl()
here(?), but with this option we'd need to be careful about not racing
with rebinding, if it's allowed.
--
Pavel Begunkov