[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7e7c2c21-12ba-41c1-92c4-f32a3906f3ee@gmail.com>
Date: Fri, 8 Dec 2023 20:28:15 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Stanislav Fomichev <sdf@...gle.com>
Cc: Mina Almasry <almasrymina@...gle.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, Arnd Bergmann
<arnd@...db.de>, David Ahern <dsahern@...nel.org>,
Shuah Khan <shuah@...nel.org>, Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Shakeel Butt <shakeelb@...gle.com>, Jeroen de Borst <jeroendb@...gle.com>,
Praveen Kaligineedi <pkaligineedi@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
On 11/6/23 22:55, Willem de Bruijn wrote:
> On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev <sdf@...gle.com> wrote:
>>
>> On 11/06, Willem de Bruijn wrote:
>>>>> IMHO, we need a better UAPI to receive the tokens and give them back to
>>>>> the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
>>>>> but look dated and hacky :-(
>>>>>
>>>>> We should either do some kind of user/kernel shared memory queue to
>>>>> receive/return the tokens (similar to what Jonathan was doing in his
>>>>> proposal?)
>>>>
>>>> I'll take a look at Jonathan's proposal, sorry, I'm not immediately
>>>> familiar but I wanted to respond :-) But is the suggestion here to
>>>> build a new kernel-user communication channel primitive for the
>>>> purpose of passing the information in the devmem cmsg? IMHO that seems
>>>> like an overkill. Why add 100-200 lines of code to the kernel to add
>>>> something that can already be done with existing primitives? I don't
>>>> see anything concretely wrong with cmsg & setsockopt approach, and if
>>>> we switch to something I'd prefer to switch to an existing primitive
>>>> for simplicity?
>>>>
>>>> The only other existing primitive to pass data outside of the linear
>>>> buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
>>>> preferred? Any other suggestions or existing primitives I'm not aware
>>>> of?
>>>>
>>>>> or bite the bullet and switch to io_uring.
>>>>>
>>>>
>>>> IMO io_uring & socket support are orthogonal, and one doesn't preclude
>>>> the other. As you know we like to use sockets and I believe there are
>>>> issues with io_uring adoption at Google that I'm not familiar with
>>>> (and could be wrong). I'm interested in exploring io_uring support as
>>>> a follow up but I think David Wei will be interested in io_uring
>>>> support as well anyway.
>>>
>>> I also disagree that we need to replace a standard socket interface
>>> with something "faster", in quotes.
>>>
>>> This interface is not the bottleneck to the target workload.
>>>
>>> Replacing the synchronous sockets interface with something more
>>> performant for workloads where it is, is an orthogonal challenge.
>>> However we do that, I think that traditional sockets should continue
>>> to be supported.
>>>
>>> The feature may already even work with io_uring, as both recvmsg with
>>> cmsg and setsockopt have io_uring support now.
>>
>> I'm not really concerned with faster. I would prefer something cleaner :-)
>>
>> Or maybe we should just have it documented. With some kind of path
>> towards beautiful world where we can create dynamic queues..
>
> I suppose we just disagree on the elegance of the API.
>
> The concise notification API returns tokens as a range for
> compression, encoding as two 32-bit unsigned integers start + length.
> It allows for even further batching by returning multiple such ranges
> in a single call.
FWIW, nothing prevents io_uring from compressing ranges. The io_uring
zc RFC returns {offset, size} as well, though at the moment the would
lie in the same page.
> This is analogous to the MSG_ZEROCOPY notification mechanism from
> kernel to user.
>
> The synchronous socket syscall interface can be replaced by something
> asynchronous like io_uring. This already works today? Whatever
If you mean async io_uring recv, it does work. In short, internally
it polls the socket and then calls sock_recvmsg(). There is also a
feature that would make it return back to polling after sock_recvmsg()
and loop like this.
> asynchronous ring-based API would be selected, io_uring or otherwise,
> I think the concise notification encoding would remain as is.
>
> Since this is an operation on a socket, I find a setsockopt the
> fitting interface.
--
Pavel Begunkov
Powered by blists - more mailing lists