Message-ID: <b96b1602-6f76-4624-91f0-68d4f43756ce@gmail.com>
Date: Wed, 9 Oct 2024 20:43:08 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Mina Almasry <almasrymina@...gle.com>, Jens Axboe <axboe@...nel.dk>
Cc: David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org,
 netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jesper Dangaard Brouer
 <hawk@...nel.org>, David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH v1 00/15] io_uring zero copy rx

On 10/9/24 20:32, Mina Almasry wrote:
> On Wed, Oct 9, 2024 at 9:57 AM Jens Axboe <axboe@...nel.dk> wrote:
>>
>> On 10/9/24 10:55 AM, Mina Almasry wrote:
>>> On Mon, Oct 7, 2024 at 3:16 PM David Wei <dw@...idwei.uk> wrote:
>>>>
>>>> This patchset adds support for zero copy rx into userspace pages using
>>>> io_uring, eliminating a kernel to user copy.
>>>>
>>>> We configure a page pool that a driver uses to fill a hw rx queue to
>>>> hand out user pages instead of kernel pages. Any data that ends up
>>>> hitting this hw rx queue will thus be dma'd into userspace memory
>>>> directly, without needing to be bounced through kernel memory. 'Reading'
>>>> data out of a socket instead becomes a _notification_ mechanism, where
>>>> the kernel tells userspace where the data is. The overall approach is
>>>> similar to the devmem TCP proposal.
>>>>
>>>> This relies on hw header/data split, flow steering and RSS to ensure
>>>> packet headers remain in kernel memory and only desired flows hit a hw
>>>> rx queue configured for zero copy. Configuring this is outside of the
>>>> scope of this patchset.
>>>>
>>>> We share netdev core infra with devmem TCP. The main difference is that
>>>> io_uring is used for the uAPI and the lifetime of all objects is bound
>>>> to an io_uring instance.
>>>
>>> I've been thinking about this a bit, and I hope this feedback isn't
>>> too late, but I think your work may be useful for users not using
>>> io_uring. I.e. zero copy to host memory that is not dependent on page
>>> aligned MSS sizing. I.e. AF_XDP zerocopy but using the TCP stack.
>>
>> Not David, but come on, let's please get this moving forward. It's been
>> stuck behind dependencies for seemingly forever, which are finally
>> resolved.
> 
> Part of the reason this has been stuck behind dependencies for so long
> is because the dependency took the time to implement things very
> generically (memory providers, net_iovs) and provided you with the
> primitives that enable your work. And dealt with nacks in this area
> you now don't have to deal with.

And that's well appreciated, but I completely share Jens' sentiment.
Is there anything, like uAPI concerns, that prevents it from being
implemented afterwards / separately? I'd say that for io_uring users
it's nice to have the API done the io_uring way regardless of the
socket API option, so at the very least it would fork on the completion
format, and that would need a different ring/etc.
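
To make the shape of that concrete, here is a rough sketch of the
notification-style receive loop the cover letter describes. Every name
below (zcrx_cqe, zcrx_wait, zcrx_refill, consume) is a hypothetical
stand-in, not the uAPI this series defines; it only illustrates that
"reading" turns into consuming completions that point into an
already-filled, pre-registered area and then recycling the buffers.

#include <stddef.h>
#include <stdint.h>

struct zcrx_cqe {
	uint64_t off;	/* offset in the registered area where the NIC put data */
	uint32_t len;	/* number of bytes that landed there */
};

/* Hypothetical stand-ins for the real ring operations. */
extern struct zcrx_cqe zcrx_wait(void);
extern void zcrx_refill(uint64_t off);
extern void consume(const void *data, size_t len);

static void rx_loop(const unsigned char *area_base)
{
	for (;;) {
		/* wait for a completion telling us where data already is */
		struct zcrx_cqe cqe = zcrx_wait();

		/* no copy: the NIC DMA'd straight into the registered area */
		consume(area_base + cqe.off, cqe.len);

		/* return the buffer so the page pool can hand it out again */
		zcrx_refill(cqe.off);
	}
}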

>> I don't think this is a reasonable ask at all for this
>> patchset. If you want to work on that after the fact, then that's
>> certainly an option.
> 
> I think this work is extensible to sockets and the implementation need
> not be heavily tied to io_uring; yes, at least leaving things open for
> a socket extension to be done more easily in the future would be good, IMO

And as far as I can tell there is already a socket API allowing
all that, called devmem TCP :) It might need slight improvements on
the registration side, unless dmabuf-wrapped user pages are good
enough.
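
For reference, the devmem TCP receive path looks roughly like the
sketch below. This is from memory, so treat the constant and struct
names (MSG_SOCK_DEVMEM, SCM_DEVMEM_DMABUF, struct dmabuf_cmsg, struct
dmabuf_token, SO_DEVMEM_DONTNEED) as something to double-check against
the current uapi headers rather than as authoritative.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/uio.h>		/* struct dmabuf_cmsg, struct dmabuf_token */

static void recv_devmem_once(int fd)
{
	char ctrl[CMSG_SPACE(sizeof(struct dmabuf_cmsg)) * 32];
	char dummy;			/* payload stays in the dmabuf */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl,
		.msg_controllen = sizeof(ctrl),
	};

	if (recvmsg(fd, &msg, MSG_SOCK_DEVMEM) < 0)
		return;

	for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
	     cm = CMSG_NXTHDR(&msg, cm)) {
		struct dmabuf_cmsg frag;

		if (cm->cmsg_level != SOL_SOCKET ||
		    cm->cmsg_type != SCM_DEVMEM_DMABUF)
			continue;

		memcpy(&frag, CMSG_DATA(cm), sizeof(frag));

		/* frag_offset/frag_size locate the payload inside the
		 * dmabuf bound to this rx queue; process it in place. */
		printf("frag: off=%llu len=%u\n",
		       (unsigned long long)frag.frag_offset, frag.frag_size);

		/* once done, return the token so the kernel can reuse
		 * the underlying buffer */
		struct dmabuf_token tok = {
			.token_start = frag.frag_token,
			.token_count = 1,
		};
		setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
			   &tok, sizeof(tok));
	}
}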

> I'll look at the series more closely to see if I actually have any
> concrete feedback along these lines. I hope you're open to some of it
> :-)

-- 
Pavel Begunkov
