netdev - Re: [PATCH v1 11/15] io_uring/zcrx: implement zerocopy receive pp memory provider

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6254de3f-c44f-4090-9ba7-7e69d04a9ba0@gmail.com>
Date: Fri, 11 Oct 2024 02:49:17 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org,
 netdev@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>, David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH v1 11/15] io_uring/zcrx: implement zerocopy receive pp
 memory provider

On 10/11/24 01:32, Mina Almasry wrote:
> On Thu, Oct 10, 2024 at 2:22 PM Pavel Begunkov <asml.silence@...il.com> wrote:
>>
>>> page_pool. To make matters worse, the bypass is only there if the
>>> netmems are returned from io_uring, and not bypassed when the netmems
>>> are returned from driver/tcp stack. I'm guessing if you reused the
>>> page_pool recycling in the io_uring return path then it would remove
>>> the need for your provider to implement its own recycling for the
>>> io_uring return case.
>>>
>>> Is letting providers bypass and override the page_pool's recycling in
>>> some code paths OK? IMO, no. A maintainer will make the judgement call
>>
>> Mina, frankly, that's nonsense. If we extend the same logic,
>> devmem overrides page allocation rules with callbacks, devmem
>> overrides and violates page pool buffer lifetimes by extending
>> it to user space, devmem violates and overrides the page pool
>> object lifetime by binding buffers to sockets. And all of it
>> I'd rather name extends and enhances to fit in the devmem use
>> case.
>>
>>> and speak authoritatively here and I will follow, but I do think it's
>>> a (much) worse design.
>>
>> Sure, I have a completely opposite opinion, that's a much
>> better approach than returning through a syscall, but I will
>> agree with you that ultimately the maintainers will say if
>> that's acceptable for the networking or not.
>>
> 
> Right, I'm not suggesting that you return the pages through a syscall.
> That will add syscall overhead when it's better not to have that
> especially in io_uring context. Devmem TCP needed a syscall because I
> couldn't figure out a non-syscall way with sockets for the userspace
> to tell the kernel that it's done with some netmems. You do not need
> to follow that at all. Sorry if I made it seem like so.
> 
> However, I'm suggesting that when io_uring figures out that the
> userspace is done with a netmem, that you feed that netmem back to the
> pp, and utilize the pp's recycling, rather than adding your own
> recycling in the provider.

I should spell it out somewhere in commits, the difference is that we
let the page pool to pull buffers instead of having a syscall to push
like devmem TCP does. With pushing, you'll be doing it from some task
context, and it'll need to find a way back into the page pool, via ptr
ring or with the opportunistic optimisations napi_pp_put_page() provides.
And if you do it this way, the function is very useful.

With pulling though, returning already happens from within the page
pool's allocation path, just in the right context that doesn't need
any additional locking / sync to access page pool's napi/bh protected
caches/etc.. That's why it has a potential to be faster, and why
optimisation wise napi_pp_put_page() doesn't make sense for this
case, i.e. no need to jump through hoops of finding how to transfer
a buffer to the page pool's context because we're already in there.

>  From your commit message:
> 
> "we extend the lifetime by recycling buffers only after the user space
> acknowledges that it's done processing the data via the refill queue"
> 
> It seems to me that you get some signal from the userspace that data

You don't even need to signal it, the page pool will take buffers
when it needs to allocate memory.

> is ready to be reuse via that refill queue (whatever it is, very
> sorry, I'm not that familiar with io_uring). My suggestion here is
> when the userspace tells you that a netmem is ready for reuse (however
> it does that), that you feed that page back to the pp via something
> like napi_pp_put_page() or page_pool_put_page_bulk() if that makes
> sense to you. FWIW I'm trying to look through your code to understand
> what that refill queue is and where - if anywhere - it may be possible
> to feed pages back to the pp, rather than directly to the provider.

-- 
Pavel Begunkov