Message-ID: <2771e23a-2f4b-43c7-9d2a-3fa6aad26d53@gmail.com>
Date: Fri, 1 Nov 2024 21:38:27 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org,
 netdev@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>, David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH v1 06/15] net: page_pool: add ->scrub mem provider
 callback

On 11/1/24 19:24, Mina Almasry wrote:
> On Fri, Nov 1, 2024 at 11:34 AM Pavel Begunkov <asml.silence@...il.com> wrote:
...
>>> Huh, interesting. For devmem TCP we bind a region of memory to the
>>> queue once, and after that we can create N connections all reusing the
>>> same memory region. Is that not the case for io_uring? There are no
>>
>> Hmm, I think we already discussed the same question before. Yes, it
>> does indeed support an arbitrary number of connections. For what I was
>> saying above, the devmem TCP analogy would be attaching buffers to the
>> netlink socket instead of a TCP socket (that new xarray you added) when
>> you give it to user space. Then you can close the connection after a
>> receive and the buffer you've got would still be alive.
>>
> 
> Ah, I see. You're making a tradeoff here. You leave the buffers alive
> after each connection so that userspace can still use them if it
> wishes, but they are of course unavailable for other connections.
> 
> But in our case (and I'm guessing yours) the process that will set up
> the io_uring memory provider/RSS/flow steering will be a different
> process from the one that sends/receives data, no? Because the former
> requires CAP_NET_ADMIN privileges while the latter will not. If they
> are two different processes, what happens when the latter process doing
> the send/receive crashes? Does the memory stay unavailable until the
> CAP_NET_ADMIN process exits? Wouldn't it be better to tie the lifetime
> of the buffers to the connection? Sure, the buffers will become

That's the tradeoff Google is willing to make in its framework, which
is fine, but it's not without cost: you need to store/erase entries in
the xarray, and it's a design choice in other respects too, e.g. you
can't release the page pool while the socket you got a buffer from is
still alive but the net_iov hasn't been returned.
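
To make that bookkeeping cost concrete, here is a minimal kernel-style
sketch. It is not the actual devmem TCP or io_uring code; all of the
niov_token_* names are invented and only the xarray API is real:

#include <linux/gfp.h>
#include <linux/types.h>
#include <linux/xarray.h>

struct niov_tokens {
        struct xarray tokens;   /* token id -> buffer handed to user space */
};

static void niov_tokens_init(struct niov_tokens *nt)
{
        xa_init_flags(&nt->tokens, XA_FLAGS_ALLOC);
}

/* every buffer given to user space costs an xa_alloc()... */
static int niov_token_store(struct niov_tokens *nt, void *niov, u32 *token)
{
        return xa_alloc(&nt->tokens, token, niov, xa_limit_32b, GFP_KERNEL);
}

/* ...and an xa_erase() when user space hands it back */
static void *niov_token_erase(struct niov_tokens *nt, u32 token)
{
        return xa_erase(&nt->tokens, token);
}

/*
 * On socket close every outstanding token still has to be walked and
 * released, which is why the page pool can't go away while such a
 * socket is alive with unreturned buffers.
 */
static void niov_tokens_release(struct niov_tokens *nt, void (*put)(void *))
{
        void *niov;
        unsigned long idx;

        xa_for_each(&nt->tokens, idx, niov)
                put(niov);
        xa_destroy(&nt->tokens);
}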

> unavailable after the connection is closed, but at least you don't
> 'leak' memory on send/receive process crashes.
> 
> Unless of course you're saying that only CAP_NET_ADMIN processes will

The user can pass the io_uring instance itself.
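
As an illustration of that point, and purely as a hedged userspace
sketch (send_ring_fd() and unix_sock are made-up names, nothing here is
from the patchset): the CAP_NET_ADMIN process that did the memory
provider/RSS/flow steering setup can hand its ring fd to the
unprivileged worker over a Unix domain socket with SCM_RIGHTS:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* pass an io_uring fd (from io_uring_setup()/liburing) to another process */
static int send_ring_fd(int unix_sock, int ring_fd)
{
        char dummy = 0;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;
        } u;
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        memset(&u, 0, sizeof(u));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &ring_fd, sizeof(int));

        return sendmsg(unix_sock, &msg, 0) < 0 ? -1 : 0;
}

The worker picks the fd up with the matching recvmsg() and can then use
that ring directly, without ever holding CAP_NET_ADMIN itself.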

> run io_rcrx connections. Then they can do their own mp setup/RSS/flow
> steering, and there is no concern when the process crashes because
> everything will be cleaned up. But that's a big limitation to put on
> the usage of the feature, no?

-- 
Pavel Begunkov
