netdev - Re: [PATCH net-next v8 11/17] io_uring/zcrx: implement zerocopy receive pp memory provider

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <95e02ca4-4f0d-4f74-a882-6c975b345daa@gmail.com>
Date: Wed, 11 Dec 2024 14:42:43 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org,
 netdev@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
 Paolo Abeni <pabeni@...hat.com>, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jesper Dangaard Brouer
 <hawk@...nel.org>, David Ahern <dsahern@...nel.org>,
 Mina Almasry <almasrymina@...gle.com>,
 Stanislav Fomichev <stfomichev@...il.com>, Joe Damato <jdamato@...tly.com>,
 Pedro Tammela <pctammela@...atatu.com>
Subject: Re: [PATCH net-next v8 11/17] io_uring/zcrx: implement zerocopy
 receive pp memory provider

On 12/11/24 00:24, Jakub Kicinski wrote:
> On Tue, 10 Dec 2024 04:45:23 +0000 Pavel Begunkov wrote:
>>> Can you say more about the IO_ZC_RX_UREF bias? net_iov is not the page
>>> struct, we can add more fields. In fact we have 8B of padding in it
>>> that can be allocated without growing the struct. So why play with
>>
>> I guess we can, though it's growing it for everyone not just
>> io_uring considering how indexing works, i.e. no embedding into
>> a larger struct.
> 
> Right but we literally have 8B of "padding". We only need 32b counter
> here, so there will still be 4B of unused space. Not to mention that
> net_iov is not cacheline aligned today. Space is not a concern.
> 
>>> biases? You can add a 32b atomic counter for how many refs have been
>>> handed out to the user.
>>
>> This set does it in a stupid way, but the bias allows to coalesce
>> operations with it into a single atomic. Regardless, it can be
>> placed separately, though we still need a good way to optimise
>> counting. Take a look at my reply with questions in the v7 thread,
>> I outlined what can work quite well in terms of performance but
>> needs a clear api for that from net/
> 
> I was thinking along the lines of transferring the ownership of
> the frags. But let's work on that as a follow up. Atomic add on

That's fine to leave it out for now and deal later, but what's
important for me when going through preliminary shittification of
the project is to have a way to optimise it after and a clear
understanding that it can't be left w/o it, and that there are
no strong opinions that would block it.

The current cache situation is too unfortunate, understandably so
with it being aliased to struct page. pp_ref_count is in the
same line with ->pp and others. Here an iov usually gets modified
by napi, then refcounted from syscall, after deferred skb put will
put it down back at napi context, and in some time after it gets
washed out from the cache, the user will finally return it back
to page pool.

> an exclusively owned cacheline is 2 cycles on AMD if I'm looking
> correctly.

Sounds too good to be true considering x86 implies a full barrier
for atomics. I wonder where the data comes from?

-- 
Pavel Begunkov