lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 8 May 2024 12:30:07 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Mina Almasry <almasrymina@...gle.com>,
 Christoph Hellwig <hch@...radead.org>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-alpha@...r.kernel.org, linux-mips@...r.kernel.org,
 linux-parisc@...r.kernel.org, sparclinux@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
 bpf@...r.kernel.org, linux-kselftest@...r.kernel.org,
 linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Jonathan Corbet <corbet@....net>,
 Richard Henderson <richard.henderson@...aro.org>,
 Ivan Kokshaysky <ink@...assic.park.msu.ru>, Matt Turner
 <mattst88@...il.com>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
 "James E.J. Bottomley" <James.Bottomley@...senpartnership.com>,
 Helge Deller <deller@....de>, Andreas Larsson <andreas@...sler.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 Ilias Apalodimas <ilias.apalodimas@...aro.org>,
 Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu
 <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Arnd Bergmann <arnd@...db.de>, Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
 Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
 <eddyz87@...il.com>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>,
 John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
 Jiri Olsa <jolsa@...nel.org>, Steffen Klassert
 <steffen.klassert@...unet.com>, Herbert Xu <herbert@...dor.apana.org.au>,
 David Ahern <dsahern@...nel.org>,
 Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 Shuah Khan <shuah@...nel.org>, Sumit Semwal <sumit.semwal@...aro.org>,
 Christian König <christian.koenig@....com>,
 Amritha Nambiar <amritha.nambiar@...el.com>,
 Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
 Alexander Mikhalitsyn <alexander@...alicyn.com>,
 Kaiyuan Zhang <kaiyuanz@...gle.com>, Christian Brauner <brauner@...nel.org>,
 Simon Horman <horms@...nel.org>, David Howells <dhowells@...hat.com>,
 Florian Westphal <fw@...len.de>, Yunsheng Lin <linyunsheng@...wei.com>,
 Kuniyuki Iwashima <kuniyu@...zon.com>, Jens Axboe <axboe@...nel.dk>,
 Arseniy Krasnov <avkrasnov@...utedevices.com>,
 Aleksander Lobakin <aleksander.lobakin@...el.com>,
 Michael Lass <bevan@...co.net>, Jiri Pirko <jiri@...nulli.us>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Lorenzo Bianconi <lorenzo@...nel.org>,
 Richard Gobert <richardbgobert@...il.com>,
 Sridhar Samudrala <sridhar.samudrala@...el.com>,
 Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
 Johannes Berg <johannes.berg@...el.com>, Abel Wu <wuyun.abel@...edance.com>,
 Breno Leitao <leitao@...ian.org>, David Wei <dw@...idwei.uk>,
 Shailend Chand <shailend@...gle.com>,
 Harshitha Ramamurthy <hramamurthy@...gle.com>,
 Shakeel Butt <shakeel.butt@...ux.dev>, Jeroen de Borst
 <jeroendb@...gle.com>, Praveen Kaligineedi <pkaligineedi@...gle.com>
Subject: Re: [RFC PATCH net-next v8 02/14] net: page_pool: create hooks for
 custom page providers

On 5/8/24 00:32, Jason Gunthorpe wrote:
> On Tue, May 07, 2024 at 08:35:37PM +0100, Pavel Begunkov wrote:
>> On 5/7/24 18:56, Jason Gunthorpe wrote:
>>> On Tue, May 07, 2024 at 06:25:52PM +0100, Pavel Begunkov wrote:
>>>> On 5/7/24 17:48, Jason Gunthorpe wrote:
>>>>> On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote:
>>>>>
>>>>>> 1. Align with devmem TCP to use udmabuf for your io_uring memory. I
>>>>>> think in the past you said it's a uapi you don't link but in the face
>>>>>> of this pushback you may want to reconsider.
>>>>>
>>>>> dmabuf does not force a uapi, you can acquire your pages however you
>>>>> want and wrap them up in a dmabuf. No uapi at all.
>>>>>
>>>>> The point is that dmabuf already provides ops that do basically what
>>>>> is needed here. We don't need ops calling ops just because dmabuf's
>>>>> ops are not understsood or not perfect. Fixup dmabuf.
>>>>
>>>> Those ops, for example, are used to efficiently return used buffers
>>>> back to the kernel, which is uapi, I don't see how dmabuf can be
>>>> fixed up to cover it.
>>>
>>> Sure, but that doesn't mean you can't use dma buf for the other parts
>>> of the flow. The per-page lifetime is a different topic than the
>>> refcounting and access of the entire bulk of memory.
>>
>> Ok, so if we're leaving uapi (and ops) and keep per page/sub-buffer as
>> is, the rest is resolving uptr -> pages, and passing it to page pool in
>> a convenient to page pool format (net_iov).
> 
> I'm not going to pretend to know about page pool details, but dmabuf
> is the way to get the bulk of pages into a pool within the net stack's
> allocator and keep that bulk properly refcounted while.> 
> An object like dmabuf is needed for the general case because there are
> not going to be per-page references or otherwise available.

They are already pinned, memory is owned by the provider, io_uring
in this case, and it should not be freed circumventing io_uring,
and at this stage calling release_pages() is not such a hassle,
especially comparing to introducing an additional object.

My question is how having an intermediary dmabuf benefits the net
stack or io_uring ? For now IMO it doesn't solve anything but adds
extra complexity. Adding dmabuf for the sake of adding dmabuf is
not a great choice.

> What you seem to want is to alter how the actual allocation flow works
> from that bulk of memory and delay the free. It seems like a different
For people who jumped here without looking what this patchset is
about, that's the entire point of the io_uring zero copy approach
as well as this set. Instead of using kernel private pages that you
have no other option but to copy/mmap (and then free), it hands
buffers to the user while using memory accessible/visible in some
way by the user.

That "delay free" is taking a reference while user is reading data
(slightly different for devmem tcp). And note, it's not a page/dmabuf
reference, kernel can forcibly take it back and release pages.

> topic to me, and honestly hacking into the allocator free function
> seems a bit weird..

Do you also think that DMA_BUF_IOCTL_SYNC is a weird hack, because
it "delays free" by pinning the dmabuf object and letting the user
read memory instead of copying it? I can find many examples

-- 
Pavel Begunkov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ