Message-ID: <2aa9c139-eee8-c707-6e62-5415c26c2a1a@gmail.com>
Date: Tue, 14 Nov 2023 16:09:51 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: David Ahern <dsahern@...nel.org>, Mina Almasry <almasrymina@...gle.com>,
 David Wei <dw@...idwei.uk>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-arch@...r.kernel.org, linux-kselftest@...r.kernel.org,
 linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 Ilias Apalodimas <ilias.apalodimas@...aro.org>, Arnd Bergmann
 <arnd@...db.de>, Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 Shuah Khan <shuah@...nel.org>, Sumit Semwal <sumit.semwal@...aro.org>,
 Christian König <christian.koenig@....com>,
 Shakeel Butt <shakeelb@...gle.com>, Jeroen de Borst <jeroendb@...gle.com>,
 Praveen Kaligineedi <pkaligineedi@...gle.com>,
 Willem de Bruijn <willemb@...gle.com>, Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator

On 11/11/23 17:19, David Ahern wrote:
> On 11/10/23 7:26 AM, Pavel Begunkov wrote:
>> On 11/7/23 23:03, Mina Almasry wrote:
>>> On Tue, Nov 7, 2023 at 2:55 PM David Ahern <dsahern@...nel.org> wrote:
>>>>
>>>> On 11/7/23 3:10 PM, Mina Almasry wrote:
>>>>> On Mon, Nov 6, 2023 at 3:44 PM David Ahern <dsahern@...nel.org> wrote:
>>>>>>
>>>>>> On 11/5/23 7:44 PM, Mina Almasry wrote:
>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>> index eeeda849115c..1c351c138a5b 100644
>>>>>>> --- a/include/linux/netdevice.h
>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>> @@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
>>>>>>>    };
>>>>>>>
>>>>>>>    #ifdef CONFIG_DMA_SHARED_BUFFER
>>>>>>> +struct page_pool_iov *
>>>>>>> +netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
>>>>>>> +void netdev_free_devmem(struct page_pool_iov *ppiov);
>>>>>>
>>>>>> netdev_{alloc,free}_dmabuf?
>>>>>>
>>>>>
>>>>> Can do.
>>>>>
>>>>>> I say that because a dmabuf can be host memory, at least I am not
>>>>>> aware of a restriction that a dmabuf is device memory.
>>>>>>
>>>>>
>>>>> In my limited experience dma-buf is generally device memory, and
>>>>> that's really its use case. CONFIG_UDMABUF is a driver that mocks
>>>>> dma-buf with a memfd which I think is used for testing. But I can do
>>>>> the rename, it's more clear anyway, I think.
>>>>
>>>> config UDMABUF
>>>>           bool "userspace dmabuf misc driver"
>>>>           default n
>>>>           depends on DMA_SHARED_BUFFER
>>>>           depends on MEMFD_CREATE || COMPILE_TEST
>>>>           help
>>>>             A driver to let userspace turn memfd regions into dma-bufs.
>>>>             Qemu can use this to create host dmabufs for guest
>>>>             framebuffers.
>>>>
>>>>
>>>> Qemu is just a userspace process; it is in no way a special one.
>>>>
>>>> Treating host memory as a dmabuf should radically simplify the io_uring
>>>> extension of this set.
>>>
>>> I agree actually, and I was about to make that comment to David Wei's
>>> series once I have the time.
>>>
>>> David, your io_uring RX zerocopy proposal actually works with devmem
>>> TCP, if you're inclined to do that instead, what you'd do roughly is
>>> (I think):
>> That would be a Frankenstein's monster api with no good reason for it.
> 
> It brings a consistent API from a networking perspective.
> 
> io_uring should not need to be in the page pool and memory management
> business. Have you or David coded up the re-use of the socket APIs with
> dmabuf to see how much smaller it makes the io_uring change - or even
> walked through from a theoretical perspective?

Yes, we did the mental exercise, which is why we're converting to the
page pool (pp). I don't see many opportunities for reuse in the main
data path, apart from potentially using the iov format instead of
pages.

If the goal is to minimise the amount of code, io_uring can mimic the
TCP devmem API with netlink and an ioctl-ish buffer return, but that'd
be a pretty bad API for io_uring: overly complicated and limiting
optimisation options. If not, then we have to do some buffer
management in io_uring, and I don't see anything wrong with that. It
shouldn't be a burden for networking as long as all that extra code is
contained in io_uring, is only exposed via pp ops, and follows the
rules.
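
As an illustration only (this is not code from the series, and not
kernel code): a rough userspace C model of buffer management that lives
entirely in one provider and is reached by generic code only through an
ops table. Every name here (struct buf, struct provider_ops,
ring_provider, pool_alloc, pool_release, run_demo) is hypothetical and
chosen for the sketch; the real page pool callbacks and their
signatures may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer handle; a stand-in, not the kernel's page_pool_iov. */
struct buf {
	size_t id;
	bool in_use;
};

struct pool;

/* Hypothetical ops table: the only surface generic code ever sees. */
struct provider_ops {
	struct buf *(*alloc)(struct pool *pool);
	void (*release)(struct pool *pool, struct buf *b);
};

struct pool {
	const struct provider_ops *ops;
	void *provider_priv;	/* opaque to generic code */
};

/* Generic side: knows nothing about how buffers are managed. */
static struct buf *pool_alloc(struct pool *p)
{
	return p->ops->alloc(p);
}

static void pool_release(struct pool *p, struct buf *b)
{
	p->ops->release(p, b);
}

/* Provider side: all buffer bookkeeping is contained here. */
#define NBUFS 4

struct ring_provider {
	struct buf bufs[NBUFS];
	struct buf *free_list[NBUFS];	/* simple LIFO of free buffers */
	size_t nfree;
};

static struct buf *ring_alloc(struct pool *p)
{
	struct ring_provider *rp = p->provider_priv;
	struct buf *b;

	if (rp->nfree == 0)
		return NULL;	/* exhausted; caller has to cope */
	b = rp->free_list[--rp->nfree];
	b->in_use = true;
	return b;
}

static void ring_release(struct pool *p, struct buf *b)
{
	struct ring_provider *rp = p->provider_priv;

	if (!b->in_use)
		return;		/* ignore a double release */
	assert(rp->nfree < NBUFS);
	b->in_use = false;
	rp->free_list[rp->nfree++] = b;
}

static const struct provider_ops ring_ops = {
	.alloc = ring_alloc,
	.release = ring_release,
};

static void ring_provider_init(struct ring_provider *rp, struct pool *p)
{
	for (size_t i = 0; i < NBUFS; i++) {
		rp->bufs[i].id = i;
		rp->bufs[i].in_use = false;
		rp->free_list[i] = &rp->bufs[i];
	}
	rp->nfree = NBUFS;
	p->ops = &ring_ops;
	p->provider_priv = rp;
}

/* Drain the pool through the generic entry points, then give it all back. */
static size_t run_demo(void)
{
	struct ring_provider rp;
	struct pool p;
	struct buf *held[NBUFS];
	size_t n = 0;

	ring_provider_init(&rp, &p);
	while (n < NBUFS && (held[n] = pool_alloc(&p)) != NULL)
		n++;
	for (size_t i = 0; i < n; i++)
		pool_release(&p, held[i]);
	return n;
}
```

In this model the generic side only calls pool_alloc() and
pool_release(); swapping in a different provider means supplying a
different ops table and private state, without touching the callers.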

-- 
Pavel Begunkov
