lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6f853286-e463-b684-cc1e-405119528697@gmail.com>
Date:   Thu, 9 Nov 2023 16:07:15 +0000
From:   Edward Cree <ecree.xilinx@...il.com>
To:     Mina Almasry <almasrymina@...gle.com>,
        David Ahern <dsahern@...nel.org>
Cc:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Stanislav Fomichev <sdf@...gle.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-kselftest@...r.kernel.org, linux-media@...r.kernel.org,
        dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Arnd Bergmann <arnd@...db.de>, Shuah Khan <shuah@...nel.org>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Christian König <christian.koenig@....com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Jeroen de Borst <jeroendb@...gle.com>,
        Praveen Kaligineedi <pkaligineedi@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP

On 09/11/2023 02:39, Mina Almasry wrote:
> On Wed, Nov 8, 2023 at 7:36 AM Edward Cree <ecree.xilinx@...il.com> wrote:
>>  If not then surely the way to return a memory area
>>  in an io_uring idiom is just to post a new read sqe ('RX descriptor')
>>  pointing into it, rather than explicitly returning it with setsockopt.
> 
> We're interested in using this with regular TCP sockets, not
> necessarily io_uring.
Fair.  I just wanted to push against the suggestion upthread that "oh,
 since io_uring supports setsockopt() we can just ignore it and it'll
 all magically work later" (paraphrased).
If you can keep the "allocate buffers out of a devmem region" and "post
 RX descriptors built on those buffers" APIs separate (inside the
 kernel; obviously both triggered by a single call to the setsockopt()
 uAPI) that'll likely make things simpler for the io_uring interface I
 describe, which will only want the latter.

-ed

PS: Here's a crazy idea that I haven't thought through at all: what if
 you allow device memory to be mmap()ed into process address space
 (obviously with none of r/w/x because it's unreachable), so that your
 various uAPIs can just operate on pointers (e.g. the setsockopt
 becomes the madvise it's named after; recvmsg just uses or populates
 the iovec rather than needing a cmsg).  Then if future devices have
 their memory CXL accessible that can potentially be enabled with no
 change to the uAPI (userland just starts being able to access the
 region without faulting).
And you can maybe add a semantic flag to recvmsg saying "if you don't
 use all the buffers in my iovec, keep hold of the rest of them for
 future incoming traffic, and if I post new buffers with my next
 recvmsg, add those to the tail of the RXQ rather than replacing the
 ones you've got".  That way you can still have the "userland
 directly fills the RX ring" behaviour even with TCP sockets.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ