Message-Id: <728cebe2-6480-4b55-a6dd-858317810cff@app.fastmail.com>
Date: Sat, 21 Dec 2024 14:43:13 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "John Ousterhout" <ouster@...stanford.edu>
Cc: "Jakub Kicinski" <kuba@...nel.org>, Netdev <netdev@...r.kernel.org>,
"Paolo Abeni" <pabeni@...hat.com>, "Eric Dumazet" <edumazet@...gle.com>,
"Simon Horman" <horms@...nel.org>
Subject: Re: [PATCH net-next v4 01/12] inet: homa: define user-visible API for Homa
On Sat, Dec 21, 2024, at 00:42, John Ousterhout wrote:
> On Fri, Dec 20, 2024 at 1:13 PM Arnd Bergmann <arnd@...db.de> wrote:
>> Assuming this is actually meant as a persistent __user
>> pointer, I'm still unsure what this means if the socket is
>> available to more than one process, e.g. through a fork()
>> or explicit file descriptor passing, or if the original
>> process dies while there is still a transfer in progress.
>> I realize that there is a lot of information already out
>> there that I haven't all read, so this is probably explained
>> somewhere, but it would be nice to point to that documentation
>> somewhere near the code to clarify the corner cases.
>
> I hadn't considered this, but the buffering mechanism prevents the
> same socket from being shared across processes. I'm okay with that:
> I'm not sure that sharing between processes adds much value for Homa,
> and the performance benefit from the buffer mechanism is quite large.
> I will document this. Is there a way to prevent a socket from being
> shared across processes (e.g. can I set close-on-exec from within the
> kernel?) I don't think there is any risk to kernel integrity if the
> socket does end up shared; the worst that will happen is that the
> memory of one of the processes will get trashed because Homa will
> write to memory that isn't actually buffer space in that process.
It would definitely be nicer to ensure that it's only available
for a particular 'struct mm_struct'. Setting O_CLOEXEC is probably
not enough, since it does not close the fd on a fork() without
exec, and it does not prevent the flag from being cleared again
through fcntl(). Maybe see what io_uring does to handle userspace
pointers here, I think the problem is quite similar there.
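To illustrate the two close-on-exec limitations from userspace, here
is a minimal sketch. It uses an AF_UNIX datagram socket purely as a
stand-in (an assumption, any fd behaves the same way for this purpose),
and the two helper names are made up for the example:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a forked child (no exec) can still
 * use a close-on-exec fd, 0 if not, -1 on setup failure.
 * AF_UNIX is only a stand-in for the socket type under discussion.
 */
int cloexec_fd_survives_fork(void)
{
	int status;
	pid_t pid;
	int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);

	if (fd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		/* Child: F_GETFD only succeeds if the fd is still open. */
		_exit(fcntl(fd, F_GETFD) >= 0 ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/*
 * Hypothetical helper: returns 1 if userspace can clear FD_CLOEXEC
 * again after the fd was created with it, 0 if not, -1 on failure.
 */
int cloexec_can_be_cleared(void)
{
	int flags;
	int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);

	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFD);
	if (flags < 0 || !(flags & FD_CLOEXEC)) {
		close(fd);
		return -1;
	}
	if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
		close(fd);
		return -1;
	}

	flags = fcntl(fd, F_GETFD);
	close(fd);
	return flags >= 0 && !(flags & FD_CLOEXEC);
}
```

Both helpers are expected to report success on Linux, which is why
the flag alone cannot tie the socket to a single address space.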
>> That probably also explains what type of memory the
>> __user buffer can point to, but I would like to make
>> sure that this has well-defined behavior e.g. if that
>> buffer is an mmap()ed file on NFS that was itself
>> mounted over a homa socket. Is there any guarantee that
>> this is either prohibited or is free of deadlocks and
>> recursion?
>
> Given the API incompatibilities between Homa and TCP, I don't think it
> is possible to have NFS mounted over a Homa socket. But you raise the
> issue of whether some kinds of addresses might not be suitable for
> Homa's buffer use this way. I don't know enough about the various
> possible kinds of memory to know what kinds of problems could occur.
> My assumption is that the buffer area will be a simple mmap()ed
> region. The only use Homa makes of the buffer address is to call
> import_ubuf with addresses in the buffer region, followed by
> skb_copy_datagram_iter with the resulting iov_iter.
Right, NFS was just an example, but there are other interesting
cases. You certainly have to deal with buffers in userspace
memory where an access can block indefinitely. Another interesting
case is memory that has additional constraints, e.g. the MMIO
space of a PCI device like a GPU, which may fault when writing
data into it, or which cannot be mapped into the DMA space
of a network device.
> Is there some way I can check the "kind" of memory behind the buffer
> pointer, so Homa could reject anything other than the simple case?
I don't think so. I still don't know what the exact constraints
are that you have here, but I suspect this would all be a lot
simpler if you could change the interface to not pass arbitrary
user addresses but instead have a single file descriptor that
backs the buffers, either by passing a tmpfs/hugetlbfs file into
the socket instead of a pointer, or by using mmap() on the
socket to map it into userspace like we do for packet sockets.
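As a rough userspace sketch of the first variant: the buffer area is
created as a tmpfs-style file with memfd_create() and identified by
its fd rather than by an address. How the fd would be handed to the
socket is deliberately left out, since no such Homa interface exists
today. The helper names and the "homa-bufs" label are made up, and the
second mmap() only stands in for whoever else holds the fd:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous shmem file of 'size' bytes. */
int create_buffer_file(size_t size)
{
	int fd = memfd_create("homa-bufs", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: map the buffer file, NULL on failure. */
void *map_buffer_file(int fd, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Returns 1 if data written through one mapping of the file is visible
 * through a second, independent mapping of the same fd, 0 if not,
 * -1 on setup failure. 'size' must be at least 4 bytes.
 */
int buffer_file_is_shared(size_t size)
{
	char *a, *b;
	int ok;
	int fd = create_buffer_file(size);

	if (fd < 0)
		return -1;

	a = map_buffer_file(fd, size);
	b = map_buffer_file(fd, size);
	if (!a || !b) {
		if (a)
			munmap(a, size);
		if (b)
			munmap(b, size);
		close(fd);
		return -1;
	}

	/* Write through one mapping, read back through the other. */
	memcpy(a, "msg", 4);
	ok = memcmp(b, "msg", 4) == 0;

	munmap(a, size);
	munmap(b, size);
	close(fd);
	return ok;
}
```

The point of the fd is that the backing pages are known to be plain
shmem, so none of the odd cases above (MMIO, file-backed mappings that
can block) need to be considered.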
Arnd