Message-ID: <CAGXJAmxKM5a95uhBwbmm1Z427=bGyZhcCUopycLMTEfc4dHnew@mail.gmail.com>
Date: Sun, 13 Nov 2022 21:37:24 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: Andrew Lunn <andrew@...n.ch>
Cc: Jiri Pirko <jiri@...nulli.us>,
Stephen Hemminger <stephen@...workplumber.org>,
netdev@...r.kernel.org
Subject: Re: Upstream Homa?
On Sun, Nov 13, 2022 at 12:38 PM Andrew Lunn <andrew@...n.ch> wrote:
>
> On Sun, Nov 13, 2022 at 12:10:22PM -0800, John Ousterhout wrote:
> > On Sun, Nov 13, 2022 at 9:10 AM Andrew Lunn <andrew@...n.ch> wrote:
> > >
> > > > Homa implements RPCs rather than streams like TCP or messages like
> > > > UDP. An RPC consists of a request message sent from client to server,
> > > > followed by a response message from server back to client. This requires
> > > > additional information in the API beyond what is provided in the arguments to
> > > > sendto and recvfrom. For example, when sending a request message, the
> > > > kernel returns an RPC identifier back to the application; when waiting for
> > > > a response, the application can specify that it wants to receive the reply for
> > > > a specific RPC identifier (or, it can specify that it will accept any
> > > > reply, or any request, or both).
> > >
> > > This sounds like the ancillary data you can pass to sendmsg(). I've
> > > not checked the code; it might be that the current plumbing only goes
> > > into the kernel, but I don't see why you cannot extend it to also allow
> > > data to be passed back to user space. If this is new functionality,
> > > maybe add a new flags argument to control it.
> > >
> > > recvmsg() also has ancillary data.
> >
> > Whoa! I'd never noticed the msg_control and msg_controllen fields before.
> > These may be sufficient to do everything Homa needs. Thanks for pointing
> > this out.
>
> Is zero copy also required? https://lwn.net/Articles/726917/ talks
> about this. But rather than doing the transmit complete notification
> via MSG_ERRORQUEUE, maybe you could make it part of the ancillary data
> for a later message? That could save you some system calls? Or is the
> latency low enough that the RPC reply acts as an implicit indication
> that the transmit buffer can be recycled?
>
> If your aim is to offload Homa to the NIC, it seems like zero copy is
> something you want, so even if you are not implementing it now, you
> probably should consider what the uAPI looks like.

I know that zero copy is all the rage these days, but I've become somewhat of
a skeptic. We spent quite a bit of time in the RAMCloud project implementing
zero copy (and we were using kernel-bypass NICs, which make it about as
efficient as possible); we found that it is very difficult to get a real
performance benefit. Managing the space so you know when you can reclaim it
adds a lot of complexity and overhead. My current thinking is that zero copy
only makes sense when you have really large blocks of data. I'm inclined to
let others experiment with zero-copy for a while and see if they can achieve
sustainable benefits over a meaningful range of operating conditions.

-John-
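
For reference, here is a minimal sketch of how an RPC identifier could travel
as ancillary data through the msg_control and msg_controllen fields mentioned
in the thread. The constants EX_SOL_RPC and EX_RPC_ID and the helpers
put_rpc_id() and get_rpc_id() are hypothetical names invented for this
illustration; they are not part of Homa or of any kernel API. Only struct
msghdr, struct cmsghdr, and the CMSG_* macros are standard.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical values for illustration only; not defined by Homa or Linux. */
#define EX_SOL_RPC 0x7fff /* made-up cmsg_level */
#define EX_RPC_ID  1      /* made-up cmsg_type  */

/*
 * Attach an RPC id to msg as a single control message stored in buf.
 * buf should be aligned for struct cmsghdr.
 * Returns 0 on success, -1 if buf is too small.
 */
static int put_rpc_id(struct msghdr *msg, void *buf, size_t buflen,
                      uint64_t id)
{
    struct cmsghdr *c;

    if (buflen < CMSG_SPACE(sizeof(id)))
        return -1;
    memset(buf, 0, buflen);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(id));

    c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = EX_SOL_RPC;
    c->cmsg_type = EX_RPC_ID;
    c->cmsg_len = CMSG_LEN(sizeof(id));
    memcpy(CMSG_DATA(c), &id, sizeof(id));
    return 0;
}

/*
 * Walk the control messages in msg and copy out the RPC id, if present.
 * Returns 0 if found, -1 otherwise.
 */
static int get_rpc_id(struct msghdr *msg, uint64_t *id)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == EX_SOL_RPC && c->cmsg_type == EX_RPC_ID &&
            c->cmsg_len >= CMSG_LEN(sizeof(*id))) {
            memcpy(id, CMSG_DATA(c), sizeof(*id));
            return 0;
        }
    }
    return -1;
}
```

In a real protocol the kernel, not the application, would fill in the control
buffer on the receive path (and, per the suggestion in the thread, possibly on
the send path as well); the caller would size the buffer with CMSG_SPACE() and
inspect it after recvmsg() returns.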