Message-ID: <CAGXJAmyNPhA-6L0jv8AT9_xaxM81k+8nD5H+wtj=UN84PB_KnA@mail.gmail.com>
Date: Mon, 3 Feb 2025 15:33:49 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Netdev <netdev@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>, 
	Simon Horman <horms@...nel.org>, Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH net-next v6 08/12] net: homa: create homa_incoming.c

On Mon, Feb 3, 2025 at 1:12 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> >>>> Also it looks like there is no memory accounting at all, and SO_RCVBUF
> >>>> settings are just ignored.
> >>>
> >>> Homa doesn't yet have comprehensive memory accounting, but there is a
> >>> limit on buffer space for incoming messages. Instead of SO_RCVBUF,
> >>> applications control the amount of receive buffer space by controlling
> >>> the size of the buffer pool they provide to Homa with the
> >>> SO_HOMA_RCVBUF socket option.
> >>
> >> Ignoring SO_RCVBUF (and net.core.rmem_* sysctls) is both unexpected and
> >> dangerous (a single application may consume an unbounded amount of system
> >> memory). Also what about the TX side? I don't see any limit at all there.
> >
> > An application cannot consume unbounded system memory on the RX side
> > (in fact it consumes almost none). When packets arrive, their data is
> > immediately transferred to a buffer region in user memory provided by
> > the application (using the facilities in homa_pool.c). Skb's are
> > occupied only long enough to make this transfer, and it happens even
> > if there is no pending recv* kernel call. The size of the buffer
> > region is limited by the application, and the application must provide
> > a region via SO_HOMA_RCVBUF.
>
> I don't see where/how the SO_HOMA_RCVBUF max value is somehow bounded?!?
> It looks like the user-space could pick an arbitrary large value for it.

That's right; is there anything to be gained by limiting it? This is
simply mmapped memory in the user address space. Aren't applications
allowed to allocate as much memory as they like? If so, why shouldn't
they be able to use that memory for incoming buffers if they choose?
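
Concretely, the application side looks roughly like this. This is a
sketch only: IPPROTO_HOMA, SO_HOMA_RCVBUF, and struct homa_rcvbuf_args
are taken from memory of the GitHub repo, so the exact names, values,
and layout may differ from this patch series:

#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef IPPROTO_HOMA
#define IPPROTO_HOMA 0xFD       /* placeholder; real value is in homa.h */
#endif
#ifndef SO_HOMA_RCVBUF
#define SO_HOMA_RCVBUF 10       /* placeholder; real value is in homa.h */
#endif

#define BUF_REGION_SIZE (64 * 1024 * 1024)  /* chosen by the application */

struct homa_rcvbuf_args {       /* layout assumed from homa.h */
	void *start;
	size_t length;
};

int setup_rx_buffers(int fd)
{
	struct homa_rcvbuf_args args;

	/* Ordinary anonymous user memory; incoming message data is
	 * copied here directly out of skbs (see homa_pool.c). */
	args.start = mmap(NULL, BUF_REGION_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (args.start == MAP_FAILED)
		return -1;
	args.length = BUF_REGION_SIZE;

	/* The size of this region is the socket's receive-buffer
	 * limit; there is no separate SO_RCVBUF accounting. */
	return setsockopt(fd, IPPROTO_HOMA, SO_HOMA_RCVBUF,
			  &args, sizeof(args));
}

The point is that this is ordinary user memory charged to the
application, not kernel memory.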

> > Given this, there's no need for SO_RCVBUF
> > (and I don't see why a different limit would be specified via
> > SO_RCVBUF than the one already provided via SO_HOMA_RCVBUF).
> > I agree that this is different from TCP, but Homa is different from TCP in
> > lots of ways.
> >
> > There is currently no accounting or control on the TX side. I agree
> > that this needs to be implemented at some point, but if possible I'd
> > prefer to defer this until more of Homa has been upstreamed. For
> > example, this current patch doesn't include any sysctl support, which
> > would be needed as part of accounting/control (the support is part of
> > the GitHub repo, it's just not in this patch series).
>
> SO_RCVBUF and SO_SNDBUF are expected to apply to any kind of socket,
> see man 7 socket. Exceptions should be at least documented, but we need
> some way to limit memory usage in both directions.

The expectations around these limits are based on an unstated (and
probably unconscious) assumption of a TCP-like streaming protocol.
RPCs are different. For example, there is no one value of rmem_default
or rmem_max that will work for both TCP and Homa. On my system, these
values are both around 200 KB, which seems fine for TCP, but that's
not even enough for a single full-size RPC in Homa (messages can be
as large as 1 MB), and Homa apps need
to have several active RPCs at a time. Thus it doesn't make sense to
use SO_RCVBUF and SO_SNDBUF for both Homa and TCP; their needs are too
different.

> Fine tuning controls and sysctls could land later, but the basic
> constraints should IMHO be there from the beginning.

OK. I think that SO_HOMA_RCVBUF takes care of RX buffer space. For TX,
what's the simplest scheme that you would be comfortable with? For
example, if I cap the number of outstanding RPCs per socket, will that
be enough for now?
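
For concreteness, the simplest cap I can imagine looks something like
this (purely hypothetical: the fields and helpers below don't exist in
the patch series, and a real version would presumably want a sysctl
for the limit):

#include <linux/atomic.h>
#include <linux/errno.h>

struct homa_sock {
	/* ... existing fields ... */
	atomic_t tx_rpcs;       /* hypothetical: live client RPCs */
	int tx_rpc_limit;       /* hypothetical: cap, e.g. from a sysctl */
};

/* Called when a client RPC is created; fails if the socket already
 * has too many outstanding RPCs (a caller could instead choose to
 * block here until one completes). */
static int homa_tx_rpc_reserve(struct homa_sock *hsk)
{
	if (atomic_inc_return(&hsk->tx_rpcs) > hsk->tx_rpc_limit) {
		atomic_dec(&hsk->tx_rpcs);
		return -ENOBUFS;
	}
	return 0;
}

/* Called when the RPC completes or is aborted. */
static void homa_tx_rpc_release(struct homa_sock *hsk)
{
	atomic_dec(&hsk->tx_rpcs);
}

Since the size of each RPC's outgoing message is already bounded, a
cap on RPC count also bounds TX buffer memory per socket.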

> Side note: if you use per RPC lock, and you know that the later one is a
> _different_ RPC, there will be no need for unlocking (and LOCKDEP will
> be happy with a "_nested" annotation).

This risks deadlock if some other thread acquires the same two locks
in the opposite order.
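
To make that concrete (identifiers below are illustrative, not from
the series): the _nested annotation only tells LOCKDEP that the
nesting is intentional, it does not prevent an ABBA deadlock, so some
global ordering rule would still have to be imposed:

#include <linux/minmax.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct homa_rpc {               /* illustrative stand-in */
	__u64 id;
	spinlock_t lock;
};

/* Hold two different RPC locks at once. Without the id-based
 * ordering, thread 1 locking (a, b) and thread 2 locking (b, a) can
 * deadlock, and SINGLE_DEPTH_NESTING would keep LOCKDEP quiet about
 * it. */
static void homa_lock_rpc_pair(struct homa_rpc *a, struct homa_rpc *b)
{
	if (a->id > b->id)
		swap(a, b);
	spin_lock(&a->lock);
	spin_lock_nested(&b->lock, SINGLE_DEPTH_NESTING);
}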


-John-
