netdev - Re: [PATCH net-next v15 09/15] net: homa: create homa_rpc.h and homa

Message-ID: <CAGXJAmydvaiY+0RNXLU-hdh1tYcTvUrvcuxWZTxsHbmWeTRSxw@mail.gmail.com>
Date: Mon, 1 Sep 2025 13:10:55 -0700
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, horms@...nel.org, 
	kuba@...nel.org
Subject: Re: [PATCH net-next v15 09/15] net: homa: create homa_rpc.h and homa_rpc.c

On Tue, Aug 26, 2025 at 4:31 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 8/18/25 10:55 PM, John Ousterhout wrote:
> > +/**
> > + * homa_rpc_reap() - Invoked to release resources associated with dead
> > + * RPCs for a given socket.
> > + * @hsk:      Homa socket that may contain dead RPCs. Must not be locked by the
> > + *            caller; this function will lock and release.
> > + * @reap_all: False means do a small chunk of work; there may still be
> > + *            unreaped RPCs on return. True means reap all dead RPCs for
> > + *            hsk.  Will busy-wait if reaping has been disabled for some RPCs.
> > + *
> > + * Return: A return value of 0 means that we ran out of work to do; calling
> > + *         again will do no work (there could be unreaped RPCs, but if so,
> > + *         they cannot currently be reaped).  A value greater than zero means
> > + *         there is still more reaping work to be done.
> > + */
> > +int homa_rpc_reap(struct homa_sock *hsk, bool reap_all)
> > +{
> > +     /* RPC Reaping Strategy:
> > +      *
> > +      * (Note: there are references to this comment elsewhere in the
> > +      * Homa code)
> > +      *
> > +      * Most of the cost of reaping comes from freeing sk_buffs; this can be
> > +      * quite expensive for RPCs with long messages.
> > +      *
> > +      * The natural time to reap is when homa_rpc_end is invoked to
> > +      * terminate an RPC, but this doesn't work for two reasons. First,
> > +      * there may be outstanding references to the RPC; it cannot be reaped
> > +      * until all of those references have been released. Second, reaping
> > +      * is potentially expensive and RPC termination could occur in
> > +      * homa_softirq when there are short messages waiting to be processed.
> > +      * Taking time to reap a long RPC could result in significant delays
> > +      * for subsequent short RPCs.
> > +      *
> > +      * Thus Homa doesn't reap immediately in homa_rpc_end. Instead, dead
> > +      * RPCs are queued up and reaping occurs in this function, which is
> > +      * invoked later when it is less likely to impact latency. The
> > +      * challenge is to do this so that (a) we don't allow large numbers of
> > +      * dead RPCs to accumulate and (b) we minimize the impact of reaping
> > +      * on latency.
> > +      *
> > +      * The primary place where homa_rpc_reap is invoked is when threads
> > +      * are waiting for incoming messages. The thread has nothing else to
> > +      * do (it may even be polling for input), so reaping can be performed
> > +      * with no latency impact on the application.  However, if a machine
> > +      * is overloaded then it may never wait, so this mechanism isn't always
> > +      * sufficient.
> > +      *
> > +      * Homa now reaps in two other places, if reaping while waiting for
> > +      * messages isn't adequate:
> > +      * 1. If too many dead skbs accumulate, then homa_timer will call
> > +      *    homa_rpc_reap.
> > +      * 2. If this timer thread cannot keep up with all the reaping to be
> > +      *    done then as a last resort homa_dispatch_pkts will reap in small
> > +      *    increments (a few sk_buffs or RPCs) for every incoming batch
> > +      *    of packets. This is undesirable because it will impact Homa's
> > +      *    performance.
> > +      *
> > +      * During the introduction of homa_pools for managing input
> > +      * buffers, freeing of packets for incoming messages was moved to
> > +      * homa_copy_to_user under the assumption that this code wouldn't be
> > +      * on the critical path. However, there is evidence that with
> > +      * fast networks (e.g. 100 Gbps) copying to user space is the
> > +      * bottleneck for incoming messages, and packet freeing takes about
> > +      * 20-25% of the total time in homa_copy_to_user. So, it may eventually
> > +      * be desirable to move packet freeing out of homa_copy_to_user.
>
> See skb_attempt_defer_free()

I wasn't previously aware of this. It looks useful, but unfortunately
its symbol isn't currently EXPORTed, so Homa can't use it. I submitted
a patch to export that symbol, but it was rejected because it didn't
also include a use of the symbol.

I'm going to wait until this series is accepted, then submit a smaller
patch that adds the EXPORT and uses it in Homa (or maybe I'll wait
until I upstream Homa's GRO support, as Eric suggested).
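
For readers following the quoted comment, the deferred, batched reaping
and its return-value contract can be sketched in userspace C. This is
only an illustration under assumptions: struct sketch_dead_queue,
sketch_reap() and SKETCH_BATCH_MAX are hypothetical names, the queue is
reduced to counters, and the busy-wait for RPCs with reaping disabled
is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BATCH_MAX 10	/* hypothetical per-pass batch limit */

/* Hypothetical stand-in for a socket's queue of dead RPCs. */
struct sketch_dead_queue {
	size_t reapable;	/* dead RPCs with no outstanding references */
	size_t held;		/* dead RPCs that cannot be reaped yet */
	size_t reaped;		/* total RPCs freed so far */
};

/*
 * Reap at most one batch per pass. With reap_all false, do a single
 * pass; with reap_all true, keep going until nothing reapable is left.
 * Returns 0 if there is no more work that can be done right now
 * (held RPCs may remain), and 1 if reapable RPCs remain.
 */
static int sketch_reap(struct sketch_dead_queue *q, bool reap_all)
{
	do {
		size_t n = q->reapable < SKETCH_BATCH_MAX ?
			   q->reapable : SKETCH_BATCH_MAX;

		q->reapable -= n;
		q->reaped += n;
	} while (reap_all && q->reapable > 0);

	return q->reapable > 0;
}
```

A caller that is idle (for example, waiting for input) can invoke
sketch_reap(q, false) repeatedly until it returns 0, which keeps each
call short.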

> > +      */
> > +#define BATCH_MAX 20
> > +     struct homa_rpc *rpcs[BATCH_MAX];
> > +     struct sk_buff *skbs[BATCH_MAX];
>
> A lot of bytes on the stack, and a quite large batch. You should probably
> decrease it.

I have reduced the batch size to 10. Note also that this is a
"near-leaf" function, so it should be safe for it to have a larger
footprint than Homa functions that invoke the IP/driver stack, which
presumably takes a lot of stack space.
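
To put numbers on the footprint of the two arrays, here is a small
userspace sketch. The helper name reap_arrays_bytes() is hypothetical;
the struct names are only forward declarations, since just the pointer
size matters. Assuming 8-byte pointers, a batch of 20 comes to 320
bytes and a batch of 10 to 160 bytes.

```c
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointer sizes matter. */
struct homa_rpc;
struct sk_buff;

/*
 * Bytes used by the two on-stack pointer arrays (rpcs[] and skbs[])
 * for a given batch size.
 */
static size_t reap_arrays_bytes(size_t batch_max)
{
	return batch_max *
	       (sizeof(struct homa_rpc *) + sizeof(struct sk_buff *));
}
```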

> Also, the need for yet another tx free strategy on top of the several
> existing caches still feels suspect.

I wasn't able to identify an existing cache mechanism that could meet
Homa's needs (and given the association Homa introduces between skb's
and RPCs, which are Homa-specific, it seems unlikely that any existing
mechanism would work for Homa). But, if you have something in mind
that you think might work for Homa, let me know and I'll take a look.

> > +             homa_sock_wakeup_wmem(hsk);
>
> Here num_rpcs can be zero, and you can have spurious wake-ups

I agree that num_rpcs can be zero, but homa_sock_wakeup_wmem won't
actually perform a wakeup unless (a) there are tasks waiting and (b)
there is available memory. So I don't see how there can be a spurious
wakeup. Is there something I'm missing?
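
The two conditions can be modeled with a small userspace sketch. This
is not the actual homa_sock_wakeup_wmem code: struct sketch_wmem and
sketch_wakeup_wmem() are hypothetical, and the socket state is reduced
to plain counters.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the socket state relevant here. */
struct sketch_wmem {
	int waiters;	/* tasks blocked waiting for tx memory */
	long wmem_free;	/* bytes of tx memory currently available */
	int wakeups;	/* wakeups issued so far (for illustration) */
};

/*
 * Issue a wakeup only if (a) some task is waiting and (b) memory is
 * available. Returns true if a wakeup was issued.
 */
static bool sketch_wakeup_wmem(struct sketch_wmem *s)
{
	if (s->waiters == 0 || s->wmem_free <= 0)
		return false;
	s->wakeups++;
	return true;
}
```

Under this model, a call made when nothing was reaped is a cheap no-op
unless both conditions already hold.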

> > +static inline void homa_rpc_hold(struct homa_rpc *rpc)
> > +{
> > +     atomic_inc(&rpc->refs);
>
> `refs` should be a refcount_t, since it is used as such.

Done.
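
For context, a checked reference counter adds misuse detection on top
of a bare atomic increment. The userspace sketch below uses C11 atomics
to illustrate that idea; it is not the kernel's refcount_t
implementation, and struct sketch_ref, sketch_hold() and sketch_put()
are hypothetical names. Refusing the increment (rather than warning or
saturating) is a simplification chosen for this sketch.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical checked reference counter. */
struct sketch_ref {
	atomic_uint refs;
};

/*
 * Take a reference. Refuses (returns false) if the count is already
 * zero, meaning the object is being freed, or if it would overflow.
 */
static bool sketch_hold(struct sketch_ref *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;
		/* On CAS failure, old is reloaded and re-checked. */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference. Returns true if this was the last one. */
static bool sketch_put(struct sketch_ref *r)
{
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```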

-John-