Message-ID: <ce4f62a8-1114-47b9-af08-51656e08c2b5@redhat.com>
Date: Tue, 26 Aug 2025 11:05:51 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: John Ousterhout <ouster@...stanford.edu>, netdev@...r.kernel.org
Cc: edumazet@...gle.com, horms@...nel.org, kuba@...nel.org
Subject: Re: [PATCH net-next v15 03/15] net: homa: create shared Homa header
files
On 8/18/25 10:55 PM, John Ousterhout wrote:
> +/**
> + * struct homa_net - Contains Homa information that is specific to a
> + * particular network namespace.
> + */
> +struct homa_net {
> + /** @net: Network namespace corresponding to this structure. */
> + struct net *net;
> +
> + /** @homa: Global Homa information. */
> + struct homa *homa;
It's not clear why the above 2 fields are needed. You could access the
global struct homa instance directly, and 'struct net' is usually
already available wherever a struct homa_net is.
> +/**
> + * is_homa_pkt() - Return true if @skb is a Homa packet, false otherwise.
> + * @skb: Packet buffer to check.
> + * Return: see above.
> + */
> +static inline bool is_homa_pkt(struct sk_buff *skb)
> +{
> + int protocol;
> +
> + /* If the network header hasn't been created yet, assume it's a
> + * Homa packet (Homa never generates any non-Homa packets).
> + */
> + if (skb->network_header == 0)
> + return true;
> + protocol = (skb_is_ipv6(skb)) ? ipv6_hdr(skb)->nexthdr :
> + ip_hdr(skb)->protocol;
> + return protocol == IPPROTO_HOMA;
> +}
This helper is apparently unused in this series; just drop it and add
it back later when it's needed.
> +#define UNIT_LOG(...)
> +#define UNIT_HOOK(...)
Also apparently unused.
> +extern unsigned int homa_net_id;
> +
> +/**
> + * homa_net_from_net() - Return the struct homa_net associated with a particular
> + * struct net.
> + * @net: Get the Homa data for this net namespace.
> + * Return: see above.
> + */
> +static inline struct homa_net *homa_net_from_net(struct net *net)
The customary name for this kind of helper is homa_net()
> +{
> + return (struct homa_net *)net_generic(net, homa_net_id);
> +}
> +
> +/**
> + * homa_from_skb() - Return the struct homa associated with a particular
> + * sk_buff.
> + * @skb: Get the struct homa for this packet buffer.
> + * Return: see above.
> + */
> +static inline struct homa *homa_from_skb(struct sk_buff *skb)
> +{
> + struct homa_net *hnet;
> +
> + hnet = net_generic(dev_net(skb->dev), homa_net_id);
> + return hnet->homa;
You can implement this using homa_net_from_skb(), avoiding some code
duplication
> +}
> +
> +/**
> + * homa_net_from_skb() - Return the struct homa_net associated with a particular
> + * sk_buff.
> + * @skb: Get the struct homa for this packet buffer.
> + * Return: see above.
> + */
> +static inline struct homa_net *homa_net_from_skb(struct sk_buff *skb)
> +{
> + struct homa_net *hnet;
> +
> + hnet = net_generic(dev_net(skb->dev), homa_net_id);
> + return hnet;
You can implement this using homa_net(), avoiding some code duplication.
> +}
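i.e. layer the helpers so that each one is a one-liner on top of the
previous one. Untested userspace sketch with stand-in types (struct net,
struct sk_buff, net_generic() etc. are mocked here; only the layering
is the point):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: in the kernel these are the real struct net,
 * struct sk_buff, struct net_device, and net_generic()/dev_net().
 */
struct net { void *priv; };
struct homa { int id; };
struct homa_net { struct net *net; struct homa *homa; };
struct net_device { struct net *nd_net; };
struct sk_buff { struct net_device *dev; };

static unsigned int homa_net_id;

static struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

static void *net_generic(struct net *net, unsigned int id)
{
	(void)id;
	return net->priv;
}

/* Base helper, with the customary short name. */
static struct homa_net *homa_net(struct net *net)
{
	return net_generic(net, homa_net_id);
}

/* Each further helper just delegates; no duplicated lookups. */
static struct homa_net *homa_net_from_skb(struct sk_buff *skb)
{
	return homa_net(dev_net(skb->dev));
}

static struct homa *homa_from_skb(struct sk_buff *skb)
{
	return homa_net_from_skb(skb)->homa;
}
```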
> +
> +/**
> + * homa_clock() - Return a fine-grain clock value that is monotonic and
> + * consistent across cores.
> + * Return: see above.
> + */
> +static inline u64 homa_clock(void)
> +{
> + /* As of May 2025 there does not appear to be a portable API that
> + * meets Homa's needs:
> + * - The Intel X86 TSC works well but is not portable.
> + * - sched_clock() does not guarantee monotonicity or consistency.
> + * - ktime_get_mono_fast_ns and ktime_get_raw_fast_ns are very slow
> + * (27 ns to read, vs 8 ns for TSC)
> + * Thus we use a hybrid approach that uses TSC (via get_cycles) where
> + * available (which should be just about everywhere Homa runs).
> + */
> +#ifdef CONFIG_X86_TSC
> + return get_cycles();
> +#else
> + return ktime_get_mono_fast_ns();
> +#endif /* CONFIG_X86_TSC */
> +}
The ktime_get*() variants are fast enough to let e.g. pktgen handle
millions of packets per second. Both the raw TSC and
ktime_get_mono_fast_ns() suffer from various inconsistencies which will
cause the most unexpected issues in the most dangerous situations. I
strongly advise against this early optimization.
> +/**
> + * homa_usecs_to_cycles() - Convert from units of microseconds to units of
> + * homa_clock().
> + * @usecs: A time measurement in microseconds
> + * Return: The time in homa_clock() units corresponding to @usecs.
> + */
> +static inline u64 homa_usecs_to_cycles(u64 usecs)
> +{
> + u64 tmp;
> +
> + tmp = usecs * homa_clock_khz();
> + do_div(tmp, 1000);
> + return tmp;
> +}
Apparently not used in this series.
FWIW, do_div() would likely be much more costly than the timestamp
fetch itself.
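e.g. if homa_clock() counts nanoseconds (as the ktime_get*() family
does), the conversion degenerates to a multiply and the do_div()
disappears entirely (sketch):

```c
#include <stdint.h>

/* With a nanosecond-based homa_clock(), microseconds-to-clock-units
 * conversion needs no division at all: 1 us == 1000 ns.
 */
static uint64_t homa_usecs_to_cycles(uint64_t usecs)
{
	return usecs * 1000ull;
}
```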
> +
> +/* Homa Locking Strategy:
> + *
> + * (Note: this documentation is referenced in several other places in the
> + * Homa code)
> + *
> + * In the Linux TCP/IP stack the primary locking mechanism is a sleep-lock
> + * per socket. However, per-socket locks aren't adequate for Homa, because
> + * sockets are "larger" in Homa. In TCP, a socket corresponds to a single
> + * connection between two peers; an application can have hundreds or
> + * thousands of sockets open at once, so per-socket locks leave lots of
> + * opportunities for concurrency. With Homa, a single socket can be used for
> + * communicating with any number of peers, so there will typically be just
> + * one socket per thread. As a result, a single Homa socket must support many
> + * concurrent RPCs efficiently, and a per-socket lock would create a bottleneck
> + * (Homa tried this approach initially).
> + *
> + * Thus, the primary locks used in Homa are spinlocks at RPC granularity.
> + * This
> + * allows operations on different RPCs for the same socket to proceed
> + * concurrently. Homa also has socket locks (which are spinlocks different
> + * from the official socket sleep-locks) but these are used much less
> + * frequently than RPC locks.
> + *
> + * Lock Ordering:
> + *
> + * There are several other locks in Homa besides RPC locks, all of which
> + * are spinlocks. When multiple locks are held, they must be acquired in a
> + * consistent order in order to prevent deadlock. Here are the rules for Homa:
> + * 1. Except for RPC and socket locks, all locks should be considered
> + * "leaf" locks: don't acquire other locks while holding them.
> + * 2. The lock order is:
> + * * RPC lock
> + * * Socket lock
> + * * Other lock
> + * 3. It is not safe to wait on an RPC lock while holding any other lock.
> + * 4. It is safe to wait on a socket lock while holding an RPC lock, but
> + * not while holding any other lock.
The last 2 points are not needed: they are obviously implied by the
previous ones.
> + *
> + * It may seem surprising that RPC locks are acquired *before* socket locks,
> + * but this is essential for high performance. Homa has been designed so that
> + * many common operations (such as processing input packets) can be performed
> + * while holding only an RPC lock; this allows operations on different RPCs
> + * to proceed in parallel. Only a few operations, such as handing off an
> + * incoming message to a waiting thread, require the socket lock. If socket
> + * locks had to be acquired first, any operation that might eventually need
> + * the socket lock would have to acquire it before the RPC lock, which would
> + * severely restrict concurrency.
FWIW, I think the above scheme can offer good performance if and only
if every operation requiring the socket lock is slow-path: multiple
RPCs/cores contending for the same socket lock will experience false
sharing and cache contention/misses, which will hurt performance badly.
And if the operations requiring the socket lock really are slow-path,
the RPC/socket lock order should be irrelevant for performance.
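For reference, the documented order boils down to something like the
following. Userspace sketch with pthread mutexes standing in for the
kernel spinlocks; the structure and field names here are made up for
illustration:

```c
#include <pthread.h>

/* Hypothetical structures, just to illustrate rule 2 above: take the
 * RPC lock first, and the socket lock only for the rare socket-level
 * work (e.g. handing off an incoming message).
 */
struct homa_sock {
	pthread_mutex_t lock;
	int ready_count;	/* messages ready for handoff */
};

struct homa_rpc {
	pthread_mutex_t lock;
	struct homa_sock *hsk;
};

static void homa_handoff(struct homa_rpc *rpc)
{
	pthread_mutex_lock(&rpc->lock);		/* RPC lock first */
	/* ... per-RPC work needing no other lock runs here ... */
	pthread_mutex_lock(&rpc->hsk->lock);	/* socket lock second */
	rpc->hsk->ready_count++;
	pthread_mutex_unlock(&rpc->hsk->lock);
	pthread_mutex_unlock(&rpc->lock);
}
```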
[...]
> +static inline void homa_skb_get(struct sk_buff *skb, void *dest, int offset,
> + int length)
> +{
> + memcpy(dest, skb_transport_header(skb) + offset, length);
> +}
Apparently unused.
/P