Message-ID: <CAGXJAmya3xU69ghKO10SZz4sh48CyBgBsF7AaV1OOCRyVPr0Nw@mail.gmail.com>
Date: Wed, 29 Jan 2025 16:41:41 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, horms@...nel.org,
kuba@...nel.org
Subject: Re: [PATCH net-next v6 08/12] net: homa: create homa_incoming.c
On Fri, Jan 24, 2025 at 12:31 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> OoO will cause additional allocation? this feels like DoS prone.
>
> > + }
> > + rpc->msgin.recv_end = end;
> > + goto keep;
> > + }
> > +
> > + /* Must now check to see if the packet fills in part or all of
> > + * an existing gap.
> > + */
> > + list_for_each_entry_safe(gap, dummy, &rpc->msgin.gaps, links) {
>
> Linear search for OoO has proven to be subject to serious dos issue. You
> should instead use a (rb-)tree to handle OoO packets.
I have been assuming that DoS won't be a major issue for Homa because
it's intended for use only in datacenters (if there are antagonistic
parties, they will be isolated from each other by networking
hardware). Is this a bad assumption?
> > +
> > + /* Packet is in the middle of the gap; must split the gap. */
> > + gap2 = homa_gap_new(&gap->links, gap->start, start);
> > + if (!gap2) {
> > + pr_err("Homa couldn't allocate gap for split: insufficient memory\n");
> > + goto discard;
> > + }
> > + gap2->time = gap->time;
> > + gap->start = end;
> > + goto keep;
> > + }
> > +
> > +discard:
> > + kfree_skb(skb);
> > + return;
> > +
> > +keep:
> > + __skb_queue_tail(&rpc->msgin.packets, skb);
>
> Here 'msgin.packets' is apparently under RCP lock protection, but
> elsewhere - in homa_rpc_reap() - the list is apparently protected by
> it's own lock.
What are you referring to by "its own lock"? As far as I know there
is no lock specific to msgin.packets. Normally everything in a
homa_rpc is protected by the RPC lock, and that's the case for
homa_add_packet above. By the time homa_rpc_reap sees an RPC, it has
been marked dead and removed from all lists, so no one else will try
to mutate it and there's no need for synchronization over its
internals. The only remaining problem is that there could still be
outstanding references to the RPC, whose owners haven't yet discovered
that it's dead and dropped their references. The protect_count on the
socket is used to detect these situations.
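
In case it's helpful, the shape of that mechanism is roughly the
following (sketch only; the names match the patch but the bodies here
are illustrative, not the real code):

/* Illustrative sketch: a reader that needs dead RPCs to stick around
 * bumps a per-socket protect_count; homa_rpc_reap defers freeing
 * while any such reader is active.
 */
static inline void homa_protect_rpcs(struct homa_sock *hsk)
{
        atomic_inc(&hsk->protect_count);
}

static inline void homa_unprotect_rpcs(struct homa_sock *hsk)
{
        atomic_dec(&hsk->protect_count);
}

int homa_rpc_reap(struct homa_sock *hsk)
{
        if (atomic_read(&hsk->protect_count) > 0)
                return 0;       /* Outstanding references may exist. */
        /* ... safe to free dead RPCs (including msgin.packets) ... */
        return 1;
}
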
> Also it looks like there is no memory accounting at all, and SO_RCVBUF
> setting are just ignored.
Homa doesn't yet have comprehensive memory accounting, but there is a
limit on buffer space for incoming messages. Instead of SO_RCVBUF,
applications control the amount of receive buffer space by sizing the
buffer pool they provide to Homa via the SO_HOMA_RCVBUF socket
option.
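
For example, an application would set up its receive buffer region
along these lines (user-space sketch; the args struct layout and the
IPPROTO_HOMA level are just illustrative here, not the final uAPI):

#include <sys/mman.h>
#include <sys/socket.h>
#include "homa.h"       /* Would define SO_HOMA_RCVBUF etc. */

#define RCVBUF_SIZE (64 * 1024 * 1024)

int setup_rcvbuf(int fd)
{
        /* Hypothetical name/layout for the setsockopt argument. */
        struct homa_rcvbuf_args args;

        args.start = mmap(NULL, RCVBUF_SIZE, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (args.start == MAP_FAILED)
                return -1;
        args.length = RCVBUF_SIZE;
        return setsockopt(fd, IPPROTO_HOMA, SO_HOMA_RCVBUF,
                          &args, sizeof(args));
}

The size of that region is what bounds buffering for incoming
messages, which is why SO_RCVBUF doesn't really apply.
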
> > +/**
> > + * homa_dispatch_pkts() - Top-level function that processes a batch of packets,
> > + * all related to the same RPC.
> > + * @skb: First packet in the batch, linked through skb->next.
> > + * @homa: Overall information about the Homa transport.
> > + */
> > +void homa_dispatch_pkts(struct sk_buff *skb, struct homa *homa)
>
> I see I haven't mentioned the following so far, but you should move the
> struct homa to a pernet subsystem.
Sorry for my ignorance, but I'm not familiar with the concept of "a
pernet subsystem". What's the best way for me to learn more about
this?
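
In the meantime, from a quick look at how other protocols handle
this, I'm guessing it means registering per-network-namespace state
with register_pernet_subsys(), along these lines (struct homa_net and
the init/destroy calls are just my guesses at how Homa's state would
map onto that API):

#include <net/net_namespace.h>
#include <net/netns/generic.h>

struct homa_net {
        struct homa homa;       /* Per-namespace transport state. */
};

static unsigned int homa_net_id __read_mostly;

static int __net_init homa_net_init(struct net *net)
{
        struct homa_net *hnet = net_generic(net, homa_net_id);

        return homa_init(&hnet->homa);
}

static void __net_exit homa_net_exit(struct net *net)
{
        struct homa_net *hnet = net_generic(net, homa_net_id);

        homa_destroy(&hnet->homa);
}

static struct pernet_operations homa_net_ops = {
        .init = homa_net_init,
        .exit = homa_net_exit,
        .id   = &homa_net_id,
        .size = sizeof(struct homa_net),
};

/* Module load would call register_pernet_subsys(&homa_net_ops);
 * unload would call unregister_pernet_subsys(&homa_net_ops).
 */

Is that the direction you have in mind?
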
> > +{
> > +#define MAX_ACKS 10
> > + const struct in6_addr saddr = skb_canonical_ipv6_saddr(skb);
> > + struct homa_data_hdr *h = (struct homa_data_hdr *)skb->data;
> > + __u64 id = homa_local_id(h->common.sender_id);
> > + int dport = ntohs(h->common.dport);
> > +
> > + /* Used to collect acks from data packets so we can process them
> > + * all at the end (can't process them inline because that may
> > + * require locking conflicting RPCs). If we run out of space just
> > + * ignore the extra acks; they'll be regenerated later through the
> > + * explicit mechanism.
> > + */
> > + struct homa_ack acks[MAX_ACKS];
> > + struct homa_rpc *rpc = NULL;
> > + struct homa_sock *hsk;
> > + struct sk_buff *next;
> > + int num_acks = 0;
> > +
> > + /* Find the appropriate socket.*/
> > + hsk = homa_sock_find(homa->port_map, dport);
>
> This needs RCU protection
Yep. I have reworked homa_sock_find so that it uses RCU protection
internally and then takes a reference on the socket before returning.
Callers have to eventually release the reference, but they shouldn't
need to deal with RCU anymore.
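
The reworked code has roughly this shape (simplified sketch; the
field names and hash helper are approximations, not the exact code):

struct homa_sock *homa_sock_find(struct homa_socktab *socktab, int port)
{
        struct homa_sock *hsk, *result = NULL;

        rcu_read_lock();
        hlist_for_each_entry_rcu(hsk, &socktab->buckets[homa_port_hash(port)],
                                 socktab_links) {
                if (hsk->port == port) {
                        /* Skip a socket that is already being freed
                         * (hsk->sock stands for the embedded struct sock).
                         */
                        if (refcount_inc_not_zero(&hsk->sock.sk_refcnt))
                                result = hsk;
                        break;
                }
        }
        rcu_read_unlock();
        return result;  /* Caller must release the reference when done. */
}
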
> > +
> > + /* Find and lock the RPC if we haven't already done so. */
> > + if (!rpc) {
> > + if (!homa_is_client(id)) {
> > + /* We are the server for this RPC. */
> > + if (h->common.type == DATA) {
> > + int created;
> > +
> > + /* Create a new RPC if one doesn't
> > + * already exist.
> > + */
> > + rpc = homa_rpc_new_server(hsk, &saddr,
> > + h, &created);
>
> It looks like a buggy or malicious client could force server RPC
> allocation to any _client_ ?!?
I'm not sure what you mean by "force server RPC allocation to any
_client_"; can you give a bit more detail?
> > + if (IS_ERR(rpc)) {
> > + pr_warn("homa_pkt_dispatch couldn't create server rpc: error %lu",
> > + -PTR_ERR(rpc));
> > + rpc = NULL;
> > + goto discard;
> > + }
> > + } else {
> > + rpc = homa_find_server_rpc(hsk, &saddr,
> > + id);
> > + }
> > + } else {
> > + rpc = homa_find_client_rpc(hsk, id);
>
> Both the client and the server lookup require a contended lock; The
> lookup could/should be lockless, and the the lock could/should be
> asserted only on the relevant RPC.
I think we've discussed this issue in response to earlier comments:
the only lock acquired during lookup is the hash table bucket lock for
the desired RPC, and the hash table has enough buckets to avoid
serious contention.
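
In other words, the lookup path is essentially this (sketch; the
names approximate the patch):

struct homa_rpc *homa_find_client_rpc(struct homa_sock *hsk, __u64 id)
{
        struct homa_rpc_bucket *bucket = homa_client_rpc_bucket(hsk, id);
        struct homa_rpc *rpc;

        spin_lock_bh(&bucket->lock);
        hlist_for_each_entry(rpc, &bucket->rpcs, hash_links) {
                if (rpc->id == id)
                        return rpc;     /* Returned with the bucket lock
                                         * held, which is the RPC lock.
                                         */
        }
        spin_unlock_bh(&bucket->lock);
        return NULL;
}

Only the one bucket containing the target RPC is locked, so two
lookups contend only if their RPCs happen to hash to the same bucket.
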
>
> > + case UNKNOWN:
> > + homa_unknown_pkt(skb, rpc);
>
> It's sort of unexpected that the protocol explicitly defines the unknown
> packet type, and handles it differently form undefined types.
Maybe the name UNKNOWN is causing confusion? An UNKNOWN packet is sent
when an endpoint receives a RESEND packet for an RPC that is unknown
to it. The term UNKNOWN refers to an unknown RPC, as opposed to an
unrecognized packet type.
> > + break;
> > + case BUSY:
> > + /* Nothing to do for these packets except reset
> > + * silent_ticks, which happened above.
> > + */
> > + goto discard;
> > + case NEED_ACK:
> > + homa_need_ack_pkt(skb, hsk, rpc);
> > + break;
> > + case ACK:
> > + homa_ack_pkt(skb, hsk, rpc);
> > + rpc = NULL;
> > +
> > + /* It isn't safe to process more packets once we've
> > + * released the RPC lock (this should never happen).
> > + */
> > + while (next) {
> > + WARN_ONCE(next, "%s found extra packets after AC<\n",
> > + __func__);
>
> It looks like the above WARN could be triggered by an unexpected traffic
> pattern generate from the client. If so, you should avoid the WARN() and
> instead use e.g. some mib counter.
The real problem here is with homa_ack_pkt returning with the RPC lock
released. I've fixed that now, so the check is no longer necessary
(I've deleted it).
> > +
> > + if (skb_queue_len(&rpc->msgin.packets) != 0 &&
> > + !(atomic_read(&rpc->flags) & RPC_PKTS_READY)) {
> > + atomic_or(RPC_PKTS_READY, &rpc->flags);
> > + homa_sock_lock(rpc->hsk, "homa_data_pkt");
> > + homa_rpc_handoff(rpc);
> > + homa_sock_unlock(rpc->hsk);
>
> It looks like you tried to enforce the following lock acquiring order:
> rpc lock
> socket lock
> which is IMHO quite unnatural, as the socket has a wider scope than the
> RPC. In practice the locking schema is quite complex and hard to follow.
> I think (wild guess) that inverting the lock order would simplify the
> locking schema significantly.
This locking order is necessary for Homa. Because a single socket can
be used for many concurrent RPCs to many peers, it doesn't work to
acquire the socket lock for every operation: it would suffer terrible
contention (I tried this in the earliest versions of Homa and it was a
bottleneck under high load). Thus the RPC lock is the primary lock in
Homa, not the socket lock. Many operations can be completed without
ever holding the socket lock, which reduces contention for the socket
lock.
In TCP, a busy app will spread itself over many sockets, so the
socket locks are less likely to be contended.
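
So the intended nesting is simply this (schematic, not real code; the
lock/unlock helpers stand for however the locks are actually taken):

        homa_rpc_lock(rpc);
        /* ... most per-RPC work happens here; no socket lock needed ... */
        homa_sock_lock(rpc->hsk, "handoff");
        homa_rpc_handoff(rpc);          /* Brief socket-level work only. */
        homa_sock_unlock(rpc->hsk);
        homa_rpc_unlock(rpc);
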
> > +void homa_ack_pkt(struct sk_buff *skb, struct homa_sock *hsk,
> > + struct homa_rpc *rpc)
> > + __releases(rpc->bucket_lock)
> > +{
> > + const struct in6_addr saddr = skb_canonical_ipv6_saddr(skb);
> > + struct homa_ack_hdr *h = (struct homa_ack_hdr *)skb->data;
> > + int i, count;
> > +
> > + if (rpc) {
> > + homa_rpc_free(rpc);
> > + homa_rpc_unlock(rpc);
>
> Another point that makes IMHO the locking schema hard to follow is the
> fact that many non-locking-related functions acquires or release some
> lock internally. The code would be much more easy to follow if you could
> pair the lock and unlock as much as possible inside the same code block.
I agree. There are places where a function has to release a lock
internally for various reasons, but ideally it should reacquire the
lock before returning to preserve symmetry. There are also places
where functions release a lock without reacquiring it; that's a bad
idea I'd like to fix (homa_ack_pkt is one example). One of the reasons for
this was that once an RPC lock is released the RPC could go away, so
it wasn't safe to attempt to relock it. I have added new methods
homa_rpc_hold() and homa_rpc_put() so that it's possible to take a
reference count on an RPC to keep it around while the lock is
released, so the lock can safely be reacquired later. This is how I
fixed the homa_ack_pkt problem you pointed out above. If you see any
other places with this asymmetry, let me know and I'll fix them also.
The new methods also provide a consistent and simple solution to
several other problems that had been solved in an ad hoc way.
It would be even better if a function never had to release a lock
internally, but so far I haven't figured out how to do that. If you
have ideas I'd like to hear them.
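
The resulting pattern looks roughly like this (sketch; the hold/put
bodies are just a reference count on the RPC):

        homa_rpc_hold(rpc);             /* e.g. increments rpc->refs */
        homa_rpc_unlock(rpc);

        /* ... work that must not hold the RPC lock, such as locking
         * a different RPC or sleeping ...
         */

        homa_rpc_lock(rpc);             /* Safe: the hold keeps rpc alive. */
        homa_rpc_put(rpc);              /* Drops the reference taken above. */
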
> > +
> > + if (id != 0) {
> > + if ((atomic_read(&rpc->flags) & RPC_PKTS_READY) || rpc->error)
> > + goto claim_rpc;
> > + rpc->interest = interest;
> > + interest->reg_rpc = rpc;
> > + homa_rpc_unlock(rpc);
>
> With the current schema you should release the hsh socket lock before
> releasing the rpc one.
Normally I would agree, but that won't work here: the RPC is no longer
of interest, so it needs to be unlocked, but we need to keep the
socket lock through the code that follows. This is safe (out-of-order
lock acquisition can cause deadlocks, but the order of lock releasing
doesn't matter except aesthetically).
> > +struct homa_rpc *homa_wait_for_message(struct homa_sock *hsk, int flags,
> > + __u64 id)
> > + __acquires(&rpc->bucket_lock)
> > +{
> > + ...
> > + }
> The amount of custom code to wait is concerning. Why can't you build
> around sk_wait_event()?
I agree that it's complicated. sk_wait_event can't be used because
Homa allows different threads to wait on partially-overlapping sets of
RPCs. For example, one thread can wait for a specific RPC to complete,
while another thread waits for *any* RPC to complete. Thus a given RPC
completion may not apply to all of the waiting threads. Here's a link
to a man page that describes the recvmsg API for Homa:
https://homa-transport.atlassian.net/wiki/spaces/HOMA/overview
That said, I have never been very happy with this API (and its
consequences for the waiting code). I've occasionally thought there
must be a better alternative but never came up with anything I liked.
However, your comment forced me to think about this some more, and I
now think I have a better idea for how to do waiting, which will
eliminate overlapping waits and allow sk_wait_event to be used instead
of the "interest" mechanism that's currently implemented. This will
be an API change, but if I'm going to do it I think I should do it
now, before upstreaming. So I will do that.
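
For reference, the standard pattern I'd be building around looks
roughly like this (sketch; homa_rpc_ready() is a hypothetical
predicate standing in for whatever condition we end up waiting on,
and the caller is assumed to hold the socket lock, which
sk_wait_event drops and reacquires around the sleep):

static int homa_wait_ready(struct sock *sk, struct homa_rpc *rpc,
                           int nonblocking)
{
        DEFINE_WAIT_FUNC(wait, woken_wake_function);
        long timeo = nonblocking ? 0 : MAX_SCHEDULE_TIMEOUT;
        int rc = 0;

        add_wait_queue(sk_sleep(sk), &wait);
        while (!homa_rpc_ready(rpc)) {
                if (!timeo) {
                        rc = -EAGAIN;
                        break;
                }
                if (signal_pending(current)) {
                        rc = sock_intr_errno(timeo);
                        break;
                }
                sk_wait_event(sk, &timeo, homa_rpc_ready(rpc), &wait);
        }
        remove_wait_queue(sk_sleep(sk), &wait);
        return rc;
}
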
-John-