Message-ID: <8faebc20-2b79-4b5f-aed6-b403d5b0f33d@redhat.com>
Date: Thu, 23 Jan 2025 18:45:09 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: John Ousterhout <ouster@...stanford.edu>, netdev@...r.kernel.org
Cc: edumazet@...gle.com, horms@...nel.org, kuba@...nel.org
Subject: Re: [PATCH net-next v6 06/12] net: homa: create homa_peer.h and
homa_peer.c
On 1/15/25 7:59 PM, John Ousterhout wrote:
> +/**
> + * homa_peertab_get_peers() - Return information about all of the peers
> + * currently known
> + * @peertab: The table to search for peers.
> + * @num_peers: Modified to hold the number of peers returned.
> + * Return: kmalloced array holding pointers to all known peers. The
> + * caller must free this. If there is an error, or if there
> + * are no peers, NULL is returned.
> + */
> +struct homa_peer **homa_peertab_get_peers(struct homa_peertab *peertab,
> + int *num_peers)
Looks like this function is unused in the current series. Please don't
introduce unused code.
> +{
> + struct homa_peer **result;
> + struct hlist_node *next;
> + struct homa_peer *peer;
> + int i, count;
> +
> + *num_peers = 0;
> + if (!peertab->buckets)
> + return NULL;
> +
> + /* Figure out how many peers there are. */
> + count = 0;
> + for (i = 0; i < HOMA_PEERTAB_BUCKETS; i++) {
> + hlist_for_each_entry_safe(peer, next, &peertab->buckets[i],
> + peertab_links)
No lock is acquired, so other processes could concurrently modify the list;
hlist_for_each_entry_safe() is not the correct helper to use. You should
probably use hlist_for_each_entry_rcu(), adding RCU protection, assuming
the thing is actually under an RCU scheme, which is not entirely clear.
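Something along these lines (just a sketch, assuming the insertion side is
converted to hlist_add_head_rcu() and peers are freed only after a grace
period):

	rcu_read_lock();
	for (i = 0; i < HOMA_PEERTAB_BUCKETS; i++) {
		hlist_for_each_entry_rcu(peer, &peertab->buckets[i],
					 peertab_links)
			count++;
	}
	rcu_read_unlock();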
> + count++;
> + }
> +
> + if (count == 0)
> + return NULL;
> +
> + result = kmalloc_array(count, sizeof(peer), GFP_KERNEL);
> + if (!result)
> + return NULL;
> + *num_peers = count;
> + count = 0;
> + for (i = 0; i < HOMA_PEERTAB_BUCKETS; i++) {
> + hlist_for_each_entry_safe(peer, next, &peertab->buckets[i],
> + peertab_links) {
> + result[count] = peer;
> + count++;
> + }
> + }
> + return result;
> +}
> +
> +/**
> + * homa_peertab_gc_dsts() - Invoked to free unused dst_entries, if it is
> + * safe to do so.
> + * @peertab: The table in which to free entries.
> + * @now: Current time, in sched_clock() units; entries with expiration
> + * dates no later than this will be freed. Specify ~0 to
> + * free all entries.
> + */
> +void homa_peertab_gc_dsts(struct homa_peertab *peertab, __u64 now)
> +{
Apparently this is called under (and needs) the peertab lock; an annotation
or a comment would be helpful.
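e.g. something like (untested sketch, assuming @write_lock in the peertab is
the lock in question):

	void homa_peertab_gc_dsts(struct homa_peertab *peertab, __u64 now)
		__must_hold(&peertab->write_lock)
	{
		lockdep_assert_held(&peertab->write_lock);
		...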
> + while (!list_empty(&peertab->dead_dsts)) {
> + struct homa_dead_dst *dead =
> + list_first_entry(&peertab->dead_dsts,
> + struct homa_dead_dst, dst_links);
> + if (dead->gc_time > now)
> + break;
> + dst_release(dead->dst);
> + list_del(&dead->dst_links);
> + kfree(dead);
> + }
> +}
> +
> +/**
> + * homa_peer_find() - Returns the peer associated with a given host; creates
> + * a new homa_peer if one doesn't already exist.
> + * @peertab: Peer table in which to perform lookup.
> + * @addr: Address of the desired host: IPv4 addresses are represented
> + * as IPv4-mapped IPv6 addresses.
> + * @inet: Socket that will be used for sending packets.
> + *
> + * Return: The peer associated with @addr, or a negative errno if an
> + * error occurred. The caller can retain this pointer
> + * indefinitely: peer entries are never deleted except in
> + * homa_peertab_destroy.
> + */
> +struct homa_peer *homa_peer_find(struct homa_peertab *peertab,
> + const struct in6_addr *addr,
> + struct inet_sock *inet)
> +{
> + /* Note: this function uses RCU operators to ensure safety even
> + * if a concurrent call is adding a new entry.
> + */
> + struct homa_peer *peer;
> + struct dst_entry *dst;
> +
> + __u32 bucket = hash_32((__force __u32)addr->in6_u.u6_addr32[0],
> + HOMA_PEERTAB_BUCKET_BITS);
> +
> + bucket ^= hash_32((__force __u32)addr->in6_u.u6_addr32[1],
> + HOMA_PEERTAB_BUCKET_BITS);
> + bucket ^= hash_32((__force __u32)addr->in6_u.u6_addr32[2],
> + HOMA_PEERTAB_BUCKET_BITS);
> + bucket ^= hash_32((__force __u32)addr->in6_u.u6_addr32[3],
> + HOMA_PEERTAB_BUCKET_BITS);
> + hlist_for_each_entry_rcu(peer, &peertab->buckets[bucket],
> + peertab_links) {
> + if (ipv6_addr_equal(&peer->addr, addr))
The caller does not acquire the RCU read lock, so this looks buggy.
AFAICS UaF is not possible because peers are removed only by
homa_peertab_destroy(), at unload time. That in turn looks
dangerous/wrong. What about memory utilization for peers over time?
Apparently the bucket lists could grow in an unbounded way.
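If the intent really is an RCU scheme, the lookup needs a read-side critical
section, roughly (sketch only; it still relies on peers never being freed,
or on some refcounting that is not there yet):

	rcu_read_lock();
	hlist_for_each_entry_rcu(peer, &peertab->buckets[bucket],
				 peertab_links) {
		if (ipv6_addr_equal(&peer->addr, addr)) {
			rcu_read_unlock();
			return peer;
		}
	}
	rcu_read_unlock();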
[...]
> +/**
> + * homa_peer_lock_slow() - This function implements the slow path for
> + * acquiring a peer's @unacked_lock. It is invoked when the lock isn't
> + * immediately available. It waits for the lock, but also records statistics
> + * about the waiting time.
> + * @peer: Peer to lock.
> + */
> +void homa_peer_lock_slow(struct homa_peer *peer)
> + __acquires(&peer->ack_lock)
> +{
> + spin_lock_bh(&peer->ack_lock);
Is this just a placeholder for future changes?!? I don't see any stats
update here, and currently homa_peer_lock() is really:

	if (!spin_trylock_bh(&peer->ack_lock))
		spin_lock_bh(&peer->ack_lock);

which does not make much sense to me. Either document that this is going to
change very soon (possibly even how and why) or use a plain spin_lock_bh().
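i.e. just something like (sketch, assuming no stats are ever added here):

	static inline void homa_peer_lock(struct homa_peer *peer)
		__acquires(&peer->ack_lock)
	{
		spin_lock_bh(&peer->ack_lock);
	}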
> +}
> +
> +/**
> + * homa_peer_add_ack() - Add a given RPC to the list of unacked
> + * RPCs for its server. Once this method has been invoked, it's safe
> + * to delete the RPC, since it will eventually be acked to the server.
> + * @rpc: Client RPC that has now completed.
> + */
> +void homa_peer_add_ack(struct homa_rpc *rpc)
> +{
> + struct homa_peer *peer = rpc->peer;
> + struct homa_ack_hdr ack;
> +
> + homa_peer_lock(peer);
> + if (peer->num_acks < HOMA_MAX_ACKS_PER_PKT) {
> + peer->acks[peer->num_acks].client_id = cpu_to_be64(rpc->id);
> + peer->acks[peer->num_acks].server_port = htons(rpc->dport);
> + peer->num_acks++;
> + homa_peer_unlock(peer);
> + return;
> + }
> +
> + /* The peer has filled up; send an ACK message to empty it. The
> + * RPC in the message header will also be considered ACKed.
> + */
> + memcpy(ack.acks, peer->acks, sizeof(peer->acks));
> + ack.num_acks = htons(peer->num_acks);
> + peer->num_acks = 0;
> + homa_peer_unlock(peer);
> + homa_xmit_control(ACK, &ack, sizeof(ack), rpc);
> +}
> +
> +/**
> + * homa_peer_get_acks() - Copy acks out of a peer, and remove them from the
> + * peer.
> + * @peer: Peer to check for possible unacked RPCs.
> + * @count: Maximum number of acks to return.
> + * @dst: The acks are copied to this location.
> + *
> + * Return: The number of acks extracted from the peer (<= count).
> + */
> +int homa_peer_get_acks(struct homa_peer *peer, int count, struct homa_ack *dst)
> +{
> + /* Don't waste time acquiring the lock if there are no ids available. */
> + if (peer->num_acks == 0)
> + return 0;
> +
> + homa_peer_lock(peer);
> +
> + if (count > peer->num_acks)
> + count = peer->num_acks;
> + memcpy(dst, &peer->acks[peer->num_acks - count],
> + count * sizeof(peer->acks[0]));
> + peer->num_acks -= count;
> +
> + homa_peer_unlock(peer);
> + return count;
> +}
> diff --git a/net/homa/homa_peer.h b/net/homa/homa_peer.h
> new file mode 100644
> index 000000000000..556aeda49656
> --- /dev/null
> +++ b/net/homa/homa_peer.h
> @@ -0,0 +1,233 @@
> +/* SPDX-License-Identifier: BSD-2-Clause */
> +
> +/* This file contains definitions related to managing peers (homa_peer
> + * and homa_peertab).
> + */
> +
> +#ifndef _HOMA_PEER_H
> +#define _HOMA_PEER_H
> +
> +#include "homa_wire.h"
> +#include "homa_sock.h"
> +
> +struct homa_rpc;
> +
> +/**
> + * struct homa_dead_dst - Used to retain dst_entries that are no longer
> + * needed, until it is safe to delete them (I'm not confident that the RCU
> + * mechanism will be safe for these: the reference count could get incremented
> + * after it's on the RCU list?).
> + */
> +struct homa_dead_dst {
> + /** @dst: Entry that is no longer used by a struct homa_peer. */
> + struct dst_entry *dst;
> +
> + /**
> + * @gc_time: Time (in units of sched_clock()) when it is safe
> + * to free @dst.
> + */
> + __u64 gc_time;
> +
> + /** @dst_links: Used to link together entries in peertab->dead_dsts. */
> + struct list_head dst_links;
> +};
> +
> +/**
> + * define HOMA_PEERTAB_BUCKET_BITS - Number of bits in the bucket index for a
> + * homa_peertab. Should be large enough to hold an entry for every server
> + * in a datacenter without long hash chains.
> + */
> +#define HOMA_PEERTAB_BUCKET_BITS 16
> +
> +/** define HOME_PEERTAB_BUCKETS - Number of buckets in a homa_peertab. */
> +#define HOMA_PEERTAB_BUCKETS BIT(HOMA_PEERTAB_BUCKET_BITS)
> +
> +/**
> + * struct homa_peertab - A hash table that maps from IPv6 addresses
> + * to homa_peer objects. IPv4 entries are encapsulated as IPv6 addresses.
> + * Entries are gradually added to this table, but they are never removed
> + * except when the entire table is deleted. We can't safely delete because
> + * results returned by homa_peer_find may be retained indefinitely.
> + *
> + * This table is managed exclusively by homa_peertab.c, using RCU to
> + * permit efficient lookups.
> + */
> +struct homa_peertab {
> + /**
> + * @write_lock: Synchronizes addition of new entries; not needed
> + * for lookups (RCU is used instead).
> + */
> + spinlock_t write_lock;
This lock looks potentially heavily contended on add; why don't you use a
per-bucket lock?
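e.g. something like (rough sketch, struct and field names invented here):

	struct homa_peertab_bucket {
		spinlock_t lock;	/* protects additions to @peers */
		struct hlist_head peers;
	};

	struct homa_peertab {
		/* ... @dead_dsts would then need its own lock ... */
		struct homa_peertab_bucket *buckets;
	};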
> +
> + /**
> + * @dead_dsts: List of dst_entries that are waiting to be deleted.
> + * Hold @write_lock when manipulating.
> + */
> + struct list_head dead_dsts;
> +
> + /**
> + * @buckets: Pointer to heads of chains of homa_peers for each bucket.
> + * Malloc-ed, and must eventually be freed. NULL means this structure
> + * has not been initialized.
> + */
> + struct hlist_head *buckets;
> +};
> +
> +/**
> + * struct homa_peer - One of these objects exists for each machine that we
> + * have communicated with (either as client or server).
> + */
> +struct homa_peer {
> + /**
> + * @addr: IPv6 address for the machine (IPv4 addresses are stored
> + * as IPv4-mapped IPv6 addresses).
> + */
> + struct in6_addr addr;
> +
> + /** @flow: Addressing info needed to send packets. */
> + struct flowi flow;
> +
> + /**
> + * @dst: Used to route packets to this peer; we own a reference
> + * to this, which we must eventually release.
> + */
> + struct dst_entry *dst;
> +
> + /**
> + * @grantable_rpcs: Contains all homa_rpcs (both requests and
> + * responses) involving this peer whose msgins require (or required
> + * them in the past) and have not been fully received. The list is
> + * sorted in priority order (head has fewest bytes_remaining).
> + * Locked with homa->grantable_lock.
> + */
> + struct list_head grantable_rpcs;
Apparently not used in this patch series. More fields below have the same
problem. Please introduce such fields in the same series that will
actually use them.
/P