Message-ID: <CAGXJAmzY3UQg01ehF1gmnRom_8FzwQHLfKk7EQk1G=RnTzMJnA@mail.gmail.com>
Date: Mon, 27 Jan 2025 16:06:40 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, horms@...nel.org, 
	kuba@...nel.org
Subject: Re: [PATCH net-next v6 06/12] net: homa: create homa_peer.h and homa_peer.c

On Thu, Jan 23, 2025 at 9:45 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 1/15/25 7:59 PM, John Ousterhout wrote:
> > +/**
> > + * homa_peertab_get_peers() - Return information about all of the peers
> > + * currently known
> > + * @peertab:    The table to search for peers.
> > + * @num_peers:  Modified to hold the number of peers returned.
> > + * Return:      kmalloced array holding pointers to all known peers. The
> > + *           caller must free this. If there is an error, or if there
> > + *           are no peers, NULL is returned.
> > + */
> > +struct homa_peer **homa_peertab_get_peers(struct homa_peertab *peertab,
> > +                                       int *num_peers)
>
> Looks like this function is unused in the current series. Please don't
> introduce unused code.

Sorry about that. This patch series only contains about half of Homa's
full functionality. I didn't notice that this function had become
orphaned during the trimming process. I've removed it now (but not
before fixing the issues below).

> > +{
> > +     struct homa_peer **result;
> > +     struct hlist_node *next;
> > +     struct homa_peer *peer;
> > +     int i, count;
> > +
> > +     *num_peers = 0;
> > +     if (!peertab->buckets)
> > +             return NULL;
> > +
> > +     /* Figure out how many peers there are. */
> > +     count = 0;
> > +     for (i = 0; i < HOMA_PEERTAB_BUCKETS; i++) {
> > +             hlist_for_each_entry_safe(peer, next, &peertab->buckets[i],
> > +                                       peertab_links)
>
> No lock acquired, so other processes could concurrently modify the list;
> hlist_for_each_entry_safe() is not the correct helper to use. You should
> probably use hlist_for_each_entry_rcu(), adding RCU protection. Assuming
> the thing is actually under an RCU scheme, which is not entirely clear.

Looks like I misunderstood what "safe" means when I wrote this code.
As I understand it now, hlist_for_each_entry_safe is only "safe"
against deletion of the current entry by the thread that is iterating:
it is not safe against insertions or deletions by other threads, or
even against deleting elements other than the current one. Is that
correct?
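
(In other words, something like this is fine; this is a generic
illustration rather than code from the patch:

        hlist_for_each_entry_safe(peer, next, &peertab->buckets[i],
                                  peertab_links) {
                /* Removing the *current* entry is OK: the iterator
                 * already saved its successor in 'next'.
                 */
                hlist_del(&peer->peertab_links);
        }

but it gives no protection against another thread modifying the list
at the same time, which is what matters for homa_peertab_get_peers().)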

I have switched to using hlist_for_each_entry_rcu instead, but this
raises questions. If I use hlist_for_each_entry_rcu, will I need to
use rcu_read_lock/unlock as well in order to avoid complaints from
the RCU validator? Technically, I don't think rcu_read_lock and unlock
are necessary, because this code only needs protection against
concurrent modifications to the list structure, and I think that the
RCU iterators provide that. If I understand correctly, rcu_read_lock
and unlock are only needed to prevent an object from being deleted
while it is being used, but that can't happen here because peers are
never deleted. For now I have added calls to rcu_read_lock and unlock;
is there a way to annotate this usage so that I can skip the calls to
rcu_read_lock/unlock without complaints from the RCU validator?
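
For reference, the counting loop now looks roughly like this (a sketch
of the updated code, with the read-side markers included):

        count = 0;
        rcu_read_lock();
        for (i = 0; i < HOMA_PEERTAB_BUCKETS; i++) {
                /* Tolerates concurrent insertions by other threads. */
                hlist_for_each_entry_rcu(peer, &peertab->buckets[i],
                                         peertab_links)
                        count++;
        }
        rcu_read_unlock();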

> > +/**
> > + * homa_peertab_gc_dsts() - Invoked to free unused dst_entries, if it is
> > + * safe to do so.
> > + * @peertab:       The table in which to free entries.
> > + * @now:           Current time, in sched_clock() units; entries with expiration
> > + *                 dates no later than this will be freed. Specify ~0 to
> > + *                 free all entries.
> > + */
> > +void homa_peertab_gc_dsts(struct homa_peertab *peertab, __u64 now)
> > +{
>
> Apparently this is called under (and needs) the peertab lock; an annotation
> or a comment would be helpful.

I have now added a __must_hold(&peertab->write_lock) annotation to
this function.
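I.e., the definition now reads roughly (body elided here):

        void homa_peertab_gc_dsts(struct homa_peertab *peertab, __u64 now)
                __must_hold(&peertab->write_lock)
        {
                /* ... body unchanged ... */
        }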

> > +     hlist_for_each_entry_rcu(peer, &peertab->buckets[bucket],
> > +                              peertab_links) {
> > +             if (ipv6_addr_equal(&peer->addr, addr))
>
> The caller does not acquire the RCU read lock, so this looks buggy.

I have added rcu_read_lock/unlock calls, but I don't think they are
technically necessary, for the same reason discussed above.
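
Concretely, the lookup now looks something like this (a sketch; the
returned peer remains valid after rcu_read_unlock only because peers
are never freed before module unload):

        rcu_read_lock();
        hlist_for_each_entry_rcu(peer, &peertab->buckets[bucket],
                                 peertab_links) {
                if (ipv6_addr_equal(&peer->addr, addr)) {
                        rcu_read_unlock();
                        return peer;
                }
        }
        rcu_read_unlock();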

> AFAICS UaF is not possible because peers are removed only by
> homa_peertab_destroy(), at unload time. That in turn looks
> dangerous/wrong. What about memory utilization for peers over time?
> Apparently the bucket lists could grow in an unbounded way.

Correct: peers are only freed at unload time. I have deferred trying
to reclaim peer data earlier because it's unclear to me that doing so
is either necessary or good. Homa is intended for use only within a
particular datacenter, so the number of peers is limited to the number
of hosts in the datacenter (100K?). The amount of information for each
peer is relatively small (about 300 bytes), so even in the worst case
(100K peers x 300 bytes is roughly 30 MB) I don't think it would be
completely intolerable to have them all loaded in memory at once. I
would expect the actual number to be less than that, due to locality
of host-to-host access patterns. Anyhow, if possible I'd prefer to
defer the implementation of peer reclamation until there are
measurements indicating that it is necessary. BTW, the bucket array
currently has 64K entries, so the bucket lists shouldn't become very
long even with 100K peers.

> [...]
> > +/**
> > + * homa_peer_lock_slow() - This function implements the slow path for
> > + * acquiring a peer's @unacked_lock. It is invoked when the lock isn't
> > + * immediately available. It waits for the lock, but also records statistics
> > + * about the waiting time.
> > + * @peer:    Peer to  lock.
> > + */
> > +void homa_peer_lock_slow(struct homa_peer *peer)
> > +     __acquires(&peer->ack_lock)
> > +{
> > +     spin_lock_bh(&peer->ack_lock);
>
> Is this just a placeholder for future changes?!? I don't see any stats
> update here, and currently homa_peer_lock() is really:
>
>         if (!spin_trylock_bh(&peer->ack_lock))
>                 spin_lock_bh(&peer->ack_lock);
>
> which does not make much sense to me. Either document that this is going to
> change very soon (possibly even how and why) or use a plain spin_lock_bh().

The "full" Homa uses the "lock_slow" functions to report statistics on
lock conflicts all of Homa's metrics were removed for this patch
series, leaving the "lock_slow" functions as hollow shells. You aren't
the first reviewer to have been confused by this, so I will remove the
"lock_slow" functions for now.

> > +struct homa_peertab {
> > +     /**
> > +      * @write_lock: Synchronizes addition of new entries; not needed
> > +      * for lookups (RCU is used instead).
> > +      */
> > +     spinlock_t write_lock;
>
> This lock looks potentially heavily contended on add; why don't you use a
> per-bucket lock?

Peers aren't added very often so I don't expect contention for this
lock; if it turns out to be contended then I'll switch to per-bucket
locks.
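
(If measurements ever show otherwise, the change would be fairly
mechanical; a per-bucket variant might look something like the sketch
below, where "homa_peertab_bucket" is just a name for illustration:

        struct homa_peertab_bucket {
                spinlock_t lock;          /* Synchronizes insertions. */
                struct hlist_head peers;  /* Peers hashing to this bucket;
                                           * lookups still use RCU. */
        };

with homa_peertab holding an array of HOMA_PEERTAB_BUCKETS of these
instead of a single write_lock.)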

> > +     /**
> > +      * @grantable_rpcs: Contains all homa_rpcs (both requests and
> > +      * responses) involving this peer whose msgins require grants (or
> > +      * required them in the past) and have not been fully received. The list is
> > +      * sorted in priority order (head has fewest bytes_remaining).
> > +      * Locked with homa->grantable_lock.
> > +      */
> > +     struct list_head grantable_rpcs;
>
> Apparently not used in this patch series. More fields below with similar
> problems. Please introduce such fields in the same series that will
> actually use them.

Oops, fixed. Several of the following fields are not even used in the
full Homa; somehow they didn't get deleted once all of their usages
were removed.

-John-
