Message-ID: <CAL+tcoDRa3+Rz7jxZFAE79=TygBGu-b4=NQS7xU7WK3V9PuzLw@mail.gmail.com>
Date: Tue, 28 Jan 2025 08:32:06 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: John Ousterhout <ouster@...stanford.edu>
Cc: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, edumazet@...gle.com,
horms@...nel.org, kuba@...nel.org
Subject: Re: [PATCH net-next v6 06/12] net: homa: create homa_peer.h and homa_peer.c
On Tue, Jan 28, 2025 at 8:16 AM John Ousterhout <ouster@...stanford.edu> wrote:
>
> On Thu, Jan 23, 2025 at 9:45 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On 1/15/25 7:59 PM, John Ousterhout wrote:
> > > +/**
> > > + * homa_peertab_get_peers() - Return information about all of the peers
> > > + * currently known
> > > + * @peertab: The table to search for peers.
> > > + * @num_peers: Modified to hold the number of peers returned.
> > > + * Return: kmalloced array holding pointers to all known peers. The
> > > + * caller must free this. If there is an error, or if there
> > > + * are no peers, NULL is returned.
> > > + */
> > > +struct homa_peer **homa_peertab_get_peers(struct homa_peertab *peertab,
> > > + int *num_peers)
> >
> > Looks like this function is unused in the current series. Please don't
> > introduce unused code.
>
> Sorry about that. This patch series only contains about half of Homa's
> full functionality. I didn't notice that this function had become
> orphaned during the trimming process. I've removed it now (but not
> before fixing the issues below).
>
> > > +{
> > > + struct homa_peer **result;
> > > + struct hlist_node *next;
> > > + struct homa_peer *peer;
> > > + int i, count;
> > > +
> > > + *num_peers = 0;
> > > + if (!peertab->buckets)
> > > + return NULL;
> > > +
> > > + /* Figure out how many peers there are. */
> > > + count = 0;
> > > + for (i = 0; i < HOMA_PEERTAB_BUCKETS; i++) {
> > > + hlist_for_each_entry_safe(peer, next, &peertab->buckets[i],
> > > + peertab_links)
> >
> > No lock is acquired, so other processes could concurrently modify the list;
> > hlist_for_each_entry_safe() is not the correct helper to use. You should
> > probably use hlist_for_each_entry_rcu(), adding RCU protection. Assuming
> > the thing is actually under an RCU scheme, which is not entirely clear.
>
> Looks like I misunderstood what "safe" means when I wrote this code.
> As I understand it now, hlist_for_each_entry_safe is only "safe"
> against deletion of the current entry by the thread that is iterating:
> it is not safe against insertions or deletions by other threads, or
> even deleting elements other than the current one. Is that correct?
I'm not Paolo, but from what I know, your understanding is correct.
It is the RCU mechanism that makes concurrent deletion safe for readers.
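To make the difference concrete, here is a minimal sketch (not code from
the patch; should_drop() is a made-up predicate, the other names just
mirror the ones quoted above):

        /* _safe only caches the next pointer, so the thread doing the
         * iteration may delete the *current* entry; it gives no
         * protection against other threads modifying the list.
         */
        hlist_for_each_entry_safe(peer, next, &peertab->buckets[i],
                                  peertab_links) {
                if (should_drop(peer))
                        hlist_del(&peer->peertab_links);
        }

        /* _rcu is for lockless readers: the reader must be inside an RCU
         * read-side critical section, and writers must publish/remove
         * entries with the _rcu mutators under their own lock.
         */
        rcu_read_lock();
        hlist_for_each_entry_rcu(peer, &peertab->buckets[i], peertab_links) {
                if (ipv6_addr_equal(&peer->addr, addr))
                        break;
        }
        rcu_read_unlock();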
>
> I have switched to using hlist_for_each_entry_rcu instead, but this
> raises questions. If I use hlist_for_each_entry_rcu, will I need to
> use rcu_read_lock/unlock also in order to avoid complaints from the
rcu_read_lock/unlock() should be used correspondingly. Without them,
concurrent deletion would not be safe.
> RCU validator? Technically, I don't think rcu_read_lock and unlock
> are necessary, because this code only needs protection against
> concurrent modifications to the list structure, and I think that the
> rcu iterators provide that. If I understand correctly, rcu_read_lock
RCU would not help if multiple threads try to write to the same buckets
at the same time; spin_lock() is the common approach I would recommend.
Please see __inet_lookup_established() as a good example.
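Roughly, the write side then looks like this (just a sketch; whether the
lock is per table or per bucket is a separate question):

        /* Writers serialize with each other via a spinlock and publish
         * with the _rcu helper, so lockless readers only need
         * rcu_read_lock()/rcu_read_unlock() around the _rcu iteration.
         */
        spin_lock_bh(&peertab->write_lock);
        hlist_add_head_rcu(&peer->peertab_links, &peertab->buckets[bucket]);
        spin_unlock_bh(&peertab->write_lock);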
Thanks,
Jason
> and unlock are only needed to prevent an object from being deleted
> while it is being used, but that can't happen here because peers are
> not deleted. For now I have added calls to rcu_read_lock and unlock;
> is there a way to annotate this usage so that I can skip the calls to
> rcu_read_lock/unlock without complaints from the RCU validator?
>
> > > +/**
> > > + * homa_peertab_gc_dsts() - Invoked to free unused dst_entries, if it is
> > > + * safe to do so.
> > > + * @peertab: The table in which to free entries.
> > > + * @now: Current time, in sched_clock() units; entries with expiration
> > > + * dates no later than this will be freed. Specify ~0 to
> > > + * free all entries.
> > > + */
> > > +void homa_peertab_gc_dsts(struct homa_peertab *peertab, __u64 now)
> > > +{
> >
> > Apparently this is called under (and needs) the peertab lock; an
> > annotation or a comment would be helpful.
>
> I have now added a __must_hold(&peer_tab->write_lock) annotation to
> this function.
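FWIW, such an annotation typically sits between the signature and the
body, roughly like this (body elided; only a sketch):

        void homa_peertab_gc_dsts(struct homa_peertab *peertab, __u64 now)
                __must_hold(&peertab->write_lock)
        {
                /* ... free expired dst entries ... */
        }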
>
> > > + hlist_for_each_entry_rcu(peer, &peertab->buckets[bucket],
> > > + peertab_links) {
> > > + if (ipv6_addr_equal(&peer->addr, addr))
> >
> > The caller does not acquire the RCU read lock, so this looks buggy.
>
> I have added rcu_read_lock/unlock calls, but I don't think they are
> technically necessary, for the same reason discussed above.
>
> > AFAICS UaF is not possible because peers are removed only by
> > homa_peertab_destroy(), at unload time. That in turn looks
> > dangerous/wrong. What about memory utilization for peers over time?
> > Apparently the bucket lists could grow in an unlimited way.
>
> Correct: peers are only freed at unload time. I have deferred trying
> to reclaim peer data earlier because it's unclear to me that that is
> either necessary or good. Homa is intended for use only within a
> particular datacenter so the number of peers is limited to the number
> of hosts in the datacenter (100K?). The amount of information for each
> peer is relatively small (about 300 bytes) so even in the worst case I
> don't think it would be completely intolerable to have them all loaded
> in memory at once. I would expect the actual number to be less than
> that, due to locality of host-host access patterns. Anyhow, if
> possible I'd prefer to defer the implementation of peer-data reclamation
> there are measurements to indicate that it is necessary. BTW, the
> bucket array currently has 64K entries, so the bucket lists shouldn't
> become very long even with 100K peers.
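(For scale, taking those numbers at face value: 100K peers at ~300 bytes
each is on the order of 30 MB, which does seem tolerable for a host in a
datacenter.)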
>
> > [...]
> > > +/**
> > > + * homa_peer_lock_slow() - This function implements the slow path for
> > > + * acquiring a peer's @unacked_lock. It is invoked when the lock isn't
> > > + * immediately available. It waits for the lock, but also records statistics
> > > + * about the waiting time.
> > > + * @peer: Peer to lock.
> > > + */
> > > +void homa_peer_lock_slow(struct homa_peer *peer)
> > > + __acquires(&peer->ack_lock)
> > > +{
> > > + spin_lock_bh(&peer->ack_lock);
> >
> > Is this just a placeholder for future changes?!? I don't see any stats
> > update here, and currently homa_peer_lock() is really:
> >
> > if (!spin_trylock_bh(&peer->ack_lock))
> > spin_lock_bh(&peer->ack_lock);
> >
> > which does not make much sense to me. Either document that this is going
> > to change very soon (possibly even how and why) or use a plain
> > spin_lock_bh().
>
> The "full" Homa uses the "lock_slow" functions to report statistics on
> lock conflicts, but all of Homa's metrics were removed for this patch
> series, leaving the "lock_slow" functions as hollow shells. You aren't
> the first reviewer to have been confused by this, so I will remove the
> "lock_slow" functions for now.
>
> > > +struct homa_peertab {
> > > + /**
> > > + * @write_lock: Synchronizes addition of new entries; not needed
> > > + * for lookups (RCU is used instead).
> > > + */
> > > + spinlock_t write_lock;
> >
> > This lock looks potentially heavily contended on add; why don't you use a
> > per-bucket lock?
>
> Peers aren't added very often so I don't expect contention for this
> lock; if it turns out to be contended then I'll switch to per-bucket
> locks.
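If it ever does become contended, the per-bucket variant would look
roughly like this (field names are just illustrative, not from the patch):

        struct homa_peertab_bucket {
                spinlock_t lock;                /* serializes insertions */
                struct hlist_head peers;        /* RCU-protected chain */
        };

        struct homa_peertab {
                struct homa_peertab_bucket buckets[HOMA_PEERTAB_BUCKETS];
        };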
>
> > > + /**
> > > + * @grantable_rpcs: Contains all homa_rpcs (both requests and
> > > + * responses) involving this peer whose msgins require grants (or
> > > + * required them in the past) and have not been fully received. The list is
> > > + * sorted in priority order (head has fewest bytes_remaining).
> > > + * Locked with homa->grantable_lock.
> > > + */
> > > + struct list_head grantable_rpcs;
> >
> > Apparently not used in this patch series. More fields below have a similar
> > problem. Please introduce such fields in the same series that will
> > actually use them.
>
> Oops, fixed. Several of the following fields are not even used in the
> full Homa.... somehow they didn't get deleted once all of the usages
> were removed.
>
> -John-
>