Message-ID: <d52465a2-f857-4a2b-8d4e-4d30384b6247@redhat.com>
Date: Thu, 8 May 2025 10:46:16 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: John Ousterhout <ouster@...stanford.edu>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, horms@...nel.org,
 kuba@...nel.org
Subject: Re: [PATCH net-next v8 05/15] net: homa: create homa_peer.h and
 homa_peer.c

On 5/7/25 6:11 PM, John Ousterhout wrote:
> On Mon, May 5, 2025 at 4:06 AM Paolo Abeni <pabeni@...hat.com> wrote:
> 
>> On 5/3/25 1:37 AM, John Ousterhout wrote:
>> [...]
>>> +{
>>> +     /* Note: when we return, the object must be initialized so it's
>>> +      * safe to call homa_peertab_destroy, even if this function returns
>>> +      * an error.
>>> +      */
>>> +     int i;
>>> +
>>> +     spin_lock_init(&peertab->write_lock);
>>> +     INIT_LIST_HEAD(&peertab->dead_dsts);
>>> +     peertab->buckets = vmalloc(HOMA_PEERTAB_BUCKETS *
>>> +                                sizeof(*peertab->buckets));
>>
>> This struct looks way too big to be allocated on per netns basis. You
>> should use a global table and include the netns in the lookup key.
> 
> Are there likely to be lots of netns's in a system? 

Yes. In fact a relevant performance metric is the time to create and
destroy thousands of them.

> I thought I read
> someplace that a hardware NIC must belong exclusively to a single
> netns, so from that I assumed there couldn't be more than a few
> netns's. Can there be virtual NICs, leading to lots of netns's?

Yes, veth devices are pretty ubiquitous in containerization setups.

> Can
> you give me a ballpark number for how many netns's there might be in a
> system with "lots" of them? This will be useful in making design
> tradeoffs.

You should consider at least 1K as a target number, but a large system
should just work with 10K or more.
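
Something along these lines for the global table (a sketch only;
homa_peer_hash() and the peer->net field are hypothetical, and it
assumes HOMA_PEERTAB_BUCKETS is a power of two; needs <linux/jhash.h>
and <net/netns/hash.h>):

        static struct hlist_head homa_peer_buckets[HOMA_PEERTAB_BUCKETS];

        static u32 homa_peer_hash(const struct net *net,
                                  const struct in6_addr *addr)
        {
                /* Fold the netns into the hash so entries from different
                 * namespaces spread across the buckets.
                 */
                return jhash2((const u32 *)addr, 4, net_hash_mix(net)) &
                       (HOMA_PEERTAB_BUCKETS - 1);
        }

        /* The lookup then matches on both the address and the netns: */
        hlist_for_each_entry_rcu(peer, &homa_peer_buckets[bucket],
                                 peertab_links) {
                if (ipv6_addr_equal(&peer->addr, addr) &&
                    net_eq(peer->net, net))
                        return peer;
        }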

>>> +     /* No existing entry; create a new one.
>>> +      *
>>> +      * Note: after we acquire the lock, we have to check again to
>>> +      * make sure the entry still doesn't exist (it might have been
>>> +      * created by a concurrent invocation of this function).
>>> +      */
>>> +     spin_lock_bh(&peertab->write_lock);
>>> +     hlist_for_each_entry(peer, &peertab->buckets[bucket],
>>> +                          peertab_links) {
>>> +             if (ipv6_addr_equal(&peer->addr, addr))
>>> +                     goto done;
>>> +     }
>>> +     peer = kmalloc(sizeof(*peer), GFP_ATOMIC | __GFP_ZERO);
>>
>> Please, move the allocation outside the atomic scope and use GFP_KERNEL.
> 
> I don't think I can do that because homa_peer_find is invoked in
> softirq code, which is atomic, no? It's not disastrous if the
> allocation fails; the worst that happens is that an incoming packet
> must be discarded (it will be retried later).

IMHO a _find() helper that allocates things has a misleading name.

Usually the RX path does only lookups, and state allocation is done in
the control path; that avoids the atomic allocation issues.
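
Roughly (function names hypothetical, just to illustrate the split):

        /* RX path: lookup only, safe in softirq, no allocation.
         * Caller holds rcu_read_lock().
         */
        static struct homa_peer *homa_peer_lookup(struct homa_peertab *peertab,
                                                  const struct in6_addr *addr)
        {
                struct homa_peer *peer;
                int bucket = homa_peer_bucket(addr); /* whatever hash the patch uses */

                hlist_for_each_entry_rcu(peer, &peertab->buckets[bucket],
                                         peertab_links) {
                        if (ipv6_addr_equal(&peer->addr, addr))
                                return peer;
                }
                return NULL;
        }

        /* Control path (process context): the allocation can sleep. */
        static struct homa_peer *homa_peer_create(struct homa_peertab *peertab,
                                                  const struct in6_addr *addr)
        {
                struct homa_peer *peer = kzalloc(sizeof(*peer), GFP_KERNEL);

                if (!peer)
                        return ERR_PTR(-ENOMEM);
                /* ... init, take write_lock, re-check, insert ... */
                return peer;
        }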

> 
>>> +     if (!peer) {
>>> +             peer = (struct homa_peer *)ERR_PTR(-ENOMEM);
>>> +             goto done;
>>> +     }
>>> +     peer->addr = *addr;
>>> +     dst = homa_peer_get_dst(peer, inet);
>>> +     if (IS_ERR(dst)) {
>>> +             kfree(peer);
>>> +             peer = (struct homa_peer *)PTR_ERR(dst);
>>> +             goto done;
>>> +     }
>>> +     peer->dst = dst;
>>> +     hlist_add_head_rcu(&peer->peertab_links, &peertab->buckets[bucket]);
>>
>> At this point another CPU can lookup 'peer'. Since there are no memory
>> barriers it could observe a NULL peer->dst.
> 
> Oops, good catch. I need to add 'smp_wmb()' just before the
> hlist_add_head_rcu line?

Barriers go in pairs, one here and one in the lookup. See the relevant
documentation for the gory details.
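
For this particular publish/lookup pattern, one way to get the pair (a
sketch; see Documentation/RCU/ for the gory details) is to lean on the
barriers already built into the RCU list primitives:

        /* Writer: fully initialize the object first... */
        peer->addr = *addr;
        peer->dst = dst;
        /* ...then publish it; hlist_add_head_rcu() includes the
         * write-side barrier (rcu_assign_pointer() semantics), so no
         * explicit smp_wmb() is needed here.
         */
        hlist_add_head_rcu(&peer->peertab_links, &peertab->buckets[bucket]);

        /* Reader: the paired read-side barrier comes from
         * rcu_dereference() inside the _rcu iterator, so the lookup
         * must use the _rcu variant (the plain hlist_for_each_entry()
         * in the patch above does not provide it):
         */
        hlist_for_each_entry_rcu(peer, &peertab->buckets[bucket],
                                 peertab_links) {
                if (ipv6_addr_equal(&peer->addr, addr))
                        goto done;
        }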

[...]
>> Note that freeing the peer at 'runtime' will require additional changes:
>> i.e. likely refcounting will be beeded or the at lookup time, after the
>> rcu unlock the code could hit HaF while accessing the looked-up peer.
> 
> I understand about reference counting, but I couldn't parse the last
> 1.5 lines above. What is HaF?

A lot of typos on my side, sorry.

You likely need to introduce refcounting, or after the RCU unlock (end
of the RCU grace period) the peer could be freed, causing a use-after-free.
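
The usual shape (homa_peer_lookup(), homa_peer_put() and the refcnt
field are hypothetical names):

        /* Take the reference while still inside the RCU read-side
         * critical section, so the peer cannot be freed between
         * rcu_read_unlock() and the last use.
         */
        rcu_read_lock();
        peer = homa_peer_lookup(peertab, addr);
        if (peer && !refcount_inc_not_zero(&peer->refcnt))
                peer = NULL;            /* concurrently being freed */
        rcu_read_unlock();

        /* ... use peer ... */

        if (peer)
                homa_peer_put(peer);    /* refcount_dec_and_test() + RCU-deferred free */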

>> [...]
>>> +static inline struct dst_entry *homa_get_dst(struct homa_peer *peer,
>>> +                                          struct homa_sock *hsk)
>>> +{
>>> +     if (unlikely(peer->dst->obsolete > 0))
>>
>> you need to additionally call dst->ops->check
> 
> I wasn't aware of dst->ops->check, and I'm a little confused by it
> (usage in the kernel doesn't seem totally consistent):
> * If I call dst->ops->check(), do I also need to check obsolete
> (perhaps only call check if obsolete is true?)?
> * What is the 'cookie' argument to dst->ops->check? Can I just use 0 safely?
> * It looks like dst->ops->check now returns a struct dst_entry
> pointer. What is the meaning of this? ChatGPT suggests that it is a
> replacement dst_entry, if the original is no longer valid. 

Luckily (or unfortunately, depending on your PoV), the tool you mentioned
simply does not work (yet?) for kernel code. You could (and should)
review with extreme care basically any of its output on such topics.

dst_check() is the reference code. You should use that helper.
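
I.e. something like this sketch (a cookie of 0 is what the IPv4 paths
use; IPv6 code typically passes a route cookie such as np->dst_cookie):

        struct dst_entry *dst;

        dst = dst_check(peer->dst, 0);
        if (!dst) {
                /* The cached route is stale: re-resolve it, e.g. via
                 * homa_peer_get_dst() as in the patch, and store the
                 * new dst back into the peer.
                 */
        }

dst_check() does the obsolete test internally and calls
dst->ops->check() only when needed, returning NULL when the cached
entry can no longer be used.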

/P

