Message-ID: <CAGXJAmw95dDUxUFNa7UjV3XRd66vQRByAP5T_zra6KWdavr2Pg@mail.gmail.com>
Date: Fri, 24 Jan 2025 15:53:55 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, horms@...nel.org,
kuba@...nel.org
Subject: Re: [PATCH net-next v6 04/12] net: homa: create homa_pool.h and homa_pool.c
On Thu, Jan 23, 2025 at 4:06 AM Paolo Abeni <pabeni@...hat.com> wrote:
...
> > +        pool->descriptors = kmalloc_array(pool->num_bpages,
> > +                                          sizeof(struct homa_bpage),
> > +                                          GFP_ATOMIC);
>
> Possibly worth adding '| __GFP_ZERO' and avoiding zeroing some fields later.
I prefer to do all the initialization explicitly (this makes it
totally clear that a zero value is intended, as opposed to accidental
omission of an initializer). If you still think I should use
__GFP_ZERO, let me know and I'll add it.
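For what it's worth, the tradeoff looks roughly like this (just a sketch;
the field names follow the homa_bpage definition quoted further down):

        /* With __GFP_ZERO everything starts out zeroed: */
        pool->descriptors = kmalloc_array(pool->num_bpages,
                                          sizeof(struct homa_bpage),
                                          GFP_ATOMIC | __GFP_ZERO);

        /* With explicit initialization every zero (or non-zero) value
         * is visibly intentional:
         */
        pool->descriptors = kmalloc_array(pool->num_bpages,
                                          sizeof(struct homa_bpage),
                                          GFP_ATOMIC);
        if (!pool->descriptors)
                return -ENOMEM;
        for (i = 0; i < pool->num_bpages; i++) {
                struct homa_bpage *bp = &pool->descriptors[i];

                spin_lock_init(&bp->lock);
                atomic_set(&bp->refs, 0);
                bp->owner = -1;
                bp->expiration = 0;
        }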
> > +
> > +        /* Allocate and initialize core-specific data. */
> > +        pool->cores = kmalloc_array(nr_cpu_ids, sizeof(struct homa_pool_core),
> > +                                    GFP_ATOMIC);
>
> Uhm... on large systems this could be an order-3 allocation, which in
> turn could fail quite easily under memory pressure, and it looks
> contradictory WRT the cover letter statement about reducing the
> amount of per-socket state.
>
> Why don't you use alloc_percpu_gfp() here?
I have now switched to alloc_percpu_gfp. On the issue of per-socket
memory requirements, Homa doesn't significantly reduce the amount of
memory allocated for any given socket. Its memory savings come about
because a single Homa socket can be used to communicate with any
number of peers simultaneously, whereas TCP requires a separate socket
for each peer-to-peer connection. I have added a bit more to the cover
letter to clarify this.
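For reference, the converted allocation is roughly this (a sketch, not
the exact diff):

        pool->cores = alloc_percpu_gfp(struct homa_pool_core, GFP_ATOMIC);
        if (!pool->cores)
                return -ENOMEM;

        /* Per-core access (a spinlock is held, so the core can't change): */
        struct homa_pool_core *core = this_cpu_ptr(pool->cores);

        /* ... and on teardown: */
        free_percpu(pool->cores);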
> > +int homa_pool_get_pages(struct homa_pool *pool, int num_pages, __u32 *pages,
> > +                        int set_owner)
> > +{
> > +        int core_num = raw_smp_processor_id();
>
> Why the 'raw' variant? If this code is pre-emptible it means another
> process could be scheduled on the same core...
My understanding is that raw_smp_processor_id is faster.
homa_pool_get_pages is invoked with a spinlock held, so there is no
risk of a core switch while it is executing. Is there some other
problem I have missed?
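To spell out the assumption (the lock name here is illustrative, not the
actual field):

        spin_lock_bh(&pool->lock);
        /* While the lock is held this thread cannot migrate to another
         * core, so the result below stays valid; the raw variant simply
         * skips the preemption sanity check that smp_processor_id()
         * performs in debug builds.
         */
        core_num = raw_smp_processor_id();
        ...
        spin_unlock_bh(&pool->lock);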
> > +
> > +        cur = core->next_candidate;
> > +        core->next_candidate++;
>
> ... here, making this increment racy.
Because this code always runs in atomic mode, I don't believe there is
any danger of racing: no other thread can run on the same core
concurrently.
> > +        if (cur >= limit) {
> > +                core->next_candidate = 0;
> > +
> > +                /* Must recompute the limit for each new loop through
> > +                 * the bpage array: we may need to consider a larger
> > +                 * range of pages because of concurrent allocations.
> > +                 */
> > +                limit = 0;
> > +                continue;
> > +        }
> > +        bpage = &pool->descriptors[cur];
> > +
> > +        /* Figure out whether this candidate is free (or can be
> > +         * stolen). Do a quick check without locking the page, and
> > +         * if the page looks promising, then lock it and check again
> > +         * (must check again in case someone else snuck in and
> > +         * grabbed the page).
> > +         */
> > +        ref_count = atomic_read(&bpage->refs);
> > +        if (ref_count >= 2 || (ref_count == 1 && (bpage->owner < 0 ||
> > +                                                  bpage->expiration > now)))
>
> The above conditions could be place in separate helper, making the code
> more easy to follow and avoiding some duplications.
Done; I've created a new function homa_bpage_available.
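It ends up roughly this shape, derived from the condition quoted above
(sketch):

static bool homa_bpage_available(struct homa_bpage *bpage, __u64 now)
{
        int ref_count = atomic_read(&bpage->refs);

        /* Available if completely free, or if the only reference is an
         * owner whose grace period has expired.
         */
        return ref_count == 0 || (ref_count == 1 && bpage->owner >= 0 &&
                                  bpage->expiration <= now);
}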
> > +        /* First allocate any full bpages that are needed. */
> > +        full_pages = rpc->msgin.length >> HOMA_BPAGE_SHIFT;
> > +        if (unlikely(full_pages)) {
> > +                if (homa_pool_get_pages(pool, full_pages, pages, 0) != 0)
>
> full_pages must be less than HOMA_MAX_BPAGES, but I don't see any check
> limiting the incoming message length?!?
Oops, good catch. There was a check in the outbound path, but not in
the inbound path. I have added one now (in homa_message_in_init in
homa_incoming.c).
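The new check is along these lines (a sketch; the exact limit expression
may differ):

        /* In homa_message_in_init: reject messages too large to fit in
         * HOMA_MAX_BPAGES bpages.
         */
        if (length > (HOMA_MAX_BPAGES << HOMA_BPAGE_SHIFT))
                return -EINVAL;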
> > +
> > +        /* We get here if there wasn't enough buffer space for this
> > +         * message; add the RPC to hsk->waiting_for_bufs.
> > +         */
> > +out_of_space:
> > +        homa_sock_lock(pool->hsk, "homa_pool_allocate");
>
> There is some chicken-egg issue, with homa_sock_lock() being defined
> only later in the series, but it looks like the string argument is never
> used.
Right: in normal usage this argument is ignored. It exists because
there are occasionally deadlocks involving socket locks; when that
happens I temporarily add code to homa_sock_lock that uses this
argument to help track them down. I'd prefer to keep it, even though
it isn't normally used, because otherwise when a new deadlock arises
I'd have to modify every call to homa_sock_lock in order to add the
information back in again. I added a few more words to the comment for
homa_sock_lock to make this more clear.
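For context, the shape of homa_sock_lock is roughly this (sketch only);
the @locker string is consulted only by debugging code that gets added
temporarily:

static inline void homa_sock_lock(struct homa_sock *hsk, const char *locker)
{
        if (!spin_trylock_bh(&hsk->lock)) {
                /* When tracking down a deadlock, code that records
                 * @locker can be inserted here temporarily.
                 */
                spin_lock_bh(&hsk->lock);
        }
}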
> > +        if (!homa_rpc_try_lock(rpc, "homa_pool_check_waiting")) {
> > +                /* Can't just spin on the RPC lock because we're
> > +                 * holding the socket lock (see sync.txt). Instead,
>
> Stray reference to sync.txt. It would be nice to have the locking scheme
> described start to finish somewhere in this series.
sync.txt will be part of the next revision of this series.
> > +struct homa_bpage {
> > +        union {
> > +                /**
> > +                 * @cache_line: Ensures that each homa_bpage object
> > +                 * is exactly one cache line long.
> > +                 */
> > +                char cache_line[L1_CACHE_BYTES];
> > +                struct {
> > +                        /** @lock: to synchronize shared access. */
> > +                        spinlock_t lock;
> > +
> > +                        /**
> > +                         * @refs: Counts number of distinct uses of this
> > +                         * bpage (1 tick for each message that is using
> > +                         * this page, plus an additional tick if the @owner
> > +                         * field is set).
> > +                         */
> > +                        atomic_t refs;
> > +
> > +                        /**
> > +                         * @owner: kernel core that currently owns this page
> > +                         * (< 0 if none).
> > +                         */
> > +                        int owner;
> > +
> > +                        /**
> > +                         * @expiration: time (in sched_clock() units) after
> > +                         * which it's OK to steal this page from its current
> > +                         * owner (if @refs is 1).
> > +                         */
> > +                        __u64 expiration;
> > +                };
>
> ____cacheline_aligned instead of inserting the struct into a union
> should suffice.
Done (but now that alloc_percpu_gfp is being used I'm not sure this is
needed to ensure alignment?).
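With that change the definition becomes roughly (sketch):

struct homa_bpage {
        /** @lock: to synchronize shared access. */
        spinlock_t lock;

        /**
         * @refs: distinct uses of this bpage (1 tick per message using
         * the page, plus an additional tick if @owner is set).
         */
        atomic_t refs;

        /** @owner: kernel core that currently owns this page (< 0 if none). */
        int owner;

        /**
         * @expiration: time (sched_clock() units) after which it's OK to
         * steal this page from its current owner (if @refs is 1).
         */
        __u64 expiration;
} ____cacheline_aligned;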
-John-