Message-ID: <3b432e20-cca3-4163-b7ac-139efe6a8427@redhat.com>
Date: Tue, 26 Aug 2025 12:53:59 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: John Ousterhout <ouster@...stanford.edu>, netdev@...r.kernel.org
Cc: edumazet@...gle.com, horms@...nel.org, kuba@...nel.org
Subject: Re: [PATCH net-next v15 08/15] net: homa: create homa_pacer.h and
homa_pacer.c
On 8/18/25 10:55 PM, John Ousterhout wrote:
> +/**
> + * homa_pacer_alloc() - Allocate and initialize a new pacer object, which
> + * will hold pacer-related information for @homa.
> + * @homa: Homa transport that the pacer will be associated with.
> + * Return: A pointer to the new struct pacer, or a negative errno.
> + */
> +struct homa_pacer *homa_pacer_alloc(struct homa *homa)
> +{
> + struct homa_pacer *pacer;
> + int err;
> +
> + pacer = kzalloc(sizeof(*pacer), GFP_KERNEL);
> + if (!pacer)
> + return ERR_PTR(-ENOMEM);
> + pacer->homa = homa;
> + spin_lock_init(&pacer->mutex);
> + pacer->fifo_count = 1000;
> + spin_lock_init(&pacer->throttle_lock);
> + INIT_LIST_HEAD_RCU(&pacer->throttled_rpcs);
> + pacer->fifo_fraction = 50;
> + pacer->max_nic_queue_ns = 5000;
> + pacer->throttle_min_bytes = 1000;
> + init_waitqueue_head(&pacer->wait_queue);
> + pacer->kthread = kthread_run(homa_pacer_main, pacer, "homa_pacer");
> + if (IS_ERR(pacer->kthread)) {
> + err = PTR_ERR(pacer->kthread);
> + pr_err("Homa couldn't create pacer thread: error %d\n", err);
> + goto error;
> + }
> + atomic64_set(&pacer->link_idle_time, homa_clock());
> +
> + homa_pacer_update_sysctl_deps(pacer);
IMHO this does not fit mergeable status:
- the static init (@25Gbps)
- never updated on link changes
- assumes a single link in the whole system

I think it's better to split the pacer part out of this series;
otherwise the above points should be addressed, and it would be
difficult to fit that into a reasonable series size.
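If the pacer does stay in this series, something along the lines of the
(rough, untested) sketch below could track the speed of the actual
device(s) via a netdevice notifier instead of a static value;
homa_pacer_set_link_mbps() is just a placeholder here for whatever
would recompute the pacer's cycles-per-byte:

#include <linux/ethtool.h>
#include <linux/netdevice.h>

static int homa_pacer_netdev_event(struct notifier_block *nb,
				   unsigned long event, void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
	struct ethtool_link_ksettings ks;

	if (event != NETDEV_UP && event != NETDEV_CHANGE)
		return NOTIFY_DONE;

	/* RTNL is held by the notifier infrastructure, as required by
	 * __ethtool_get_link_ksettings().
	 */
	if (__ethtool_get_link_ksettings(dev, &ks) == 0 &&
	    ks.base.speed != (u32)SPEED_UNKNOWN)
		homa_pacer_set_link_mbps(dev, ks.base.speed); /* placeholder */

	return NOTIFY_OK;
}

static struct notifier_block homa_pacer_netdev_nb = {
	.notifier_call = homa_pacer_netdev_event,
};

/* register_netdevice_notifier(&homa_pacer_netdev_nb) at pacer init,
 * unregister_netdevice_notifier() at teardown.
 */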
Also, a single thread for reaping all the RPCs looks like a potentially
highly contended spot.
> +/**
> + * homa_pacer_xmit() - Transmit packets from the throttled list until
> + * either (a) the throttled list is empty or (b) the NIC queue has
> + * reached maximum allowable length. Note: this function may be invoked
> + * from either process context or softirq (BH) level. This function is
> + * invoked from multiple places, not just in the pacer thread. The reason
> + * for this is that (as of 10/2019) Linux's scheduling of the pacer thread
> + * is unpredictable: the thread may block for long periods of time (e.g.,
> + * because it is assigned to the same CPU as a busy interrupt handler).
> + * This can result in poor utilization of the network link. So, this method
> + * gets invoked from other places as well, to increase the likelihood that we
> + * keep the link busy. Those other invocations are not guaranteed to happen,
> + * so the pacer thread provides a backstop.
> + * @pacer: Pacer information for a Homa transport.
> + */
> +void homa_pacer_xmit(struct homa_pacer *pacer)
> +{
> + struct homa_rpc *rpc;
> + s64 queue_cycles;
> +
> + /* Make sure only one instance of this function executes at a time. */
> + if (!spin_trylock_bh(&pacer->mutex))
> + return;
> +
> + while (1) {
> + queue_cycles = atomic64_read(&pacer->link_idle_time) -
> + homa_clock();
> + if (queue_cycles >= pacer->max_nic_queue_cycles)
> + break;
> + if (list_empty(&pacer->throttled_rpcs))
> + break;
> +
> + /* Select an RPC to transmit (either SRPT or FIFO) and
> + * take a reference on it. Must do this while holding the
> + * throttle_lock to prevent the RPC from being reaped. Then
> + * release the throttle lock and lock the RPC (can't acquire
> + * the RPC lock while holding the throttle lock; see "Homa
> + * Locking Strategy" in homa_impl.h).
> + */
> + homa_pacer_throttle_lock(pacer);
> + pacer->fifo_count -= pacer->fifo_fraction;
> + if (pacer->fifo_count <= 0) {
> + struct homa_rpc *cur;
> + u64 oldest = ~0;
> +
> + pacer->fifo_count += 1000;
> + rpc = NULL;
> + list_for_each_entry(cur, &pacer->throttled_rpcs,
> + throttled_links) {
> + if (cur->msgout.init_time < oldest) {
> + rpc = cur;
> + oldest = cur->msgout.init_time;
> + }
> + }
> + } else {
> + rpc = list_first_entry_or_null(&pacer->throttled_rpcs,
> + struct homa_rpc,
> + throttled_links);
> + }
> + if (!rpc) {
> + homa_pacer_throttle_unlock(pacer);
> + break;
> + }
> + homa_rpc_hold(rpc);
It's unclear what ensures that 'rpc' is valid at this point.
> + homa_pacer_throttle_unlock(pacer);
> + homa_rpc_lock(rpc);
> + homa_xmit_data(rpc, true);
> +
> + /* Note: rpc->state could be RPC_DEAD here, but the code
> + * below should work anyway.
> + */
> + if (!*rpc->msgout.next_xmit)
> + /* No more data can be transmitted from this message
> + * (right now), so remove it from the throttled list.
> + */
> + homa_pacer_unmanage_rpc(rpc);
> + homa_rpc_unlock(rpc);
> + homa_rpc_put(rpc);
The whole loop runs in atomic context; you should likely place a
cond_resched() here, releasing and reacquiring the mutex as needed.
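E.g. something like the (untested) snippet below at the bottom of the
loop; 'from_thread' is a hypothetical new argument telling
homa_pacer_xmit() that it was invoked from process context (the pacer
kthread), since cond_resched() is not allowed at BH level:

	/* Drop the lock (re-enabling BHs), give the scheduler a
	 * chance, then try to take the lock back; if some other
	 * context grabbed it meanwhile, let it carry on instead.
	 */
	spin_unlock_bh(&pacer->mutex);
	if (from_thread)
		cond_resched();
	if (!spin_trylock_bh(&pacer->mutex))
		return;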
> +/**
> + * struct homa_pacer - Contains information that the pacer uses to
> + * manage packet output. There is one instance of this object stored
> + * in each struct homa.
> + */
> +struct homa_pacer {
> + /** @homa: Transport that this pacer is associated with. */
> + struct homa *homa;
Should be removed
> +/**
> + * homa_pacer_check() - This method is invoked at various places in Homa to
> + * see if the pacer needs to transmit more packets and, if so, transmit
> + * them. It's needed because the pacer thread may get descheduled by
> + * Linux, resulting in output stalls.
> + * @pacer: Pacer information for a Homa transport.
> + */
> +static inline void homa_pacer_check(struct homa_pacer *pacer)
> +{
> + if (list_empty(&pacer->throttled_rpcs))
> + return;
> +
> + /* The ">> 1" in the line below gives homa_pacer_main the first chance
> + * to queue new packets; if the NIC queue becomes more than half
> + * empty, then we will help out here.
> + */
> + if ((homa_clock() + (pacer->max_nic_queue_cycles >> 1)) <
> + atomic64_read(&pacer->link_idle_time))
> + return;
> + homa_pacer_xmit(pacer);
> +}
apparently not used in this series.
/P