[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250828063140.2747329-1-jackzxcui1989@163.com>
Date: Thu, 28 Aug 2025 14:31:40 +0800
From: Xin Zhao <jackzxcui1989@....com>
To: willemdebruijn.kernel@...il.com,
edumazet@...gle.com,
ferenc@...es.dev
Cc: davem@...emloft.net,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org,
netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v8] net: af_packet: Use hrtimer to do the retire operation
On Thu, 2025-08-28 at 5:53 +0800, Willem wrote:
> > Changes in v8:
> > - Delete delete_blk_timer field, as suggested by Willem de Bruijn,
> > hrtimer_cancel will check and wait until the timer callback return and ensure
> > enter enter callback again;
> > - Simplify the logic related to setting timeout, as suggestd by Willem de Bruijn.
> > Currently timer callback just restarts itself unconditionally, so delete the
> > 'out:' label, do not forward hrtimer in prb_open_block, call hrtimer_forward_now
> > directly and always return HRTIMER_RESTART. The only special case is when
> > prb_open_block is called from tpacket_rcv. That would set the timeout further
> > into the future than the already queued timer. An earlier timeout is not
> > problematic. No need to add complexity to avoid that.
>
> This simplifies the timer logic tremendously. I like this direction a lot.
Thanks. :)
>
> > static void prb_setup_retire_blk_timer(struct packet_sock *po)
> > @@ -603,9 +592,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
> > struct tpacket_kbdq_core *pkc;
> >
> > pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> > - timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > - 0);
> > - pkc->retire_blk_timer.expires = jiffies;
> > + hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > + CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> > + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> > + HRTIMER_MODE_REL_SOFT);
>
> Since this is only called from init_prb_bdqc, we can further remove
> this whole function and move the two hrtimer calls to the parent.
Okay, I will move hrtimer_setup and hrtimer_start into init_prb_bdqc in PATCH v9.
I do not move the prb_shutdown_retire_blk_timer into packet_set_ring either, because
in packet_set_ring, there is no existing pointer for tpacket_kbdq_core. If move the
logic of prb_shutdown_retire_blk_timer into packet_set_ring, we need to add the
tpacket_kbdq_core pointer conversion logic in packet_set_ring, I think it is not
necessary.
> > }
> >
> > static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> > @@ -672,11 +662,10 @@ static void init_prb_bdqc(struct packet_sock *po,
> > p1->last_kactive_blk_num = 0;
> > po->stats.stats3.tp_freeze_q_cnt = 0;
> > if (req_u->req3.tp_retire_blk_tov)
> > - p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> > + p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
> > else
> > - p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> > - req_u->req3.tp_block_size);
> > - p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> > + p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> > + req_u->req3.tp_block_size));
> > p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
> > rwlock_init(&p1->blk_fill_in_prog_lock);
> >
> > @@ -686,16 +675,6 @@ static void init_prb_bdqc(struct packet_sock *po,
> > prb_open_block(p1, pbd);
> > }
> >
> > -/* Do NOT update the last_blk_num first.
> > - * Assumes sk_buff_head lock is held.
> > - */
> > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > -{
> > - mod_timer(&pkc->retire_blk_timer,
> > - jiffies + pkc->tov_in_jiffies);
> > - pkc->last_kactive_blk_num = pkc->kactive_blk_num;
>
> last_kactive_blk_num is now only updated on prb_open_block. It still
> needs to be updated on each timer callback? To see whether the active
> block did not change since the last callback.
Since prb_open_block is executed once during the initialization, after initialization,
both last_kactive_blk_num and kactive_blk_num have the same value, which is 0. After
initialization, if the value of kactive_blk_num remains unchanged, it is meaningless
to assign the value of kactive_blk_num to last_kactive_blk_num. I searched through
all the places in the code that can modify kactive_blk_num, and found that there is
only one place, which is in prb_close_block. This means that after executing
prb_close_block, we need to update last_kactive_blk_num at the corresponding places
where it should be updated. Since I did not modify this logic under the tpacket_rcv
scenario, I only need to check the logic in the hrtimer callback.
Upon inspection, I did find an issue. When tpacket_rcv calls __packet_lookup_frame_in_block,
it subsequently calls prb_retire_current_block, which in turn calls prb_close_block,
resulting in an update to kactive_blk_num. After executing prb_retire_current_block,
function __packet_lookup_frame_in_block calls prb_dispatch_next_block, it may not
execute prb_open_block. If prb_open_block is not executed, this will lead to an
inconsistency between last_kactive_blk_num and kactive_blk_num. At this point, the
hrtimer callback will check whether pkc->last_kactive_blk_num == pkc->kactive_blk_num,
which will evaluate to false, thus causing the current logic to differ from the original
logic. However, at this time, it is still necessary to update last_kactive_blk_num.
On the other hand, I also carefully checked the original implementation of
prb_retire_rx_blk_timer_expired and found that in the original hrtimer callback,
last_kactive_blk_num is updated in all cases. Therefore, I need to perform this update
before exiting the sk_receive_queue lock.
In addition, in PATCH v9, I will also remove the refresh_timer label and change the only
instance where goto is used, to an if-else implementation, so that the 'refresh_timer:'
label is no longer needed.
The new implementation of prb_retire_rx_blk_timer_expired:
static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
{
struct packet_sock *po =
timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
unsigned int frozen;
struct tpacket_block_desc *pbd;
spin_lock(&po->sk.sk_receive_queue.lock);
frozen = prb_queue_frozen(pkc);
pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
/* We only need to plug the race when the block is partially filled.
* tpacket_rcv:
* lock(); increment BLOCK_NUM_PKTS; unlock()
* copy_bits() is in progress ...
* timer fires on other cpu:
* we can't retire the current block because copy_bits
* is in progress.
*
*/
if (BLOCK_NUM_PKTS(pbd)) {
/* Waiting for skb_copy_bits to finish... */
write_lock(&pkc->blk_fill_in_prog_lock);
write_unlock(&pkc->blk_fill_in_prog_lock);
}
if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
if (!frozen) {
if (BLOCK_NUM_PKTS(pbd)) {
/* Not an empty block. Need retire the block. */
prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
prb_dispatch_next_block(pkc, po);
}
} else {
/* Case 1. Queue was frozen because user-space was
* lagging behind.
*/
if (!prb_curr_blk_in_use(pbd)) {
/* Case 2. queue was frozen,user-space caught up,
* now the link went idle && the timer fired.
* We don't have a block to close.So we open this
* block and restart the timer.
* opening a block thaws the queue,restarts timer
* Thawing/timer-refresh is a side effect.
*/
prb_open_block(pkc, pbd);
}
}
}
pkc->last_kactive_blk_num = pkc->kactive_blk_num;
hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
spin_unlock(&po->sk.sk_receive_queue.lock);
return HRTIMER_RESTART;
}
Thanks
Xin Zhao
Powered by blists - more mailing lists