[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <willemdebruijn.kernel.33a7282981e76@gmail.com>
Date: Thu, 28 Aug 2025 11:30:06 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Xin Zhao <jackzxcui1989@....com>,
willemdebruijn.kernel@...il.com,
edumazet@...gle.com,
ferenc@...es.dev
Cc: davem@...emloft.net,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org,
netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v8] net: af_packet: Use hrtimer to do the retire
operation
Xin Zhao wrote:
> On Thu, 2025-08-28 at 5:53 +0800, Willem wrote:
>
> > > Changes in v8:
> > > - Delete delete_blk_timer field, as suggested by Willem de Bruijn,
> > > hrtimer_cancel will check and wait until the timer callback return and ensure
> > > enter enter callback again;
> > > - Simplify the logic related to setting timeout, as suggestd by Willem de Bruijn.
> > > Currently timer callback just restarts itself unconditionally, so delete the
> > > 'out:' label, do not forward hrtimer in prb_open_block, call hrtimer_forward_now
> > > directly and always return HRTIMER_RESTART. The only special case is when
> > > prb_open_block is called from tpacket_rcv. That would set the timeout further
> > > into the future than the already queued timer. An earlier timeout is not
> > > problematic. No need to add complexity to avoid that.
> >
> > This simplifies the timer logic tremendously. I like this direction a lot.
>
> Thanks. :)
>
> >
> > > static void prb_setup_retire_blk_timer(struct packet_sock *po)
> > > @@ -603,9 +592,10 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po)
> > > struct tpacket_kbdq_core *pkc;
> > >
> > > pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> > > - timer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > > - 0);
> > > - pkc->retire_blk_timer.expires = jiffies;
> > > + hrtimer_setup(&pkc->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > > + CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> > > + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> > > + HRTIMER_MODE_REL_SOFT);
> >
> > Since this is only called from init_prb_bdqc, we can further remove
> > this whole function and move the two hrtimer calls to the parent.
>
> Okay, I will move hrtimer_setup and hrtimer_start into init_prb_bdqc in PATCH v9.
>
> I do not move the prb_shutdown_retire_blk_timer into packet_set_ring either, because
> in packet_set_ring, there is no existing pointer for tpacket_kbdq_core. If move the
> logic of prb_shutdown_retire_blk_timer into packet_set_ring, we need to add the
> tpacket_kbdq_core pointer conversion logic in packet_set_ring, I think it is not
> necessary.
Agreed.
>
> > > }
> > >
> > > static int prb_calc_retire_blk_tmo(struct packet_sock *po,
> > > @@ -672,11 +662,10 @@ static void init_prb_bdqc(struct packet_sock *po,
> > > p1->last_kactive_blk_num = 0;
> > > po->stats.stats3.tp_freeze_q_cnt = 0;
> > > if (req_u->req3.tp_retire_blk_tov)
> > > - p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
> > > + p1->interval_ktime = ms_to_ktime(req_u->req3.tp_retire_blk_tov);
> > > else
> > > - p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
> > > - req_u->req3.tp_block_size);
> > > - p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> > > + p1->interval_ktime = ms_to_ktime(prb_calc_retire_blk_tmo(po,
> > > + req_u->req3.tp_block_size));
> > > p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
> > > rwlock_init(&p1->blk_fill_in_prog_lock);
> > >
> > > @@ -686,16 +675,6 @@ static void init_prb_bdqc(struct packet_sock *po,
> > > prb_open_block(p1, pbd);
> > > }
> > >
> > > -/* Do NOT update the last_blk_num first.
> > > - * Assumes sk_buff_head lock is held.
> > > - */
> > > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > > -{
> > > - mod_timer(&pkc->retire_blk_timer,
> > > - jiffies + pkc->tov_in_jiffies);
> > > - pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> >
> > last_kactive_blk_num is now only updated on prb_open_block. It still
> > needs to be updated on each timer callback? To see whether the active
> > block did not change since the last callback.
>
> Since prb_open_block is executed once during the initialization, after initialization,
> both last_kactive_blk_num and kactive_blk_num have the same value, which is 0. After
> initialization, if the value of kactive_blk_num remains unchanged, it is meaningless
> to assign the value of kactive_blk_num to last_kactive_blk_num. I searched through
> all the places in the code that can modify kactive_blk_num, and found that there is
> only one place, which is in prb_close_block. This means that after executing
> prb_close_block, we need to update last_kactive_blk_num at the corresponding places
> where it should be updated. Since I did not modify this logic under the tpacket_rcv
> scenario, I only need to check the logic in the hrtimer callback.
>
> Upon inspection, I did find an issue. When tpacket_rcv calls __packet_lookup_frame_in_block,
> it subsequently calls prb_retire_current_block, which in turn calls prb_close_block,
> resulting in an update to kactive_blk_num. After executing prb_retire_current_block,
> function __packet_lookup_frame_in_block calls prb_dispatch_next_block, it may not
> execute prb_open_block. If prb_open_block is not executed, this will lead to an
> inconsistency between last_kactive_blk_num and kactive_blk_num. At this point, the
> hrtimer callback will check whether pkc->last_kactive_blk_num == pkc->kactive_blk_num,
> which will evaluate to false, thus causing the current logic to differ from the original
> logic. However, at this time, it is still necessary to update last_kactive_blk_num.
>
> On the other hand, I also carefully checked the original implementation of
> prb_retire_rx_blk_timer_expired and found that in the original hrtimer callback,
> last_kactive_blk_num is updated in all cases. Therefore, I need to perform this update
> before exiting the sk_receive_queue lock.
>
> In addition, in PATCH v9, I will also remove the refresh_timer label and change the only
> instance where goto is used, to an if-else implementation, so that the 'refresh_timer:'
> label is no longer needed.
>
> The new implementation of prb_retire_rx_blk_timer_expired:
>
> static enum hrtimer_restart prb_retire_rx_blk_timer_expired(struct hrtimer *t)
> {
> struct packet_sock *po =
> timer_container_of(po, t, rx_ring.prb_bdqc.retire_blk_timer);
> struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
> unsigned int frozen;
> struct tpacket_block_desc *pbd;
>
> spin_lock(&po->sk.sk_receive_queue.lock);
>
> frozen = prb_queue_frozen(pkc);
> pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
>
> /* We only need to plug the race when the block is partially filled.
> * tpacket_rcv:
> * lock(); increment BLOCK_NUM_PKTS; unlock()
> * copy_bits() is in progress ...
> * timer fires on other cpu:
> * we can't retire the current block because copy_bits
> * is in progress.
> *
> */
> if (BLOCK_NUM_PKTS(pbd)) {
> /* Waiting for skb_copy_bits to finish... */
> write_lock(&pkc->blk_fill_in_prog_lock);
> write_unlock(&pkc->blk_fill_in_prog_lock);
> }
>
> if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
> if (!frozen) {
> if (BLOCK_NUM_PKTS(pbd)) {
> /* Not an empty block. Need retire the block. */
> prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
> prb_dispatch_next_block(pkc, po);
> }
> } else {
> /* Case 1. Queue was frozen because user-space was
> * lagging behind.
> */
> if (!prb_curr_blk_in_use(pbd)) {
> /* Case 2. queue was frozen,user-space caught up,
> * now the link went idle && the timer fired.
> * We don't have a block to close.So we open this
> * block and restart the timer.
> * opening a block thaws the queue,restarts timer
> * Thawing/timer-refresh is a side effect.
> */
> prb_open_block(pkc, pbd);
> }
> }
> }
>
> pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
> spin_unlock(&po->sk.sk_receive_queue.lock);
> return HRTIMER_RESTART;
> }
Ack.
Powered by blists - more mailing lists