linux-kernel - Re: [PATCH v2] net: af_packet: Use hrtimer to do the retire operation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-Id: <20250816164257.3908597-1-jackzxcui1989@163.com>
Date: Sun, 17 Aug 2025 00:42:57 +0800
From: Xin Zhao <jackzxcui1989@....com>
To: willemdebruijn.kernel@...il.com,
	edumazet@...gle.com,
	ferenc@...es.dev
Cc: davem@...emloft.net,
	kuba@...nel.org,
	pabeni@...hat.com,
	horms@...nel.org,
	netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] net: af_packet: Use hrtimer to do the retire operation

On Sat, 2025-08-16 at 17:33 +0800, Willem wrote:

> > > > -	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
> > > 
> > > Since the hrtimer API takes ktime, and there is no other user for
> > > retire_blk_tov, remove that too and instead have interval_ktime.
> > > 
> > > >  	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
> > 
> > We cannot simply remove the retire_blk_tov field, because in net/packet/diag.c 
> > retire_blk_tov is being used in function pdiag_put_ring. Since there are still
> > people using it, I personally prefer not to remove this variable for now. If
> > you think it is still necessary, I can remove it later and adjust the logic in
> > diag.c accordingly, using ktime_to_ms to convert the ktime_t format value back
> > to the u32 type needed in the pdiag_put_ring function.
> 
> Yes, let's remove the unnecessary extra field.
>  
> > 

OK, I will delete the 'retire_blk_tov' and add field named 'interval_ktime' instead.
And Accordingly, we also need to modify the logic in diag.c. The related changes are
as follows:
index 6ce1dcc28..c8f43e0c1 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -83,7 +83,7 @@ static int pdiag_put_ring(struct packet_ring_buffer *ring, int ver, int nl_type,
        pdr.pdr_frame_nr = ring->frame_max + 1;

        if (ver > TPACKET_V2) {
-               pdr.pdr_retire_tmo = ring->prb_bdqc.retire_blk_tov;
+               pdr.pdr_retire_tmo = ktime_to_ms(ring->prb_bdqc.interval_ktime);
                pdr.pdr_sizeof_priv = ring->prb_bdqc.blk_sizeof_priv;
                pdr.pdr_features = ring->prb_bdqc.feature_req_word;
        } else {


> > 
> > > > +	hrtimer_set_expires(&pkc->retire_blk_timer,
> > > > +			    ktime_add(ktime_get(), ms_to_ktime(pkc->retire_blk_tov)));
> > > 
> > > More common for HRTIMER_RESTART timers is hrtimer_forward_now.
> > > 
> > > >  	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> > 
> > As I mentioned in my previous response, we cannot use hrtimer_forward_now here
> > because the function _prb_refresh_rx_retire_blk_timer can be called not only
> > when the retire timer expires, but also when the kernel logic for receiving
> > network packets detects that a network packet has filled up a block and calls
> > prb_open_block to use the next block. This can lead to a WARN_ON being triggered
> > in hrtimer_forward_now when it checks if the timer has already been enqueued
> > (WARN_ON(timer->state & HRTIMER_STATE_ENQUEUED)).
> > I encountered this issue when I initially used hrtimer_forward_now. This is the
> > reason why the existing logic for the regular timer uses mod_timer instead of
> > add_timer, as mod_timer is designed to handle such scenarios. A relevant comment
> > in the mod_timer implementation states:
> >  * Note that if there are multiple unserialized concurrent users of the
> >  * same timer, then mod_timer() is the only safe way to modify the timeout,
> >  * since add_timer() cannot modify an already running timer.
> 
> Please add a comment above the call to hrtimer_set_expires and/or in
> the commit message. As this is non-obvious and someone may easily
> later miss that and update.


I will add a comment in PATCH v3 as below:
 static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
 {
-       mod_timer(&pkc->retire_blk_timer,
-                       jiffies + pkc->tov_in_jiffies);
+       /* We cannot use hrtimer_forward_now here because the function
+        * _prb_refresh_rx_retire_blk_timer can be called not only when
+        * the retire timer expires, but also when the kernel logic for
+        * receiving network packets detects that a network packet has
+        * filled up a block and calls prb_open_block to use the next
+        * block. This can lead to a WARN_ON being triggered in
+        * hrtimer_forward_now when it checks if the timer has already
+        * been enqueued.
+        */
+       hrtimer_set_expires(&pkc->retire_blk_timer,
+                           ktime_add(ktime_get(), pkc->interval_ktime));
        pkc->last_kactive_blk_num = pkc->kactive_blk_num;
 }



> 
> So mod_timer can also work as add_timer.
> 
> But for hrtimer, is it safe for an hrtimer_setup call to run while a
> timer is already queued? And same for hrtimer_start.


In the current patch, the hrtimer_setup and hrtimer_start will be called only when
the af_packet-mmap user call setsockopt, and then the following call chain:
  packet_setsockopt
    packet_set_ring
      init_prb_bdqc
        prb_setup_retire_blk_timer
          hrtimer_setup
          hrtimer_start
So once the socket setup, hrtimer_setup and hrtimer_start will never be called again.
However, I also tested calling hrtimer_setup and hrtimer_start within the hrtimer
expiration callback function, and it seems that it does not affect the normal
operation of the timer (the first line of the hrtimer_start comment states that
* hrtimer_start - (re)start an hrtimer, indicating that it can handle this scenario).
As you previously suggested, the hrtimer expiration callback functions typically use
hrtimer_set_expires or hrtimer_forward_now. In addition, repeatedly calling
hrtimer_setup and hrtimer_start within the hrtimer expiration callback function is
also a waste of CPU resources. Therefore, my current patch does not use hrtimer_start
within the hrtimer expiration callback function. I already change it in PATCH v1 as I
mentioned it in Changes in v1:
...
- Use hrtimer_set_expires instead of hrtimer_start_range_ns when retire timer needs update
  as suggested by Willem de Bruijn. Start the hrtimer in prb_setup_retire_blk_timer;
...
The implementation of _prb_refresh_rx_retire_blk_timer in the current patch is as follows:
static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
	/* We cannot use hrtimer_forward_now here because the function
	 * _prb_refresh_rx_retire_blk_timer can be called not only when
	 * the retire timer expires, but also when the kernel logic for
	 * receiving network packets detects that a network packet has
	 * filled up a block and calls prb_open_block to use the next
	 * block. This can lead to a WARN_ON being triggered in
	 * hrtimer_forward_now when it checks if the timer has already
	 * been enqueued.
	 */
	hrtimer_set_expires(&pkc->retire_blk_timer,
			    ktime_add(ktime_get(), pkc->interval_ktime));
	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
}


Thanks
Xin Zhao