Message-ID: <willemdebruijn.kernel.970c1e7dc19d@gmail.com>
Date: Thu, 21 Aug 2025 10:32:16 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Xin Zhao <jackzxcui1989@....com>,
willemdebruijn.kernel@...il.com,
edumazet@...gle.com,
ferenc@...es.dev
Cc: davem@...emloft.net,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org,
netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v6] net: af_packet: Use hrtimer to do the retire
operation

Xin Zhao wrote:
> On Wed, 2025-08-20 at 19:15 +0800, Willem wrote:
>
> > > -static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> > > +static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc,
> > > + bool start)
> > > {
> > > - mod_timer(&pkc->retire_blk_timer,
> > > - jiffies + pkc->tov_in_jiffies);
> > > + if (start && !hrtimer_is_queued(&pkc->retire_blk_timer))
> > > + hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> > > + HRTIMER_MODE_REL_SOFT);
> > > + else
> > > + hrtimer_forward_now(&pkc->retire_blk_timer, pkc->interval_ktime);
> >
> > Is the hrtimer still queued when prb_retire_rx_blk_timer_expired
> > fires? Based on the existence of hrtimer_forward_now, I assume so. But
> > have not checked yet. If so, hrtimer_is_queued alone suffices to
> > detect the other callstack from tpacket_rcv where hrtimer_start is
> > needed. No need for bool start?
> >
> > > pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> > > }
>
>
> Since the CI tests have reported the previously mentioned WARN_ON inside
> hrtimer_forward_now, I believe we should reevaluate the implementation of
> _prb_refresh_rx_retire_blk_timer, the function responsible for setting the
> hrtimer timeout, and establish the principles it must adhere to. After
> careful consideration and a detailed review of the hrtimer implementation,
> I have identified the following two principles:
>
> 1. Calls to hrtimer_forward_now or hrtimer_set_expires must occur strictly
> within the hrtimer's callback.
> 2. hrtimer_forward_now or hrtimer_set_expires must never be called while
> the hrtimer is enqueued.
>
>
> Regarding these two principles, I would like to add three points (a
> minimal sketch of the resulting usage pattern follows the list):
> 1. For principle 1: if hrtimer_forward_now or hrtimer_set_expires is
> called outside the hrtimer's callback without also enqueueing the timer,
> the timeout will simply never fire (this is easy to reproduce by writing
> a small kernel module and setting a short timeout, such as 2ms).
> 2. Both principles aim to prevent hrtimer_forward_now or
> hrtimer_set_expires from modifying the expiry while the hrtimer is
> already enqueued, which would corrupt the ordering of the hrtimer's
> red-black tree (the WARN_ON check for the enqueued state in the
> non-inlined hrtimer_forward_now interface exists precisely to catch such
> situations). Note that besides the hrtimer_start family of functions
> called outside the hrtimer callback, __run_hrtimer also enqueues the
> hrtimer when the callback returns HRTIMER_RESTART.
> 3. Principle 2 is needed in addition to principle 1 because
> hrtimer_forward_now inside the callback runs without the protection of
> the hrtimer_cpu_base lock: if hrtimer_start is performed outside the
> callback while the expiry is concurrently updated within the callback,
> the hrtimer's red-black tree can again be corrupted.
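> 
> As a minimal illustration of the pattern these principles imply (the
> my_state structure, MY_INTERVAL_NS and the helper names below are
> hypothetical, purely for illustration):
> 
> #include <linux/hrtimer.h>
> #include <linux/ktime.h>
> 
> #define MY_INTERVAL_NS	(2 * NSEC_PER_MSEC)	/* short 2ms period */
> 
> struct my_state {
> 	struct hrtimer timer;
> };
> 
> static enum hrtimer_restart my_timer_fn(struct hrtimer *t)
> {
> 	/*
> 	 * Principle 1: the timer is dequeued while its callback runs,
> 	 * so forwarding the expiry here is safe. Returning
> 	 * HRTIMER_RESTART lets __run_hrtimer re-enqueue the timer with
> 	 * the expiry we just set.
> 	 */
> 	hrtimer_forward_now(t, ns_to_ktime(MY_INTERVAL_NS));
> 	return HRTIMER_RESTART;
> }
> 
> static void my_init(struct my_state *s)
> {
> 	hrtimer_setup(&s->timer, my_timer_fn, CLOCK_MONOTONIC,
> 		      HRTIMER_MODE_REL_SOFT);
> }
> 
> static void my_arm(struct my_state *s)
> {
> 	/*
> 	 * Principle 2: outside the callback the timer may be queued, so
> 	 * never touch its expiry directly; hrtimer_start dequeues,
> 	 * reprograms and re-enqueues under cpu_base->lock.
> 	 */
> 	hrtimer_start(&s->timer, ns_to_ktime(MY_INTERVAL_NS),
> 		      HRTIMER_MODE_REL_SOFT);
> }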
>
> The WARN_ON enqueue error in the CI test indicates that
> hrtimer_forward_now was executed outside the hrtimer's callback while the
> hrtimer was in the queued state. A possible sequence causing this issue
> is as follows:
> cpu0 (softirq context, hrtimer timeout)            cpu1 (process context, needs prb_open_block)
> hrtimer_run_softirq
>   __hrtimer_run_queues
>     __run_hrtimer
>       _prb_refresh_rx_retire_blk_timer
>         spin_lock(&po->sk.sk_receive_queue.lock);
>         hrtimer_is_queued(&pkc->retire_blk_timer) == false
>         hrtimer_forward_now
>         spin_unlock(&po->sk.sk_receive_queue.lock)   tpacket_rcv
>     enqueue_hrtimer                                    spin_lock(&sk->sk_receive_queue.lock);
>                                                       packet_current_rx_frame
>                                                         __packet_lookup_frame_in_block
>                                                           prb_open_block
>                                                             _prb_refresh_rx_retire_blk_timer
>                                                               hrtimer_is_queued(&pkc->retire_blk_timer) == true
>                                                               hrtimer_forward_now
>                                                                 WARN_ON
>
> In summary, the key issue is to find a mechanism that ensures the
> hrtimer's start/enqueue and the setting of its expiry cannot execute
> concurrently. I can think of two methods to address this (method 1 keeps
> the code much simpler, while method 2 makes it more complex):
>
> Method 1:
> Do not call hrtimer_forward_now to set the expiry within the hrtimer's
> callback; instead, only ever call hrtimer_start to enqueue the hrtimer
> (see the sketch below). This approach is viable because, in the current
> version, __run_hrtimer checks the timer's state under the protection of
> cpu_base->lock and will not enqueue a timer that is already enqueued. All
> hrtimer enqueues are then protected by both sk_receive_queue.lock and
> cpu_base->lock, eliminating the concern that the expiry could be modified
> concurrently with enqueueing, which is what could corrupt the hrtimer's
> red-black tree.
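> 
> A sketch of the shape method 1 would take, based on the helper from this
> patch (exact context elided):
> 
> static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
> {
> 	/*
> 	 * Always (re)arm via hrtimer_start, whether called from the
> 	 * timer callback or from the tpacket_rcv path. If the timer is
> 	 * already enqueued, hrtimer_start dequeues and re-enqueues it
> 	 * under cpu_base->lock, so the expiry of a queued timer is
> 	 * never modified directly.
> 	 */
> 	hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
> 		      HRTIMER_MODE_REL_SOFT);
> 	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
> }
> 
> with the hrtimer callback simply calling this helper and relying on the
> enqueued-state check in __run_hrtimer (done under cpu_base->lock) not to
> enqueue the timer a second time.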
>
> Method 2:
> This method strictly ensures that hrtimer_start is never called within
> the hrtimer's callback. However, doing so requires a lot of additional
> logic: we would need an extra callback parameter to guarantee that
> hrtimer_forward_now executes only inside the callback and hrtimer_start
> only outside it. The WARN_ON in the CI test shows that deciding based on
> hrtimer_is_queued alone does not cover all scenarios. The fundamental
> reason is that a precise hrtimer_is_queued check would require the
> protection of cpu_base->lock; likewise, an hrtimer_callback_running
> check cannot make a precise judgment either. We would have to know on
> which CPU the hrtimer is running and send an IPI to execute
> hrtimer_forward_now under local_irq_save on that CPU to satisfy
> principle 2 above. Since it is inappropriate to acquire cpu_base->lock
> within the af_packet logic, disabling IRQs is the only way to ensure
> that hrtimer_forward_now executes while the hrtimer cannot be enqueued.
>
> Since strictly preventing hrtimer_start from being called within the
> hrtimer's callback leads to a lot of extra logic, and since I have also
> argued that calling hrtimer_start within the callback is permissible (for
> the hrtimer module the relevant lock is cpu_base->lock, which belongs to
> the clock base the hrtimer resides on and does not follow
> smp_processor_id(), so it makes no difference whether hrtimer_start runs
> on the CPU executing the hrtimer callback or on another CPU; both
> scenarios look the same to the hrtimer module), I prefer method 1 above
> to resolve this issue. If there are no concerns, I will reflect this in
> PATCH v7.

Thanks for the analysis.

Using hrtimer_start from within the callback that returns
HRTIMER_RESTART does not sound in line with the intention of the API
to me.

I think we should just adjust and restart from within the callback and
hrtimer_start from tpacket_rcv iff the timer is not yet queued.

Since all these modifications are made while the receive queue lock is
held I don't immediately see why we would need additional mutual
exclusion beyond that.
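
Roughly, just a sketch of the intent (untested; locking and the actual
block retire logic elided):

static enum hrtimer_restart
prb_retire_rx_blk_timer_expired(struct hrtimer *timer)
{
	struct tpacket_kbdq_core *pkc =
		container_of(timer, struct tpacket_kbdq_core,
			     retire_blk_timer);

	/* ... take sk_receive_queue.lock, retire the block ... */

	/* the timer is dequeued while its callback runs, so adjusting
	 * and restarting from here is safe
	 */
	hrtimer_forward_now(timer, pkc->interval_ktime);
	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
	return HRTIMER_RESTART;
}

and on the tpacket_rcv/prb_open_block side, under the receive queue
lock:

	if (!hrtimer_is_queued(&pkc->retire_blk_timer))
		hrtimer_start(&pkc->retire_blk_timer, pkc->interval_ktime,
			      HRTIMER_MODE_REL_SOFT);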