[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoDxyfAWOWT9gWC7wvcEy8tNYM7pF8suJhwUpdz+MWdxhw@mail.gmail.com>
Date: Fri, 5 Sep 2025 14:03:41 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Xin Zhao <jackzxcui1989@....com>
Cc: willemdebruijn.kernel@...il.com, edumazet@...gle.com, ferenc@...es.dev,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the
retire operation
On Fri, Sep 5, 2025 at 12:01 PM Xin Zhao <jackzxcui1989@....com> wrote:
>
> On Thu, Sep 4, 2025 at 11:26 +0800 Jason Xing <kerneljasonxing@...il.com> wrote:
>
> > > In the description of [PATCH net-next v10 0/2] net: af_packet: optimize retire operation:
> > >
> > > Changes in v8:
> > > - Delete delete_blk_timer field, as suggested by Willem de Bruijn,
> > > hrtimer_cancel will check and wait until the timer callback return and ensure
> > > enter enter callback again;
> >
> > I see the reason now :)
> >
> > Please know that the history changes through versions will finally be
> > removed, only the official message that will be kept in the git. So
> > this kind of change, I think, should be clarified officially since
> > you're removing a structure member. Adding more descriptions will be
> > helpful to readers in the future. Thank you.
>
> I will add some more information to the commit message of this 2/2 PATCH.
>
>
>
> > > Consider the following timing sequence:
> > > timer cpu0 (softirq context, hrtimer timeout) cpu1 (process context)
> > > 0 hrtimer_run_softirq
> > > 1 __hrtimer_run_queues
> > > 2 __run_hrtimer
> > > 3 prb_retire_rx_blk_timer_expired
> > > 4 spin_lock(&po->sk.sk_receive_queue.lock);
> > > 5 _prb_refresh_rx_retire_blk_timer
> > > 6 hrtimer_forward_now
> > > 7 spin_unlock(&po->sk.sk_receive_queue.lock)
> > > 8 raw_spin_lock_irq(&cpu_base->lock); tpacket_rcv
> > > 9 enqueue_hrtimer spin_lock(&sk->sk_receive_queue.lock);
> > > 10 packet_current_rx_frame
> > > 11 __packet_lookup_frame_in_block
> > > 12 finish enqueue_hrtimer prb_open_block
> > > 13 _prb_refresh_rx_retire_blk_timer
> > > 14 hrtimer_is_queued(&pkc->retire_blk_timer) == true
> > > 15 hrtimer_forward_now
> > > 16 WARN_ON
> > > On cpu0 in the timing sequence above, enqueue_hrtimer is not protected by sk_receive_queue.lock,
> > > while the hrtimer_forward_now is not protected by raw_spin_lock_irq(&cpu_base->lock).
> > >
> > > In my previous email, I provided an explanation. As a supplement, I would
> > > like to reiterate a paragraph from my earlier response to Willem.
> > > The point is that when the hrtimer is in the enqueued state, you cannot
> >
> > How about tring hrtimer_is_queued() beforehand?
> >
> > IIUC, with this patch applied, we will lose the opportunity to refresh
> > the timer when the lookup function (in the above path I mentioned)
> > gets called compared to before. If the packet socket tries to look up
> > a new block and it doesn't update its expiry time, the timer will soon
> > wake up. Does it sound unreasonable?
>
>
> I actually pointed out the issue with the timeout setting in a previous email:
> https://lore.kernel.org/netdev/20250826030328.878001-1-jackzxcui1989@163.com/.
>
> Regarding the method you mentioned, using hrtimer_is_queued to assist in judgment, I had
> discussed this extensively with Willem in previous emails, and the conclusion was that
> it is not feasible. The reason is that in our scenario, the hrtimer always returns
> HRTIMER_RESTART, unlike the places you pointed out, such as tcp_pacing_check, where the
> corresponding hrtimer callbacks all return HRTIMER_NORESTART. Since our scenario returns
> HRTIMER_RESTART, this can lead to many troublesome issues. The fundamental reason is that
> if HRTIMER_RESTART is returned, the hrtimer module will enqueue the hrtimer after the
> callback returns, which leads to exiting the protection of our sk_receive_queue lock.
>
> Returning to the functionality here, if we really want to update the hrtimer's timeout
> outside of the timer callback, there are two key points to note:
>
> 1. Accurately knowing whether the current context is a timer callback or tpacket_rcv.
> 2. How to update the hrtimer's timeout in a non-timer callback scenario.
>
> To start with the first point, it has already been explained in previous emails that
> executing hrtimer_forward outside of a timer callback is not allowed. Therefore, we
> must accurately determine whether we are in a timer callback; only in that context can
> we use the hrtimer_forward function to update.
> In the original code, since the same _prb_refresh_rx_retire_blk_timer function was called,
> distinguishing between contexts required code restructuring. Now that this patch removes
> the _prb_refresh_rx_retire_blk_timer function, achieving this accurate distinction is not
> too difficult.
> The key issue is the second point. If we are not inside the hrtimer's callback, we cannot
> use hrtimer_forward to update the timeout.
> So what other interface can we use? You might
> suggest using hrtimer_start, but fundamentally, hrtimer_start cannot be called if it has
> already been started previously. Therefore, wouldn’t you need to add hrtimer_cancel to
> confirm that the hrtimer has been canceled? Once hrtimer_cancel is added, there will also
> be scenarios where it is restarted, which means we need to consider the concurrent
> scenario when the socket exits and also calls hrtimer_cancel. This might require adding
> logic for that concurrency scenario, and you might even need to reintroduce the
> delete_blk_timer variable to indicate whether the packet_release operation has been
> triggered so that the hrtimer does not restart in the tpacket_rcv scenario.
>
> In fact, in a previous v7 version, I proposed a change that I personally thought was
> quite good, which can be seen here:
> https://lore.kernel.org/netdev/20250822132051.266787-1-jackzxcui1989@163.com/. However,
> this change introduced an additional variable and more logic. Willem also pointed out
> that the added complexity to avoid a non-problematic issue was unnecessary.
Admittedly it's a bit complex.
>
> As mentioned in Changes in v8:
> The only special case is when prb_open_block is called from tpacket_rcv.
> That would set the timeout further into the future than the already queued
> timer. An earlier timeout is not problematic. No need to add complexity to
> avoid that.
It'd be better to highlight this in the commit message as well to
avoid further repeat questions from others. It's an obvious change in
this patch :)
In the meanwhile, you should adjust the comments above
prb_retire_rx_blk_timer_expired() and prb_open_block() because you
removed the refresh logic there.
>
> It is not problematic, as Willem point it out in
> https://lore.kernel.org/netdev/willemdebruijn.kernel.2d7599ee951fd@gmail.com/
>
>
> In the end:
>
> So, if you agree with the current changes in v10 and do not wish to add the timeout
> setting under tpacket_rcv, that’s fine.
> If you do not agree, then the only alternative I can think of is to use a combination
> of hrtimer_cancel and hrtimer_start under prb_open_block, and we would also need to
> reintroduce the delete_blk_timer variable to help determine whether the hrtimer was
> canceled due to the packet_release behavior. If we really want to make this change,
> it does add quite a bit of logic, and we would also need Willem's agreement.
Thanks, I think I know enough about this long background. Let's stick
with the approach you provided in the next respin :)
Thanks,
Jason
Powered by blists - more mailing lists