[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAL+tcoAMM-eeSdLfnqHrBRiLmiTULi5mJF5AaU5scHP=32s1oQ@mail.gmail.com>
Date: Thu, 4 Sep 2025 10:50:32 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Xin Zhao <jackzxcui1989@....com>
Cc: willemdebruijn.kernel@...il.com, edumazet@...gle.com, ferenc@...es.dev,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the
retire operation
On Thu, Sep 4, 2025 at 12:17 AM Xin Zhao <jackzxcui1989@....com> wrote:
>
> On Tue, Sep 2, 2025 at 23:43 +0800 Jason Xing <kerneljasonxing@...il.com> wrote:
>
> > > p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
> > > prb_init_ft_ops(p1, req_u);
> > > - prb_setup_retire_blk_timer(po);
> > > + hrtimer_setup(&p1->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > > + CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> > > + hrtimer_start(&p1->retire_blk_timer, p1->interval_ktime,
> > > + HRTIMER_MODE_REL_SOFT);
> >
> > You expect to see it start at the setsockopt phase? Even if it's far
> > from the real use of recv at the moment.
> >
> > > prb_open_block(p1, pbd);
> > > }
>
> Before applying this patch, init_prb_bdqc also start the timer by mod_timer:
>
> init_prb_bdqc
> prb_open_block
> _prb_refresh_rx_retire_blk_timer
> mod_timer
>
> So the current timer's start time is almost the same as it was before applying
> the patch.
>
>
> > > @@ -917,7 +873,6 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
> > > pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
> > >
> > > prb_thaw_queue(pkc1);
> > > - _prb_refresh_rx_retire_blk_timer(pkc1);
> >
> > Could you say more on why you remove this here and only reset/update
> > the expiry time in the timer handler? Probably I missed something
> > appearing in the previous long discussion.
> >
> > >
> > > smp_wmb();
> > > }
>
> In the description of [PATCH net-next v10 0/2] net: af_packet: optimize retire operation:
>
> Changes in v7:
> When the callback return, without sk_buff_head lock protection, __run_hrtimer will
> enqueue the timer if return HRTIMER_RESTART. Setting the hrtimer expires while
> enqueuing a timer may cause chaos in the hrtimer red-black tree.
>
> Neither hrtimer_set_expires nor hrtimer_forward_now is allowed when the hrtimer has
> already been enqueued. Therefore, the only place where the hrtimer timeout can be set is
> within the callback, at which point the hrtimer is in a non-enqueued state and can have
> its timeout set.
Can we use hrtimer_is_queued() instead? Please see tcp_pacing_check()
as an example. But considering your following explanation, I think
it's okay now.
>
>
> Changes in v8:
> Simplify the logic related to setting timeout, as suggestd by Willem de Bruijn.
> Currently timer callback just restarts itself unconditionally, so delete the
> 'out:' label, do not forward hrtimer in prb_open_block, call hrtimer_forward_now
> directly and always return HRTIMER_RESTART. The only special case is when
> prb_open_block is called from tpacket_rcv. That would set the timeout further
> into the future than the already queued timer. An earlier timeout is not
> problematic. No need to add complexity to avoid that.
>
> This paragraph explains that if the block's retire timeout is not adjusted within
> the timer callback, it will only result in an earlier-than-expected retire timeout,
> which is not problematic. Therefore, it is unnecessary to increase the logical complexity
> to ensure block retire timeout occurs as expected each time.
Sounds fair.
>
>
> > The whole structure needs a new organization?
> >
> > Before:
> > /* size: 152, cachelines: 3, members: 22 */
> > /* sum members: 144, holes: 2, sum holes: 8 */
> > /* paddings: 1, sum paddings: 4 */
> > /* last cacheline: 24 bytes */
> > After:
> > /* size: 176, cachelines: 3, members: 19 */
> > /* sum members: 163, holes: 4, sum holes: 13 */
> > /* paddings: 1, sum paddings: 4 */
> > /* forced alignments: 1, forced holes: 1, sum forced holes: 6 */
> > /* last cacheline: 48 bytes */
>
> What about the following organization:?
>
> /* kbdq - kernel block descriptor queue */
> struct tpacket_kbdq_core {
> struct pgv *pkbdq;
> unsigned int feature_req_word;
> unsigned int hdrlen;
> unsigned short kactive_blk_num;
> unsigned short blk_sizeof_priv;
> unsigned char reset_pending_on_curr_blk;
>
> char *pkblk_start;
> char *pkblk_end;
> int kblk_size;
> unsigned int max_frame_len;
> unsigned int knum_blocks;
> char *prev;
> char *nxt_offset;
>
> unsigned short version;
>
> uint64_t knxt_seq_num;
> struct sk_buff *skb;
>
> rwlock_t blk_fill_in_prog_lock;
>
> /* timer to retire an outstanding block */
> struct hrtimer retire_blk_timer;
>
> /* Default is set to 8ms */
> #define DEFAULT_PRB_RETIRE_TOV (8)
>
> ktime_t interval_ktime;
> };
Could you share the result after running 'pahole --hex -C
tpacket_kbdq_core vmlinux'?
Thanks,
Jason
Powered by blists - more mailing lists