netdev - Re: [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the retire operation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAL+tcoAMM-eeSdLfnqHrBRiLmiTULi5mJF5AaU5scHP=32s1oQ@mail.gmail.com>
Date: Thu, 4 Sep 2025 10:50:32 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Xin Zhao <jackzxcui1989@....com>
Cc: willemdebruijn.kernel@...il.com, edumazet@...gle.com, ferenc@...es.dev, 
	davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v10 2/2] net: af_packet: Use hrtimer to do the
 retire operation

On Thu, Sep 4, 2025 at 12:17 AM Xin Zhao <jackzxcui1989@....com> wrote:
>
> On Tue, Sep 2, 2025 at 23:43 +0800 Jason Xing <kerneljasonxing@...il.com> wrote:
>
> > >         p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
> > >         prb_init_ft_ops(p1, req_u);
> > > -       prb_setup_retire_blk_timer(po);
> > > +       hrtimer_setup(&p1->retire_blk_timer, prb_retire_rx_blk_timer_expired,
> > > +                     CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
> > > +       hrtimer_start(&p1->retire_blk_timer, p1->interval_ktime,
> > > +                     HRTIMER_MODE_REL_SOFT);
> >
> > You expect to see it start at the setsockopt phase? Even if it's far
> > from the real use of recv at the moment.
> >
> > >         prb_open_block(p1, pbd);
> > >  }
>
> Before applying this patch, init_prb_bdqc also start the timer by mod_timer:
>
> init_prb_bdqc
>   prb_open_block
>     _prb_refresh_rx_retire_blk_timer
>       mod_timer
>
> So the current timer's start time is almost the same as it was before applying
> the patch.
>
>
> > > @@ -917,7 +873,6 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
> > >         pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
> > >
> > >         prb_thaw_queue(pkc1);
> > > -       _prb_refresh_rx_retire_blk_timer(pkc1);
> >
> > Could you say more on why you remove this here and only reset/update
> > the expiry time in the timer handler? Probably I missed something
> > appearing in the previous long discussion.
> >
> > >
> > >         smp_wmb();
> > >  }
>
> In the description of [PATCH net-next v10 0/2] net: af_packet: optimize retire operation:
>
> Changes in v7:
>   When the callback return, without sk_buff_head lock protection, __run_hrtimer will
>   enqueue the timer if return HRTIMER_RESTART. Setting the hrtimer expires while
>   enqueuing a timer may cause chaos in the hrtimer red-black tree.
>
> Neither hrtimer_set_expires nor hrtimer_forward_now is allowed when the hrtimer has
> already been enqueued. Therefore, the only place where the hrtimer timeout can be set is
> within the callback, at which point the hrtimer is in a non-enqueued state and can have
> its timeout set.

Can we use hrtimer_is_queued() instead? Please see tcp_pacing_check()
as an example. But considering your following explanation, I think
it's okay now.

>
>
> Changes in v8:
>   Simplify the logic related to setting timeout, as suggestd by Willem de Bruijn.
>   Currently timer callback just restarts itself unconditionally, so delete the
>  'out:' label, do not forward hrtimer in prb_open_block, call hrtimer_forward_now
>   directly and always return HRTIMER_RESTART. The only special case is when
>   prb_open_block is called from tpacket_rcv. That would set the timeout further
>   into the future than the already queued timer. An earlier timeout is not
>   problematic. No need to add complexity to avoid that.
>
> This paragraph explains that if the block's retire timeout is not adjusted within
> the timer callback, it will only result in an earlier-than-expected retire timeout,
> which is not problematic. Therefore, it is unnecessary to increase the logical complexity
> to ensure block retire timeout occurs as expected each time.

Sounds fair.

>
>
> > The whole structure needs a new organization?
> >
> > Before:
> >         /* size: 152, cachelines: 3, members: 22 */
> >         /* sum members: 144, holes: 2, sum holes: 8 */
> >         /* paddings: 1, sum paddings: 4 */
> >         /* last cacheline: 24 bytes */
> > After:
> >         /* size: 176, cachelines: 3, members: 19 */
> >         /* sum members: 163, holes: 4, sum holes: 13 */
> >         /* paddings: 1, sum paddings: 4 */
> >         /* forced alignments: 1, forced holes: 1, sum forced holes: 6 */
> >         /* last cacheline: 48 bytes */
>
> What about the following organization:?
>
> /* kbdq - kernel block descriptor queue */
> struct tpacket_kbdq_core {
>         struct pgv      *pkbdq;
>         unsigned int    feature_req_word;
>         unsigned int    hdrlen;
>         unsigned short  kactive_blk_num;
>         unsigned short  blk_sizeof_priv;
>         unsigned char   reset_pending_on_curr_blk;
>
>         char            *pkblk_start;
>         char            *pkblk_end;
>         int             kblk_size;
>         unsigned int    max_frame_len;
>         unsigned int    knum_blocks;
>         char            *prev;
>         char            *nxt_offset;
>
>         unsigned short  version;
>
>         uint64_t        knxt_seq_num;
>         struct sk_buff  *skb;
>
>         rwlock_t        blk_fill_in_prog_lock;
>
>         /* timer to retire an outstanding block */
>         struct hrtimer  retire_blk_timer;
>
>         /* Default is set to 8ms */
> #define DEFAULT_PRB_RETIRE_TOV  (8)
>
>         ktime_t         interval_ktime;
> };

Could you share the result after running 'pahole --hex -C
tpacket_kbdq_core vmlinux'?

Thanks,
Jason