lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+c_3C5FFv_M+jOPupyMULprP5=GhV97ZW=3za+ToCejw@mail.gmail.com>
Date: Wed, 5 Feb 2025 17:00:23 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, 
	Kuniyuki Iwashima <kuniyu@...zon.com>, Simon Horman <horms@...nel.org>, eric.dumazet@...il.com
Subject: Re: [PATCH net-next] net: flush_backlog() small changes

On Wed, Feb 5, 2025 at 4:22 PM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> Hi Eric,
>
> On Tue, Feb 4, 2025 at 10:49 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > Add READ_ONCE() around reads of skb->dev->reg_state, because
> > this field can be changed from other threads/cpus.
> >
> > Instead of calling dev_kfree_skb_irq() and kfree_skb()
> > while interrupts are masked and locks held,
> > use a temporary list and use __skb_queue_purge_reason()
> >
> > Use SKB_DROP_REASON_DEV_READY drop reason to better
> > describe why these skbs are dropped.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > ---
> >  net/core/dev.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index c0021cbd28fc11e4c4eb6184d98a2505fa674871..cd31e78a7d8a2229e3dc17d08bb638f862148823 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6119,16 +6119,18 @@ EXPORT_SYMBOL(netif_receive_skb_list);
> >  static void flush_backlog(struct work_struct *work)
> >  {
> >         struct sk_buff *skb, *tmp;
> > +       struct sk_buff_head list;
> >         struct softnet_data *sd;
> >
> > +       __skb_queue_head_init(&list);
> >         local_bh_disable();
> >         sd = this_cpu_ptr(&softnet_data);
> >
> >         backlog_lock_irq_disable(sd);
> >         skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
> > -               if (skb->dev->reg_state == NETREG_UNREGISTERING) {
> > +               if (READ_ONCE(skb->dev->reg_state) == NETREG_UNREGISTERING) {
> >                         __skb_unlink(skb, &sd->input_pkt_queue);
> > -                       dev_kfree_skb_irq(skb);
> > +                       __skb_queue_tail(&list, skb);
>
> I wonder why we cannot simply replace the above function with
> 'dev_kfree_skb_irq_reason(skb, SKB_DROP_REASON_DEV_READY);'?

Because this horribly expensive thing pushes packets to another perc-cpu list,
and raises a softirq to perform the freeing later from BH.


>
> >                         rps_input_queue_head_incr(sd);
> >                 }
> >         }
> > @@ -6136,14 +6138,16 @@ static void flush_backlog(struct work_struct *work)
> >
> >         local_lock_nested_bh(&softnet_data.process_queue_bh_lock);
> >         skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
> > -               if (skb->dev->reg_state == NETREG_UNREGISTERING) {
> > +               if (READ_ONCE(skb->dev->reg_state) == NETREG_UNREGISTERING) {
> >                         __skb_unlink(skb, &sd->process_queue);
> > -                       kfree_skb(skb);
> > +                       __skb_queue_tail(&list, skb);
>
> Same here.

Please read the changelog, I think you missed the point.

>
> >                         rps_input_queue_head_incr(sd);
> >                 }
> >         }
> >         local_unlock_nested_bh(&softnet_data.process_queue_bh_lock);
> >         local_bh_enable();
> > +
> > +       __skb_queue_purge_reason(&list, SKB_DROP_REASON_DEV_READY);
>
> I'm also worried that dev_kfree_skb_irq() is not the same as
> kfree_skb_reason() because of the following commit:
> commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3
> Author: Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>
> Date:   Thu Jul 23 11:31:48 2020 -0600
>
>     dev: Defer free of skbs in flush_backlog
>
>     IRQs are disabled when freeing skbs in input queue.
>     Use the IRQ safe variant to free skbs here.
>
>     Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
>     Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>
>     Signed-off-by: David S. Miller <davem@...emloft.net>

The point of this patch is to no longer attempt the kfree_skb() while being
in hard or soft irq blocking sections.

Therefore call efficient kfree_skb() instead of expensive fallbacks
that were designed
for callers in hard irq contexts.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ