netdev - Re: [PATCH net-next] net: generalize skb freeing deferral to per-cpu lists

Message-ID: <CANn89iLF04mxhCS=C0VdeJ5afeK8CDRRjszAWhey+F_Gf6L+6Q@mail.gmail.com>
Date:   Fri, 22 Apr 2022 08:59:31 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] net: generalize skb freeing deferral to per-cpu lists

On Fri, Apr 22, 2022 at 2:02 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> Hi,
>
> Looks great! I have a few questions below mostly to understand better
> how it works...
>

Hi Paolo, thanks for the review :)

> On Thu, 2022-04-21 at 08:39 -0700, Eric Dumazet wrote:
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 84d78df60453955a8eaf05847f6e2145176a727a..2fe311447fae5e860eee95f6e8772926d4915e9f 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1080,6 +1080,7 @@ struct sk_buff {
> >               unsigned int    sender_cpu;
> >       };
> >  #endif
> > +     u16                     alloc_cpu;
>
> I *think* we could in theory fetch the CPU that allocated the skb from
> the napi_id - adding a cpu field to napi_struct and implementing a
> helper to fetch it. Have you considered that option? Or would the napi
> lookup be just too expensive?

I have considered that, but a NAPI is not guaranteed to be
owned/serviced from a single cpu.

(In fact, I recently realized that commit
01770a166165 "tcp: fix race condition when creating child sockets from
syncookies"
has not been backported to stable kernels.

This tcp bug would not happen in normal cases, where all packets from
a particular 4-tuple are handled by a single cpu.)

>
> [...]
>
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 4a77ebda4fb155581a5f761a864446a046987f51..4136d9c0ada6870ea0f7689702bdb5f0bbf29145 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4545,6 +4545,12 @@ static void rps_trigger_softirq(void *data)
> >
> >  #endif /* CONFIG_RPS */
> >
> > +/* Called from hardirq (IPI) context */
> > +static void trigger_rx_softirq(void *data)
>
> Perhaps '__always_unused' ? (But the compiler doesn't complain here)

Sure I will add this.

>
> > @@ -6486,3 +6487,46 @@ void __skb_ext_put(struct skb_ext *ext)
> >  }
> >  EXPORT_SYMBOL(__skb_ext_put);
> >  #endif /* CONFIG_SKB_EXTENSIONS */
> > +
> > +/**
> > + * skb_attempt_defer_free - queue skb for remote freeing
> > + * @skb: buffer
> > + *
> > + * Put @skb in a per-cpu list, using the cpu which
> > + * allocated the skb/pages to reduce false sharing
> > + * and memory zone spinlock contention.
> > + */
> > +void skb_attempt_defer_free(struct sk_buff *skb)
> > +{
> > +     int cpu = skb->alloc_cpu;
> > +     struct softnet_data *sd;
> > +     unsigned long flags;
> > +     bool kick;
> > +
> > +     if (WARN_ON_ONCE(cpu >= nr_cpu_ids) || !cpu_online(cpu)) {
> > +             __kfree_skb(skb);
> > +             return;
> > +     }
>
> I'm wondering if we should skip even when cpu == smp_processor_id()?

Yes, although we would have to use the raw_smp_processor_id() form I guess.
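
A minimal user-space sketch of the decision discussed here, assuming the
local-cpu shortcut is added next to the existing checks. The names
free_path and pick_free_path are hypothetical and not part of the patch;
the online array stands in for cpu_online(), and this_cpu for the value
raw_smp_processor_id() would return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two outcomes: free right away, or queue the
 * buffer on the list owned by the cpu that allocated it.
 */
enum free_path { FREE_NOW, DEFER_TO_ALLOC_CPU };

static enum free_path pick_free_path(unsigned int alloc_cpu,
                                     unsigned int this_cpu,
                                     unsigned int nr_cpu_ids,
                                     const bool *online)
{
    /* Mirrors the check in the patch: bogus or offline cpu, free now. */
    if (alloc_cpu >= nr_cpu_ids || !online[alloc_cpu])
        return FREE_NOW;

    /* Suggested shortcut (assumption): we already run on the allocating
     * cpu, so deferring would only add list and lock overhead.
     */
    if (alloc_cpu == this_cpu)
        return FREE_NOW;

    return DEFER_TO_ALLOC_CPU;
}
```

Only the second test is new relative to the posted patch; the first one
models the existing WARN_ON_ONCE()/cpu_online() condition.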

>
> > +
> > +     sd = &per_cpu(softnet_data, cpu);
> > +     /* We do not send an IPI or any signal.
> > +      * Remote cpu will eventually call skb_defer_free_flush()
> > +      */
> > +     spin_lock_irqsave(&sd->skb_defer_list.lock, flags);
> > +     __skb_queue_tail(&sd->skb_defer_list, skb);
> > +
> > +     /* kick every time queue length reaches 128.
> > +      * This should avoid blocking in smp_call_function_single_async().
> > +      * This condition should hardly be hit under normal conditions,
> > +      * unless cpu suddenly stopped receiving NIC interrupts.
> > +      */
> > +     kick = skb_queue_len(&sd->skb_defer_list) == 128;
>
> Out of sheer curiosity, why 128? I guess it should be larger than
> NAPI_POLL_WEIGHT, to cope with the maximum theoretical burst length?

Yes, I needed a value there, but was not sure which precise one.
In my tests, no IPI was ever sent with 128.
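
To illustrate why a threshold of 128 rarely triggers a kick, here is a
rough user-space model of the queue length test. It is only a sketch:
struct defer_list, defer_enqueue(), defer_flush() and simulate() are
hypothetical names, there is no locking, and the "kick" is just a
counter standing in for the IPI. The only detail taken from the patch is
that the kick fires when the length is exactly 128.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_KICK_LEN 128 /* threshold used in the patch */

/* Hypothetical stand-in for the per-cpu deferral list. */
struct defer_list {
    size_t len;         /* buffers waiting for the owning cpu */
    unsigned int kicks; /* how often an IPI would have been requested */
};

/* Queue one buffer; return true when the owner should be kicked.
 * The test is '==', not '>=', so one crossing yields one kick.
 */
static bool defer_enqueue(struct defer_list *dl)
{
    bool kick;

    dl->len++;
    kick = (dl->len == DEFER_KICK_LEN);
    if (kick)
        dl->kicks++;
    return kick;
}

/* Owner side: drop everything queued so far, return how many. */
static size_t defer_flush(struct defer_list *dl)
{
    size_t n = dl->len;

    dl->len = 0;
    return n;
}

/* Run 'bursts' rounds of 'per_burst' remote frees, each followed by a
 * flush from the owning cpu, and report the number of kicks.
 */
static unsigned int simulate(size_t bursts, size_t per_burst)
{
    struct defer_list dl = { 0, 0 };
    size_t b, i;

    for (b = 0; b < bursts; b++) {
        for (i = 0; i < per_burst; i++)
            defer_enqueue(&dl);
        defer_flush(&dl);
    }
    return dl.kicks;
}
```

As long as the owning cpu flushes before 128 buffers pile up, the model
never kicks; a single long stall produces exactly one kick, not one per
additional buffer.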

>
> Thanks!
>
> Paolo
>