Message-ID: <CANn89iLF04mxhCS=C0VdeJ5afeK8CDRRjszAWhey+F_Gf6L+6Q@mail.gmail.com>
Date: Fri, 22 Apr 2022 08:59:31 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] net: generalize skb freeing deferral to per-cpu lists
On Fri, Apr 22, 2022 at 2:02 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> Hi,
>
> Looks great! I have a few questions below mostly to understand better
> how it works...
>
Hi Paolo, thanks for the review :)
> On Thu, 2022-04-21 at 08:39 -0700, Eric Dumazet wrote:
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 84d78df60453955a8eaf05847f6e2145176a727a..2fe311447fae5e860eee95f6e8772926d4915e9f 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1080,6 +1080,7 @@ struct sk_buff {
> > unsigned int sender_cpu;
> > };
> > #endif
> > + u16 alloc_cpu;
>
> I *think* we could in theory fetch the CPU that allocated the skb from
> the napi_id, by adding a cpu field to napi_struct and implementing a
> helper to fetch it. Have you considered that option? Or would the napi
> lookup be just too expensive?
I have considered that, but a NAPI is not guaranteed to be
owned/serviced from a single cpu.
(In fact, I recently realized that commit 01770a166165
("tcp: fix race condition when creating child sockets from syncookies")
has not been backported to stable kernels.
This TCP bug would not happen in normal cases, where all packets from
a particular 4-tuple are handled by a single cpu.)
>
> [...]
>
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 4a77ebda4fb155581a5f761a864446a046987f51..4136d9c0ada6870ea0f7689702bdb5f0bbf29145 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4545,6 +4545,12 @@ static void rps_trigger_softirq(void *data)
> >
> > #endif /* CONFIG_RPS */
> >
> > +/* Called from hardirq (IPI) context */
> > +static void trigger_rx_softirq(void *data)
>
> Perhaps '__always_unused' ? (But the compiler doesn't complain here)
Sure I will add this.
>
> > @@ -6486,3 +6487,46 @@ void __skb_ext_put(struct skb_ext *ext)
> > }
> > EXPORT_SYMBOL(__skb_ext_put);
> > #endif /* CONFIG_SKB_EXTENSIONS */
> > +
> > +/**
> > + * skb_attempt_defer_free - queue skb for remote freeing
> > + * @skb: buffer
> > + *
> > + * Put @skb in a per-cpu list, using the cpu which
> > + * allocated the skb/pages to reduce false sharing
> > + * and memory zone spinlock contention.
> > + */
> > +void skb_attempt_defer_free(struct sk_buff *skb)
> > +{
> > + int cpu = skb->alloc_cpu;
> > + struct softnet_data *sd;
> > + unsigned long flags;
> > + bool kick;
> > +
> > + if (WARN_ON_ONCE(cpu >= nr_cpu_ids) || !cpu_online(cpu)) {
> > + __kfree_skb(skb);
> > + return;
> > + }
>
> I'm wondering if we should skip even when cpu == smp_processor_id()?
Yes, although we would have to use the raw_smp_processor_id() form I guess.
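
A minimal userspace sketch of the decision being discussed, assuming the extra local-cpu check is added. The names pick_free_action, free_action and the cpu_online array are hypothetical and stand in for the kernel's nr_cpu_ids / cpu_online() / raw_smp_processor_id(); this is not the posted patch.

```c
#include <stdbool.h>

/* Hypothetical result type: free right away, or queue for the allocating cpu. */
enum free_action { FREE_NOW, FREE_DEFERRED };

/*
 * Model of the checks at the top of skb_attempt_defer_free(), plus the
 * suggested "same cpu" shortcut. cpu_online[] is an assumed stand-in for
 * the kernel's online mask and must have at least nr_cpu_ids entries.
 */
static enum free_action pick_free_action(unsigned int alloc_cpu,
                                         unsigned int this_cpu,
                                         unsigned int nr_cpu_ids,
                                         const bool *cpu_online)
{
    /* Invalid or offline allocating cpu: nobody would drain the list. */
    if (alloc_cpu >= nr_cpu_ids || !cpu_online[alloc_cpu])
        return FREE_NOW;

    /* Suggested addition: freeing locally is already the cheap case. */
    if (alloc_cpu == this_cpu)
        return FREE_NOW;

    return FREE_DEFERRED;
}
```

With this shape, only a buffer allocated on a different, online cpu is queued for remote freeing.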
>
> > +
> > + sd = &per_cpu(softnet_data, cpu);
> > + /* We do not send an IPI or any signal.
> > + * Remote cpu will eventually call skb_defer_free_flush()
> > + */
> > + spin_lock_irqsave(&sd->skb_defer_list.lock, flags);
> > + __skb_queue_tail(&sd->skb_defer_list, skb);
> > +
> > + /* kick every time queue length reaches 128.
> > + * This should avoid blocking in smp_call_function_single_async().
> > + * This condition should hardly be hit under normal conditions,
> > + * unless cpu suddenly stopped receiving NIC interrupts.
> > + */
> > + kick = skb_queue_len(&sd->skb_defer_list) == 128;
>
> Out of sheer curiosity, why 128? I guess it should be larger than
> NAPI_POLL_WEIGHT, to cope with the maximum theoretical burst length?
Yes, I needed a value there, but was not sure which precise one.
In my tests I had no IPI ever sent with 128.
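
To illustrate why the "== 128" test rarely fires, here is a small userspace model of the queue length. It is only a sketch: struct defer_queue, defer_enqueue(), defer_flush() and simulate() are hypothetical names, and the assumption that the owning cpu drains its list every 64 enqueues (NAPI_POLL_WEIGHT) is made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEFER_KICK_LEN 128 /* threshold discussed in the thread */

/* Hypothetical stand-in for one cpu's skb_defer_list. */
struct defer_queue {
    size_t len;         /* buffers waiting for the owning cpu */
    unsigned int kicks; /* times an IPI would have been requested */
};

/* Model of the enqueue path: kick only when length is exactly 128. */
static bool defer_enqueue(struct defer_queue *q)
{
    bool kick;

    q->len++;
    kick = (q->len == DEFER_KICK_LEN);
    if (kick)
        q->kicks++;
    return kick;
}

/* Model of the owning cpu draining its whole list. */
static size_t defer_flush(struct defer_queue *q)
{
    size_t freed = q->len;

    q->len = 0;
    return freed;
}

/* Enqueue n buffers, draining every flush_every enqueues (0 = never). */
static unsigned int simulate(size_t n, size_t flush_every)
{
    struct defer_queue q = { 0, 0 };

    for (size_t i = 1; i <= n; i++) {
        defer_enqueue(&q);
        if (flush_every && i % flush_every == 0)
            defer_flush(&q);
    }
    return q.kicks;
}
```

In this model, simulate(1000, 64) requests no kick because the length never reaches 128, while simulate(1000, 0) requests exactly one: the equality test fires once as the length passes 128 and stays quiet until the list is drained.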
>
> Thanks!
>
> Paolo
>