netdev - Re: [PATCH net-next] net: generalize skb freeing deferral to per-cpu lists

Message-ID: <319497a698ba77244aa935c13dc9b93c893dbbc3.camel@redhat.com>
Date:   Fri, 22 Apr 2022 11:02:27 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     netdev <netdev@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next] net: generalize skb freeing deferral to
 per-cpu lists

Hi,

Looks great! I have a few questions below, mostly to better understand
how it works...

On Thu, 2022-04-21 at 08:39 -0700, Eric Dumazet wrote:
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 84d78df60453955a8eaf05847f6e2145176a727a..2fe311447fae5e860eee95f6e8772926d4915e9f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1080,6 +1080,7 @@ struct sk_buff {
>  		unsigned int	sender_cpu;
>  	};
>  #endif
> +	u16			alloc_cpu;

I *think* we could, in theory, fetch the CPU that allocated the skb from
the napi_id, by adding a cpu field to napi_struct and implementing a
helper to fetch it. Have you considered that option? Or would the napi
lookup be just too expensive?

[...]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4a77ebda4fb155581a5f761a864446a046987f51..4136d9c0ada6870ea0f7689702bdb5f0bbf29145 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4545,6 +4545,12 @@ static void rps_trigger_softirq(void *data)
>  
>  #endif /* CONFIG_RPS */
>  
> +/* Called from hardirq (IPI) context */
> +static void trigger_rx_softirq(void *data)

Perhaps mark 'data' as '__always_unused'? (The compiler doesn't complain here, though.)
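A small sketch of what the annotation would look like, assuming the callback ignores its argument. The `__always_unused` definition matches the one in the kernel's include/linux/compiler_attributes.h; the function body and the `softirq_raised` counter are hypothetical stand-ins, since the real body is elided in the quote above. GCC and Clang only warn about unused parameters under -Wunused-parameter (part of -Wextra), which is consistent with the compiler staying quiet here.

```c
#include <assert.h>
#include <stddef.h>

/* Same expansion as the kernel's definition. */
#define __always_unused __attribute__((__unused__))

/* Mirrors the kernel's smp_call_func_t callback type. */
typedef void (*smp_call_func_t)(void *info);

static unsigned int softirq_raised;

/* The signature is dictated by the IPI machinery, but the argument is
 * not needed; the annotation documents that intent explicitly. */
static void trigger_rx_softirq(void *data __always_unused)
{
	softirq_raised++;   /* hypothetical stand-in for raising NET_RX_SOFTIRQ */
}

/* The annotated function still matches the expected callback type. */
static smp_call_func_t rx_callback = trigger_rx_softirq;
```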

> @@ -6486,3 +6487,46 @@ void __skb_ext_put(struct skb_ext *ext)
>  }
>  EXPORT_SYMBOL(__skb_ext_put);
>  #endif /* CONFIG_SKB_EXTENSIONS */
> +
> +/**
> + * skb_attempt_defer_free - queue skb for remote freeing
> + * @skb: buffer
> + *
> + * Put @skb in a per-cpu list, using the cpu which
> + * allocated the skb/pages to reduce false sharing
> + * and memory zone spinlock contention.
> + */
> +void skb_attempt_defer_free(struct sk_buff *skb)
> +{
> +	int cpu = skb->alloc_cpu;
> +	struct softnet_data *sd;
> +	unsigned long flags;
> +	bool kick;
> +
> +	if (WARN_ON_ONCE(cpu >= nr_cpu_ids) || !cpu_online(cpu)) {
> +		__kfree_skb(skb);
> +		return;
> +	}

I'm wondering whether we should also skip the deferral (and free directly) when cpu == smp_processor_id()?
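The check being suggested can be sketched as a pure decision function. This is a user-space model under stated assumptions: the `online` array stands in for cpu_online(), `this_cpu` for smp_processor_id(), and `enum defer_action` / `defer_decision()` are hypothetical names. The same-CPU test is the proposed addition, not something in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the decision in skb_attempt_defer_free(). */
enum defer_action {
	FREE_LOCALLY,        /* call __kfree_skb() right away */
	DEFER_TO_ALLOC_CPU   /* queue on the allocating CPU's list */
};

static enum defer_action defer_decision(unsigned int alloc_cpu,
					unsigned int nr_cpu_ids,
					const bool *online,
					unsigned int this_cpu)
{
	/* Bounds check first, so online[] is never read out of range. */
	if (alloc_cpu >= nr_cpu_ids || !online[alloc_cpu])
		return FREE_LOCALLY;
	/* Proposed: already on the allocating CPU, so there is no false
	 * sharing to avoid and queueing would only add lock overhead. */
	if (alloc_cpu == this_cpu)
		return FREE_LOCALLY;
	return DEFER_TO_ALLOC_CPU;
}
```

One caveat: if the caller can run in preemptible (process) context, a real check would need preemption disabled or raw_smp_processor_id(); a stale answer is harmless here, since it only selects between two correct ways of freeing.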

> +
> +	sd = &per_cpu(softnet_data, cpu);
> +	/* We do not send an IPI or any signal.
> +	 * Remote cpu will eventually call skb_defer_free_flush()
> +	 */
> +	spin_lock_irqsave(&sd->skb_defer_list.lock, flags);
> +	__skb_queue_tail(&sd->skb_defer_list, skb);
> +
> +	/* kick every time queue length reaches 128.
> +	 * This should avoid blocking in smp_call_function_single_async().
> +	 * This condition should hardly be hit under normal conditions,
> +	 * unless the cpu suddenly stopped receiving NIC interrupts.
> +	 */
> +	kick = skb_queue_len(&sd->skb_defer_list) == 128;

Out of sheer curiosity, why 128? I guess it should be larger than
NAPI_POLL_WEIGHT, to cope with the maximum theoretical burst length?
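A single-threaded user-space model of the kick logic, to illustrate the question. It assumes NAPI_POLL_WEIGHT is 64 (the kernel default); `DEFER_KICK_THRESHOLD`, `struct defer_list`, `defer_enqueue()` and `defer_flush()` are hypothetical names, and the spinlock is omitted because nothing here runs concurrently. Two properties fall out: because the patch compares with `==` rather than `>=`, at most one kick fires per threshold crossing until the owner flushes; and one full NAPI budget of frees between flushes never triggers a kick, while two do. Whether that 2x ratio is the actual rationale is exactly what is being asked.

```c
#include <assert.h>

#define NAPI_POLL_WEIGHT 64        /* kernel default NAPI budget */
#define DEFER_KICK_THRESHOLD 128   /* value from the patch; name is hypothetical */

_Static_assert(DEFER_KICK_THRESHOLD == 2 * NAPI_POLL_WEIGHT,
	       "threshold equals two full NAPI budgets");

/* Stand-in for sd->skb_defer_list: only the length matters here. */
struct defer_list {
	unsigned int len;
	unsigned int kicks;   /* times an IPI would have been sent */
};

/* Producer side: enqueue, then kick only when the length hits the
 * threshold exactly, so later enqueues do not kick again. */
static void defer_enqueue(struct defer_list *dl)
{
	dl->len++;
	if (dl->len == DEFER_KICK_THRESHOLD)
		dl->kicks++;   /* stand-in for smp_call_function_single_async() */
}

/* Consumer side: stand-in for skb_defer_free_flush() on the owning CPU. */
static void defer_flush(struct defer_list *dl)
{
	dl->len = 0;
}
```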

Thanks!

Paolo