Message-ID: <aadae1c0-9d50-d89d-d0ea-a300fa09682c@huawei.com>
Date: Wed, 22 Mar 2023 10:02:31 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: <yang.yang29@....com.cn>, <edumazet@...gle.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <xu.xin16@....com.cn>,
<jiang.xuexin@....com.cn>, <zhang.yunkai@....com.cn>
Subject: Re: [PATCH] rps: process the skb directly if rps cpu not changed
On 2023/3/21 20:12, yang.yang29@....com.cn wrote:
> From: xu xin <xu.xin16@....com.cn>
>
> In the RPS path of NAPI receive, regardless of whether the CPU that RPS
> computes for the skb equals the currently processing CPU, RPS always uses
> enqueue_to_backlog() to queue the skb onto the per-cpu backlog, which
> raises a new NET_RX softirq.
Does bypassing the backlog cause an out-of-order delivery problem for
packet handling?
It seems the current RPS/RFS code ensures in-order delivery, see for example:
https://elixir.bootlin.com/linux/v6.3-rc3/source/net/core/dev.c#L4485
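
For reference, the check at that line (lightly trimmed from get_rps_cpu()
in net/core/dev.c, v6.3-rc3) only moves a flow to a new CPU once every
packet the flow queued on the old CPU has been drained, which is what
preserves ordering:

	/* Move the flow to next_cpu only if the old CPU (tcpu) is gone or
	 * offline, or if the old CPU's backlog head has advanced past the
	 * last packet this flow enqueued there (last_qtail), i.e. nothing
	 * of this flow is still pending on the old CPU.
	 */
	if (unlikely(tcpu != next_cpu) &&
	    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
	     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
	      rflow->last_qtail)) >= 0)) {
		tcpu = next_cpu;
		rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
	}
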
Also, since this is an optimization, it should target the net-next tree:
[PATCH net-next] rps: process the skb directly if rps cpu not changed
>
> Actually, it is unnecessary to enqueue the skb to the backlog when the
> RPS-calculated CPU id equals the current processing CPU; we can call
> __netif_receive_skb() or __netif_receive_skb_list() to process the skb
> directly. The benefit is fewer NET_RX softirqs and lower per-skb
> processing delay.
>
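
A minimal sketch of the resulting control flow in
netif_receive_skb_internal() with this patch applied (condensed from the
diff below; timestamp handling and the list variant omitted):

	rcu_read_lock();
	if (static_branch_unlikely(&rps_needed)) {
		struct rps_dev_flow voidflow, *rflow = &voidflow;
		int cpu = get_rps_cpu(skb->dev, skb, &rflow);

		/* With the patch, only cross-CPU steering goes through the
		 * backlog (and its NET_RX softirq); a match with the
		 * current CPU falls through to direct processing below.
		 */
		if (cpu >= 0 && cpu != smp_processor_id()) {
			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
			rcu_read_unlock();
			return ret;
		}
	}
	ret = __netif_receive_skb(skb);
	rcu_read_unlock();
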
> The measured results show the patch brings a roughly 50% reduction in
> NET_RX softirqs. The test was done in a QEMU environment with a two-core
> CPU, using iperf3:
> taskset 01 iperf3 -c 192.168.2.250 -t 3 -u -R;
> taskset 02 iperf3 -c 192.168.2.250 -t 3 -u -R;
>
> Previous RPS:
> CPU0 CPU1
> NET_RX: 45 0 (before iperf3 testing)
> NET_RX: 1095 241 (after iperf3 testing)
>
> Patched RPS:
> CPU0 CPU1
> NET_RX: 28 4 (before iperf3 testing)
> NET_RX: 573 32 (after iperf3 testing)
>
> Signed-off-by: xu xin <xu.xin16@....com.cn>
> Reviewed-by: Zhang Yunkai <zhang.yunkai@....com.cn>
> Reviewed-by: Yang Yang <yang.yang29@....com.cn>
> Cc: Xuexin Jiang <jiang.xuexin@....com.cn>
> ---
> net/core/dev.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index c7853192563d..c33ddac3c012 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5666,8 +5666,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
> if (static_branch_unlikely(&rps_needed)) {
> struct rps_dev_flow voidflow, *rflow = &voidflow;
> int cpu = get_rps_cpu(skb->dev, skb, &rflow);
> + int current_cpu = smp_processor_id();
>
> - if (cpu >= 0) {
> + if (cpu >= 0 && cpu != current_cpu) {
> ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
> rcu_read_unlock();
> return ret;
> @@ -5699,8 +5700,9 @@ void netif_receive_skb_list_internal(struct list_head *head)
> list_for_each_entry_safe(skb, next, head, list) {
> struct rps_dev_flow voidflow, *rflow = &voidflow;
> int cpu = get_rps_cpu(skb->dev, skb, &rflow);
> + int current_cpu = smp_processor_id();
>
> - if (cpu >= 0) {
> + if (cpu >= 0 && cpu != current_cpu) {
> /* Will be handled, remove from list */
> skb_list_del_init(skb);
> enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>
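
To make the ordering question above concrete: even when cpu equals the
current CPU, earlier packets of the same flow may still be sitting in this
CPU's backlog, and processing the new skb directly would let it overtake
them. A hypothetical extra guard (not part of this patch, just a sketch
reusing the existing RFS bookkeeping) might look like:

	/* Hypothetical: bypass the backlog only if everything this flow
	 * previously enqueued on the local CPU has already been drained,
	 * mirroring the last_qtail check that get_rps_cpu() applies when
	 * moving a flow between CPUs.
	 */
	if (cpu >= 0 && cpu == smp_processor_id() &&
	    ((int)(per_cpu(softnet_data, cpu).input_queue_head -
	     rflow->last_qtail)) >= 0) {
		/* safe to call __netif_receive_skb() directly */
	}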