linux-kernel - Re: [PATCH] rps: process the skb directly if rps cpu not changed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230323100406.36858-1-xu.xin16@zte.com.cn>
Date:   Thu, 23 Mar 2023 10:04:06 +0000
From:   xu xin <xu.xin.sc@...il.com>
To:     linyunsheng@...wei.com, kuba@...nel.org
Cc:     davem@...emloft.net, edumazet@...gle.com, jiang.xuexin@....com.cn,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        xu.xin16@....com.cn, yang.yang29@....com.cn,
        zhang.yunkai@....com.cn
Subject: Re: [PATCH] rps: process the skb directly if rps cpu not changed

>On 2023/3/22 15:24, xu xin wrote:
>> [So sorry, I made a mistake in the reply title]
>> 
>> On 2023/3/21 20:12, yang.yang29@....com.cn wrote:
>>>> From: xu xin <xu.xin16@....com.cn>
>>>>
>>>> In the RPS procedure of NAPI receiving, regardless of whether the
>>>> rps-calculated CPU of the skb equals to the currently processing CPU, RPS
>>>> will always use enqueue_to_backlog to enqueue the skb to per-cpu backlog,
>>>> which will trigger a new NET_RX softirq.
>>>
>>> Does bypassing the backlog cause out of order problem for packet handling?
>>> It seems currently the RPS/RFS will ensure order delivery,such as:
>>> https://elixir.bootlin.com/linux/v6.3-rc3/source/net/core/dev.c#L4485
>>>
>>> Also, this is an optimization, it should target the net-next branch:
>>> [PATCH net-next] rps: process the skb directly if rps cpu not changed
>>>
>> 
>> Well, I thought the patch would't break the effort RFS tried to avoid "Out of
>> Order" packets. But thanks for your reminder, I rethink it again, bypassing the
>> backlog from "netif_receive_skb_list" will mislead RFS's judging if all
>> previous packets for the flow have been dequeued, where RFS thought all packets
>> have been dealed with, but actually they are still in skb lists. Fortunately,
>> bypassing the backlog from "netif_receive_skb" for a single skb is okay and won't
>> cause OOO packets because every skb is processed serially by RPS and sent to the
>> protocol stack as soon as possible.
>
>Suppose a lot of skbs have been queued to the backlog waiting to
>processed and passed to the stack when current_cpu is not the same
>as the target cpu,

Well.  I'm afraid that what we mean by current_cpu may be different. The
"current_cpu" in my patch refers to the cpu NAPI poll is running on (Or
the cpu that the skb origins from).

>then current_cpu is changed to be the same as the
>target cpu, with your patch, new skb will be processed and passed to
>the stack immediately, which may bypass the old skb in the backlog.
>
I think Nop, RFS procedure won't let target cpu switch into a new cpu
if there are still old skbs in the backlog of last recorded cpu. So the
target cpu of the new skb will never equal to current_cpu if old skb in the
backlog.
==========================================================================
Let me draw the situation you described: At the time of T1, the app runs
on cpu-0, so there are many packets queueing into the rxqueue-0 by RFS from
CPU-1(suppose NAPI poll processing on cpu-1). Then, suddenly at the time of
T2, the app tranfers to cpu-1, RFS know there are still old skb in rxqueue-0,
so get_rps_cpu will not return a value of cpu-1, but cpu-0 instead.

========================================================
When T1, app runs on cpu-0:
  APP
-----------------------------
|      |        |      |
|cpu-0 |        |cpu-1 |
|stack |        |stack |
|      |        |      |
   ^
  |=|
  |=|             | |
  |=|             | |
(rxqueue-0)      (rxqueue-1,empty)
   ^<--
       <--
         <--
	     <-- packet(poll on cpu1)
			
===========================================================
When T2, app tranfer to cpu-1, target cpu is still on cpu-0:
                  APP
----------------------------
|      |        |      |
|cpu-0 |        |cpu-1 |
|stack |        |stack |
|      |        |      |
   ^
   |
  |=|             | |
  |=|             | |
(rxqueue-0)      (rxqueue-2,empty)
   ^<--
       <--
         <--
            <-- packet(poll on cpu1)

===================================

Thanks for your reply.

>> 
>> If I'm correct, the code as follws can fix this.
>> 
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5666,8 +5666,9 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
>>         if (static_branch_unlikely(&rps_needed)) {
>>                 struct rps_dev_flow voidflow, *rflow = &voidflow;
>>                 int cpu = get_rps_cpu(skb->dev, skb, &rflow);
>> +               int current_cpu = smp_processor_id();
>>  
>> -               if (cpu >= 0) {
>> +               if (cpu >= 0 && cpu != current_cpu) {
>>                         ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>>                         rcu_read_unlock();
>>                         return ret;
>> @@ -5699,11 +5700,15 @@ void netif_receive_skb_list_internal(struct list_head *head)
>>                 list_for_each_entry_safe(skb, next, head, list) {
>>                         struct rps_dev_flow voidflow, *rflow = &voidflow;
>>                         int cpu = get_rps_cpu(skb->dev, skb, &rflow);
>> +                       int current_cpu = smp_processor_id();
>>  
>>                         if (cpu >= 0) {
>>                                 /* Will be handled, remove from list */
>>                                 skb_list_del_init(skb);
>> -                               enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>> +                               if (cpu != current_cpu)
>> +                                       enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
>> +                               else
>> +                                       __netif_receive_skb(skb);
>>                         }
>>                 }
>> 
>> 
>> Thanks.
>> 
>>>>
>>>> Actually, it's not necessary to enqueue it to backlog when rps-calculated
>>>> CPU id equals to the current processing CPU, and we can call
>>>> __netif_receive_skb or __netif_receive_skb_list to process the skb directly.
>>>> The benefit is that it can reduce the number of softirqs of NET_RX and reduce
>>>> the processing delay of skb.
>>>>
>>>> The measured result shows the patch brings 50% reduction of NET_RX softirqs.
>>>> The test was done on the QEMU environment with two-core CPU by iperf3.
>>>> taskset 01 iperf3 -c 192.168.2.250 -t 3 -u -R;
>>>> taskset 02 iperf3 -c 192.168.2.250 -t 3 -u -R;
>>>>
>>>> Previous RPS:
>>>> 		    	CPU0       CPU1
>>>> NET_RX:         45          0    (before iperf3 testing)
>>>> NET_RX:        1095         241   (after iperf3 testing)
>>>>
>>>> Patched RPS:
>>>>                 CPU0       CPU1
>>>> NET_RX:         28          4    (before iperf3 testing)
>>>> NET_RX:         573         32   (after iperf3 testing)
>>>
>>> Sincerely.
>>> Xu Xin
>> .
>>