Message-ID: <CANn89iJb3k2kqZX9KQ-1tmw1L9Y0Lw4ksPRTeN97znS5Y3SJ4w@mail.gmail.com>
Date:   Tue, 21 Mar 2023 20:03:24 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     yang.yang29@....com.cn
Cc:     davem@...emloft.net, kuba@...nel.org, netdev@...r.kernel.org,
        xu.xin16@....com.cn, jiang.xuexin@....com.cn,
        zhang.yunkai@....com.cn
Subject: Re: [PATCH] rps: process the skb directly if rps cpu not changed

On Tue, Mar 21, 2023 at 5:12 AM <yang.yang29@....com.cn> wrote:
>
> From: xu xin <xu.xin16@....com.cn>
>
> In the RPS procedure of NAPI receiving, regardless of whether the
> rps-calculated CPU of the skb equals the currently processing CPU, RPS
> always uses enqueue_to_backlog to enqueue the skb to the per-cpu backlog,
> which triggers a new NET_RX softirq.
>
> Actually, it is not necessary to enqueue the skb to the backlog when the
> rps-calculated CPU id equals the current CPU; we can call
> __netif_receive_skb or __netif_receive_skb_list to process the skb directly.
> The benefit is that this reduces the number of NET_RX softirqs and the
> processing delay of the skb.
>
> The measured result shows the patch brings a 50% reduction in NET_RX softirqs.
> The test was done in a QEMU environment with a two-core CPU, using iperf3:
> taskset 01 iperf3 -c 192.168.2.250 -t 3 -u -R;
> taskset 02 iperf3 -c 192.168.2.250 -t 3 -u -R;
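
A rough sketch, not the actual patch, of the change being proposed: in the
RPS branch of the NAPI receive path, check whether the RPS-selected CPU is
the one we are already running on and, if so, process the skb in place
instead of bouncing it through the backlog. Function names follow
net/core/dev.c, but the body below is simplified and illustrative, not
compilable kernel code:

    /* Illustrative sketch only -- simplified from net/core/dev.c. */
    static int netif_receive_skb_internal(struct sk_buff *skb)
    {
    #ifdef CONFIG_RPS
            if (static_branch_unlikely(&rps_needed)) {
                    struct rps_dev_flow *rflow = NULL;
                    int cpu = get_rps_cpu(skb->dev, skb, &rflow);

                    if (cpu >= 0) {
                            /* Proposed change: if RPS picked the CPU we are
                             * already on, skip enqueue_to_backlog() (and the
                             * NET_RX softirq it raises) and process directly.
                             */
                            if (cpu == smp_processor_id())
                                    return __netif_receive_skb(skb);

                            return enqueue_to_backlog(skb, cpu,
                                                      &rflow->last_qtail);
                    }
            }
    #endif
            return __netif_receive_skb(skb);
    }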


The current behavior is not an accident; it was a deliberate choice.

RPS was really designed for non-multi-queue devices.

The idea was to dequeue all packets and queue them on the various per-CPU
queues, then, at the end of napi->poll(), process 'our' packets.

This is how latencies were kept small (no head-of-line blocking).
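
A rough illustration of that batching model, in C-style pseudocode (helpers
such as fetch_packet_from_ring(), rps_select_cpu(), enqueue_to_backlog_of()
and process_local_backlog() are hypothetical, not kernel APIs): the poll
loop first steers every packet of the budget to its target CPU's backlog,
and only afterwards processes the packets that landed locally, so a slow
local consumer never delays steering traffic meant for other CPUs:

    /* Hypothetical pseudocode of the RPS batching model described above. */
    static int napi_poll_with_rps(struct napi_struct *napi, int budget)
    {
            struct sk_buff *skb;
            int work = 0;

            /* Phase 1: drain the device ring, steering each packet to the
             * backlog of its RPS-selected CPU (possibly this CPU).
             */
            while (work < budget && (skb = fetch_packet_from_ring(napi))) {
                    enqueue_to_backlog_of(rps_select_cpu(skb), skb);
                    work++;
            }

            /* Phase 2: only now process the packets queued for this CPU. */
            process_local_backlog();

            return work;
    }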

Reducing the number of NET_RX softirqs probably does not change performance.
