Message-ID: <CANn89iJb3k2kqZX9KQ-1tmw1L9Y0Lw4ksPRTeN97znS5Y3SJ4w@mail.gmail.com>
Date:   Tue, 21 Mar 2023 20:03:24 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     yang.yang29@....com.cn
Cc:     davem@...emloft.net, kuba@...nel.org, netdev@...r.kernel.org,
        xu.xin16@....com.cn, jiang.xuexin@....com.cn,
        zhang.yunkai@....com.cn
Subject: Re: [PATCH] rps: process the skb directly if rps cpu not changed

On Tue, Mar 21, 2023 at 5:12 AM <yang.yang29@....com.cn> wrote:
>
> From: xu xin <xu.xin16@....com.cn>
>
> In the RPS procedure of NAPI receiving, regardless of whether the
> rps-calculated CPU of the skb equals the currently processing CPU, RPS
> always uses enqueue_to_backlog to enqueue the skb to the per-cpu backlog,
> which triggers a new NET_RX softirq.
>
> Actually, it is not necessary to enqueue the skb to the backlog when the
> rps-calculated CPU id equals the current CPU; we can call
> __netif_receive_skb or __netif_receive_skb_list to process the skb directly.
> The benefit is that this reduces the number of NET_RX softirqs and the
> processing delay of the skb.
>
> The measured result shows the patch brings a 50% reduction in NET_RX softirqs.
> The test was done in a QEMU environment with a two-core CPU, using iperf3:
> taskset 01 iperf3 -c 192.168.2.250 -t 3 -u -R;
> taskset 02 iperf3 -c 192.168.2.250 -t 3 -u -R;
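
A rough sketch, not the actual patch, of the change being proposed: in the
RPS branch of the NAPI receive path, check whether the RPS-selected CPU is
the one we are already running on and, if so, process the skb in place
instead of bouncing it through the backlog. Function names follow
net/core/dev.c, but the body below is simplified and illustrative, not
compilable kernel code:

    /* Illustrative sketch only -- simplified from net/core/dev.c. */
    static int netif_receive_skb_internal(struct sk_buff *skb)
    {
    #ifdef CONFIG_RPS
            if (static_branch_unlikely(&rps_needed)) {
                    struct rps_dev_flow *rflow = NULL;
                    int cpu = get_rps_cpu(skb->dev, skb, &rflow);

                    if (cpu >= 0) {
                            /* Proposed change: if RPS picked the CPU we are
                             * already on, skip enqueue_to_backlog() (and the
                             * NET_RX softirq it raises) and process directly.
                             */
                            if (cpu == smp_processor_id())
                                    return __netif_receive_skb(skb);

                            return enqueue_to_backlog(skb, cpu,
                                                      &rflow->last_qtail);
                    }
            }
    #endif
            return __netif_receive_skb(skb);
    }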


The current behavior is not an accident; it was a deliberate choice.

RPS was really designed for non-multi-queue devices.

The idea was to dequeue all packets and queue them on the various per-CPU
queues, then, at the end of napi->poll(), process 'our' packets.

This is how latencies were kept small (no head-of-line blocking).
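
A rough illustration of that batching model, in C-style pseudocode (helpers
such as fetch_packet_from_ring(), rps_select_cpu(), enqueue_to_backlog_of()
and process_local_backlog() are hypothetical, not kernel APIs): the poll
loop first steers every packet of the budget to its target CPU's backlog,
and only afterwards processes the packets that landed locally, so a slow
local consumer never delays steering traffic meant for other CPUs:

    /* Hypothetical pseudocode of the RPS batching model described above. */
    static int napi_poll_with_rps(struct napi_struct *napi, int budget)
    {
            struct sk_buff *skb;
            int work = 0;

            /* Phase 1: drain the device ring, steering each packet to the
             * backlog of its RPS-selected CPU (possibly this CPU).
             */
            while (work < budget && (skb = fetch_packet_from_ring(napi))) {
                    enqueue_to_backlog_of(rps_select_cpu(skb), skb);
                    work++;
            }

            /* Phase 2: only now process the packets queued for this CPU. */
            process_local_backlog();

            return work;
    }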

Reducing the number of NET_RX softirqs probably does not change performance.
