Message-ID: <CALzJLG_5yPXULm_n8cSdpcL8TZBFEZVpTEt7Dx7NQnUQYcGAJA@mail.gmail.com>
Date:   Sat, 25 Mar 2017 15:30:11 +0300
From:   Saeed Mahameed <saeedm@....mellanox.co.il>
To:     Alexei Starovoitov <ast@...com>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        "David S. Miller" <davem@...emloft.net>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH net-next 00/12] Mellanox mlx5e XDP performance optimization

On Sat, Mar 25, 2017 at 2:26 AM, Alexei Starovoitov <ast@...com> wrote:
> On 3/24/17 2:52 PM, Saeed Mahameed wrote:
>>
>> Hi Dave,
>>
>> This series provides some performance optimizations for the mlx5e
>> driver, especially for XDP TX flows.
>>
>> The 1st patch is a simple change of rmb() to dma_rmb() in the CQE fetch
>> routine, which shows a huge gain for both RX and TX packet rates.
>>
>> The 2nd patch removes the write-combining logic from the driver TX
>> handler, simplifying the TX logic while improving TX CPU utilization.
>>
>> All other patches combined provide some refactoring to the driver TX
>> flows to allow some significant XDP TX improvements.
>>
>> More details and per-patch performance numbers (measured against the
>> preceding patch) can be found in each patch's commit message.
>>
>> Overall performance improvements
>>   System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>
>> Test case                   Baseline      Now      improvement
>> ---------------------------------------------------------------
>> TX packets (24 threads)     45Mpps        54Mpps      20%
>> TC stack Drop (1 core)      3.45Mpps      3.6Mpps     5%
>> XDP Drop      (1 core)      14Mpps        16.9Mpps    20%
>> XDP TX        (1 core)      10.4Mpps      13.7Mpps    31%
>
>
> Excellent work!
> All patches look great, so for the series:
> Acked-by: Alexei Starovoitov <ast@...nel.org>
>

Thanks Alexei !
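
As a side note on the 1st patch in the cover letter above, the rmb ->
dma_rmb change in the CQE fetch path boils down to something like the
sketch below (helper names are made up here, this is not the actual
mlx5e code):

        cqe = get_next_cqe(cq);                 /* made-up helper */
        if (!cqe_ownership_ok(cqe, cq))         /* made-up helper */
                return NULL;

        /* The CQE lives in DMA-coherent memory, so dma_rmb() is enough
         * to order the ownership check against reads of the rest of
         * the CQE, and it is cheaper than a full rmb() on most
         * architectures.
         */
        dma_rmb();                              /* was: rmb() */

        /* now it is safe to read the CQE contents */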

> in patch 12 I noticed that inline_mode is being evaluated.
> I think for xdp queues it's guaranteed to be fixed.
> Can we optimize that path a little bit more as well?

Yes, you are right, we do evaluate it in mlx5e_alloc_xdpsq:
+       if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {
+               inline_hdr_sz = MLX5E_XDP_MIN_INLINE;
+               ds_cnt++;
+       }

and check it again in mlx5e_xmit_xdp_frame:

+      /* copy the inline part if required */
+      if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {

sq->min_inline_mode is fixed at runtime (it never changes once the SQ is
created), but it differs across HW versions.
This condition is needed so we don't copy inline headers and waste CPU
cycles when it is not required, i.e. on ConnectX-5 and later.
Actually this is a 5% XDP_TX optimization you get when running on
ConnectX-5 [1].

On ConnectX-4 and ConnectX-4 LX the driver is still required to copy the
L2 headers into the TX descriptor so the HW can make the loopback decision
correctly (needed in case you want an XDP program to switch packets
between different PFs/VFs running on the same box/NIC).
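
To make that concrete, the copy in mlx5e_xmit_xdp_frame is roughly the
following (a simplified sketch; the WQE segment field names here are
approximate rather than taken verbatim from the driver):

        if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {
                /* ConnectX-4/4-LX: HW needs the L2 header inline in the
                 * WQE to take the loopback decision
                 */
                memcpy(eseg->inline_hdr.start, data, MLX5E_XDP_MIN_INLINE);
                eseg->inline_hdr.sz = cpu_to_be16(MLX5E_XDP_MIN_INLINE);

                /* the inlined bytes must not be described again in the
                 * gather entry, so skip past them
                 */
                dma_addr += MLX5E_XDP_MIN_INLINE;
                dma_len  -= MLX5E_XDP_MIN_INLINE;
                dseg++;
        }

        /* ConnectX-5: min_inline_mode is NONE, the branch above is
         * skipped and the packet is described by the data segment only
         */
        dseg->addr       = cpu_to_be64(dma_addr);
        dseg->byte_count = cpu_to_be32(dma_len);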

So I don't see any way to do this without breaking the XDP loopback
functionality or removing the ConnectX-5 optimization.

For my taste, this condition is good as is.

[1] https://www.spinics.net/lists/netdev/msg419215.html

> Thanks!
