netdev - Re: [iovisor-dev] [PATCH RFC 08/11] net/mlx5e: XDP fast RX drop bpf programs support

Message-ID: <CAJ3xEMgfFAA71UUsCvh=ZfnCBGegq_o6qT+RWw_WE55=LqQK4g@mail.gmail.com>
Date:   Thu, 8 Sep 2016 10:10:28 +0300
From:   Or Gerlitz <gerlitz.or@...il.com>
To:     Saeed Mahameed <saeedm@....mellanox.co.il>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>,
        iovisor-dev <iovisor-dev@...ts.iovisor.org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Tom Herbert <tom@...bertland.com>,
        Rana Shahout <ranas@...lanox.com>
Subject: Re: [iovisor-dev] [PATCH RFC 08/11] net/mlx5e: XDP fast RX drop bpf
 programs support

On Thu, Sep 8, 2016 at 12:53 AM, Saeed Mahameed
<saeedm@....mellanox.co.il> wrote:
> On Wed, Sep 7, 2016 at 11:55 PM, Or Gerlitz via iovisor-dev
> <iovisor-dev@...ts.iovisor.org> wrote:
>> On Wed, Sep 7, 2016 at 3:42 PM, Saeed Mahameed <saeedm@...lanox.com> wrote:
>>> From: Rana Shahout <ranas@...lanox.com>
>>>
>>> Add support for the BPF_PROG_TYPE_PHYS_DEV hook in the mlx5e driver.
>>>
>>> When XDP is on, we change the channels' RQ type to
>>> MLX5_WQ_TYPE_LINKED_LIST rather than the "striding RQ" type, to
>>> ensure "page per packet".
>>>
>>> On XDP set, we fail if HW LRO is enabled and ask the user to turn it
>>> off. Since HW LRO is always on by default on ConnectX4-LX, this will be
>>> annoying, but we prefer not to force LRO off from the XDP set function.
>>>
>>> Full channels reset (close/open) is required only when setting XDP
>>> on/off.
>>>
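The XDP-set rules above (fail while HW LRO is on, switch to linked-list RQs and do a full channel reset only when XDP is toggled on/off, otherwise just exchange the program) can be summarized as a small decision function. This is a minimal userspace sketch, not the mlx5e code: `struct dev_state`, `xdp_set()` and `enum xdp_set_result` are hypothetical names, and restoring the striding RQ type when XDP is turned off is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified driver state; not the real mlx5e structures. */
enum rq_type { RQ_TYPE_STRIDING, RQ_TYPE_LINKED_LIST };

struct xdp_prog { int id; };

struct dev_state {
	bool hw_lro;                /* HW LRO currently enabled */
	enum rq_type rq_type;       /* RQ type used by all channels */
	struct xdp_prog *xdp_prog;  /* NULL when XDP is off */
	int full_resets;            /* counts simulated channel close/open cycles */
};

/* What the caller has to do after a successful xdp_set(). */
enum xdp_set_result { XDP_SET_FULL_RESET, XDP_SET_SWAP_ONLY };

static int xdp_set(struct dev_state *dev, struct xdp_prog *prog,
		   enum xdp_set_result *result)
{
	bool was_on, turn_on;

	assert(dev != NULL && result != NULL);
	was_on = dev->xdp_prog != NULL;
	turn_on = prog != NULL;

	/* Fail and let the user disable LRO, rather than forcing it off here. */
	if (turn_on && dev->hw_lro)
		return -EINVAL;

	if (was_on != turn_on) {
		/* XDP toggled on/off: linked-list RQ ("page per packet") while
		 * XDP is on, plus a full channel close/open. Restoring striding
		 * RQ when XDP goes off is an assumption of this sketch. */
		dev->rq_type = turn_on ? RQ_TYPE_LINKED_LIST : RQ_TYPE_STRIDING;
		dev->xdp_prog = prog;
		dev->full_resets++;
		*result = XDP_SET_FULL_RESET;
		return 0;
	}

	/* XDP state unchanged: only exchange the program per RQ, no reset. */
	dev->xdp_prog = prog;
	*result = XDP_SET_SWAP_ONLY;
	return 0;
}
```

Keeping the LRO check as a hard failure mirrors the patch's choice to make the user turn LRO off explicitly instead of silently changing another offload setting.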
>>> When XDP set is called just to exchange programs, we update each
>>> RQ's XDP program on the fly. To synchronize with the RQ's current RX
>>> data path activity, we temporarily disable that RQ, ensure the RX
>>> path is not running, quickly update the program and re-enable the RQ.
>>> For that we do:
>>>         - rq.state = disabled
>>>         - napi_synchronize
>>>         - xchg(rq->xdp_prg)
>>>         - rq.state = enabled
>>>         - napi_schedule // Just in case we've missed an IRQ
>>>
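The five-step program exchange can be simulated in userspace with C11 atomics. This is a single-threaded sketch under stated assumptions: `struct rq`, `rq_poll()`, `rq_synchronize()` and `rq_schedule()` are hypothetical stand-ins for the RQ, the NAPI poll, `napi_synchronize()` and `napi_schedule()`, and the real driver must also drop its reference on the old BPF program.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real mlx5e/NAPI code. */
struct xdp_prog { int id; };

enum rq_state { RQ_DISABLED = 0, RQ_ENABLED = 1 };

struct rq {
	atomic_int state;                     /* RQ_ENABLED or RQ_DISABLED */
	atomic_bool in_poll;                  /* stand-in for "NAPI poll running" */
	_Atomic(struct xdp_prog *) xdp_prog;  /* program used by the RX path */
	int polls_scheduled;                  /* counts simulated napi_schedule() */
};

static void rq_init(struct rq *rq, struct xdp_prog *prog)
{
	atomic_init(&rq->state, RQ_ENABLED);
	atomic_init(&rq->in_poll, false);
	atomic_init(&rq->xdp_prog, prog);
	rq->polls_scheduled = 0;
}

/* Simulated RX poll: returns the id of the program it would run, or -1
 * if the RQ is disabled or has no program. */
static int rq_poll(struct rq *rq)
{
	int prog_id = -1;

	atomic_store(&rq->in_poll, true);
	if (atomic_load(&rq->state) == RQ_ENABLED) {
		struct xdp_prog *prog = atomic_load(&rq->xdp_prog);
		if (prog != NULL)
			prog_id = prog->id;
	}
	atomic_store(&rq->in_poll, false);
	return prog_id;
}

/* Stand-in for napi_synchronize(): wait until no poll is in progress.
 * The kernel helper sleeps between checks instead of spinning. */
static void rq_synchronize(struct rq *rq)
{
	while (atomic_load(&rq->in_poll))
		;
}

/* Stand-in for napi_schedule(): here we simply run one poll. */
static void rq_schedule(struct rq *rq)
{
	rq->polls_scheduled++;
	(void)rq_poll(rq);
}

/* The exchange sequence from the commit message; returns the old program,
 * whose reference the caller would then drop. */
static struct xdp_prog *rq_swap_xdp_prog(struct rq *rq, struct xdp_prog *new_prog)
{
	struct xdp_prog *old;

	assert(rq != NULL);
	atomic_store(&rq->state, RQ_DISABLED);          /* rq.state = disabled */
	rq_synchronize(rq);                              /* napi_synchronize */
	old = atomic_exchange(&rq->xdp_prog, new_prog);  /* xchg(rq->xdp_prg) */
	atomic_store(&rq->state, RQ_ENABLED);           /* rq.state = enabled */
	rq_schedule(rq);                                 /* in case an IRQ was missed */
	return old;
}
```

With the default sequentially consistent atomics, a poll either sees the disabled state and skips the program, or is seen as running by `rq_synchronize()`, which then waits for it; so no poll is still using the old program once it is returned.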
>>> Packet rate performance testing was done with pktgen 64B packets on
>>> the TX side, comparing a TC drop action with XDP fast drop on the RX side.
>>>
>>> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>>>
>>> Comparison is done between:
>>>         1. Baseline: before this patch, with TC drop action
>>>         2. This patch with TC drop action
>>>         3. This patch with XDP RX fast drop
>>>
>>> Streams    Baseline (TC drop)    TC drop     XDP fast drop
>>> ----------------------------------------------------------
>>> 1          5.51 Mpps             5.14 Mpps   13.5 Mpps
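To read the table in per-packet terms, each rate can be converted into a time and cycle budget per packet on the single RX core. This sketch assumes the E5-2680 v3's nominal 2.50 GHz clock (Turbo Boost ignored) and that one core handles every packet; `RX_CORE_GHZ`, `ns_per_packet()` and `cycles_per_packet()` are hypothetical names.

```c
#include <assert.h>

/* Nominal base clock of the Xeon E5-2680 v3 used in the test (Turbo ignored). */
#define RX_CORE_GHZ 2.5

/* Time budget per packet, in ns, when one core sustains `mpps` Mpps. */
static double ns_per_packet(double mpps)
{
	assert(mpps > 0.0);
	return 1000.0 / mpps;  /* 1e9 ns/s divided by (mpps * 1e6) packets/s */
}

/* Cycle budget per packet on a core clocked at `ghz` GHz. */
static double cycles_per_packet(double mpps, double ghz)
{
	return ns_per_packet(mpps) * ghz;  /* ns times cycles-per-ns */
}
```

Under those assumptions, XDP fast drop at 13.5 Mpps leaves about 74 ns (about 185 cycles) per packet, while TC drop at 5.14 Mpps takes about 195 ns (about 486 cycles) per packet.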
>>
>> This (13.5 M PPS) is less than 50% of the result we presented @ the
>> XDP summit, which was obtained by Rana. Please check whether, and by how
>> much, this grows if you use more sender threads, but have all of them
>> xmit the same stream/flows, so we're on one ring. That (XDP with a single
>> RX ring getting packets from N remote TX rings) would be your canonical
>> baseline for any further numbers.
>>
>
> I used N TX senders sending 48Mpps to a single RX core.
> The single RX core could handle only 13.5Mpps.
>
> The implementation here is different from the one we presented at the
> summit: before, it used striding RQ; now it uses a regular linked-list
> RQ. (A striding RQ ring can handle 32K 64B packets, while a regular RQ
> ring handles only 1K.)

> In striding RQ we register only 16 HW descriptors for every 32K
> packets, i.e. for every 32K packets we access the HW only 16 times. On
> the other hand, a regular RQ accesses the HW (registers a descriptor)
> once per packet, i.e. we write to the HW 1K times for 1K packets. I
> think this explains the difference.
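The descriptor argument is simple arithmetic: count how many HW descriptor postings each RQ type needs for the same number of packets. This sketch takes "32K" as 32768 and uses only the figures quoted in the thread (16 descriptors per 32K packets for striding RQ, one per packet for the linked-list RQ); `desc_writes()` and the constants are hypothetical, and the real mlx5e WQE/stride geometry and doorbell batching are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the thread; "32K" is taken as 32768 (assumption). */
#define STRIDING_PKTS_PER_RING    32768u
#define STRIDING_DESCS_PER_RING   16u
#define STRIDING_PKTS_PER_DESC    (STRIDING_PKTS_PER_RING / STRIDING_DESCS_PER_RING)
#define LINKED_LIST_PKTS_PER_DESC 1u  /* one descriptor per packet */

/* Number of HW descriptor postings needed to receive `packets` packets
 * when each posted descriptor covers `pkts_per_desc` packets. */
static uint64_t desc_writes(uint64_t packets, uint64_t pkts_per_desc)
{
	assert(pkts_per_desc > 0);
	return (packets + pkts_per_desc - 1) / pkts_per_desc;  /* ceiling division */
}
```

For 32768 packets this gives 16 postings for striding RQ against 32768 for the linked-list RQ, a factor of 2048 fewer HW accesses per packet with striding RQ.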

> The catch here is that we can't use striding RQ for XDP, bummer!

Yep, sounds like a bum bum bum (we went from >30M PPS to 13.5M PPS).

We used striding RQ for XDP with the previous implementation, and I don't
see a real, deep reason not to do so also now that striding RQ no longer
uses compound pages. I guess there are more details I need to catch up
with here, but the bottom line is not good and we need to rethink.

> As I said, we will have the full and final performance results in V1.
> This is just an RFC with only quick and dirty testing.

Yep, understood. But in parallel, you need to reconsider how to avoid
that drop in the numbers.

Or.