Message-ID: <CALx6S35B4gp80AWPBvUv2hEGAMc8Hs2ErjAPOB8SU=SgYoUp=Q@mail.gmail.com>
Date:   Thu, 8 Sep 2016 09:26:03 -0700
From:   Tom Herbert <tom@...bertland.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     John Fastabend <john.fastabend@...il.com>,
        Saeed Mahameed <saeedm@....mellanox.co.il>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        iovisor-dev <iovisor-dev@...ts.iovisor.org>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        Brenden Blanco <bblanco@...mgrid.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Martin KaFai Lau <kafai@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jamal Hadi Salim <jhs@...atatu.com>
Subject: Re: [PATCH RFC 11/11] net/mlx5e: XDP TX xmit more

On Wed, Sep 7, 2016 at 10:11 PM, Jesper Dangaard Brouer
<brouer@...hat.com> wrote:
>
> On Wed, 7 Sep 2016 20:21:24 -0700 Tom Herbert <tom@...bertland.com> wrote:
>
>> On Wed, Sep 7, 2016 at 7:58 PM, John Fastabend <john.fastabend@...il.com> wrote:
>> > On 16-09-07 11:22 AM, Jesper Dangaard Brouer wrote:
>> >>
>> >> On Wed, 7 Sep 2016 19:57:19 +0300 Saeed Mahameed <saeedm@....mellanox.co.il> wrote:
>> >>> On Wed, Sep 7, 2016 at 6:32 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>> >>>> On Wed, 2016-09-07 at 18:08 +0300, Saeed Mahameed wrote:
>> >>>>> On Wed, Sep 7, 2016 at 5:41 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>> >>>>>> On Wed, 2016-09-07 at 15:42 +0300, Saeed Mahameed wrote:
>> >> [...]
>> >>>>
>> >>>> Only if a qdisc is present and pressure is high enough.
>> >>>>
>> >>>> But in a forwarding setup, we likely receive at a lower rate than the
>> >>>> NIC can transmit.
>> >>
>> >> Yes, I can confirm this happens in my experiments.
>> >>
>> >>>>
>> >>>
>> >>> Jesper has a similar idea to make the qdisc think it is under
>> >>> pressure when the device TX ring is idle most of the time; I think
>> >>> his idea can come in handy here. I am not fully involved in the
>> >>> details, maybe he can elaborate more.
>> >>>
>> >>> But if it works, it will be transparent to napi, and xmit more will
>> >>> happen by design.
>> >>
>> >> Yes. I have some ideas around getting more bulking going from the qdisc
>> >> layer, by having the drivers provide some feedback to the qdisc layer
>> >> indicating xmit_more should be possible.  This will be a topic at the
>> >> Network Performance Workshop[1] at NetDev 1.2, where I will hopefully
>> >> challenge people to come up with a good solution ;-)
>> >>
>> >
>> > One thing I've noticed, but haven't yet actually analyzed much, is that
>> > if I shrink the NIC descriptor ring size to only slightly larger than
>> > the qdisc-layer bulking size, I get more bulking and better perf numbers,
>> > at least on microbenchmarks. The reason being the NIC pushes back more
>> > on the qdisc. So maybe a case for making the ring size in the NIC some
>> > factor of the expected number of queues feeding the descriptor ring.
>> >
>
> I've also played with shrinking the NIC descriptor ring size; it works,
> but it is an ugly hack to get the NIC to push back, and I foresee it will
> hurt normal use-cases. (There are other reasons for shrinking the ring
> size, like cache usage, but that is unrelated to this.)
>
>
>> BQL is not helping with that?
>
> Exactly. But the BQL _byte_ limit is not what is needed; what we need
> to know is the number of _packets_ currently "in-flight".  Which Tom
> already has a patch for :-)  Once we have that, the algorithm is simple.
>
> Qdisc dequeue looks at BQL pkts-in-flight; if the driver has "enough"
> packets in-flight, the qdisc starts its bulk dequeue building phase
> before calling the driver. The allowed max qdisc bulk size should
> likely be related to pkts-in-flight.
>
Sorry, I'm still missing it. The point of BQL is that we minimize the
amount of data (and hence number of packets) that needs to be queued
in the device in order to prevent the link from going idle while there
are outstanding packets to be sent. The algorithm counts bytes, not
packets, because bytes are a roughly equal-cost unit of work. So if
we've queued 100K bytes on the queue, we know how long that takes:
around 80 usecs @10G. But if we count packets, we really don't know
much: 100 enqueued packets could represent 6400 bytes or 6400K bytes
of data, so the time to transmit is anywhere from 5 usecs to 5 msecs...

Shouldn't qdisc bulk size be based on the BQL limit? What is the
simple algorithm to apply to in-flight packets?

Tom

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
