lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <70dec481-1573-b63d-fd61-2e018535d0fa@bobbriscoe.net>
Date:   Sun, 31 Oct 2021 21:50:56 +0000
From:   Bob Briscoe <research@...briscoe.net>
To:     Eric Dumazet <edumazet@...gle.com>,
        Neal Cardwell <ncardwell@...gle.com>
Cc:     Asad Sajjad Ahmed <asadsa@....uio.no>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Ingemar Johansson S <ingemar.s.johansson@...csson.com>,
        Tom Henderson <tomh@...h.org>,
        Olga Albisser <olga@...isser.org>
Subject: Re: [PATCH net-next] fq_codel: avoid under-utilization with
 ce_threshold at low link rates

Eric,

On 29/10/2021 15:53, Eric Dumazet wrote:
> On Fri, Oct 29, 2021 at 6:54 AM Neal Cardwell <ncardwell@...gle.com> wrote:
>> On Thu, Oct 28, 2021 at 3:15 PM Asad Sajjad Ahmed <asadsa@....uio.no> wrote:
>>> Commit "fq_codel: generalise ce_threshold marking for subset of traffic"
>>> [1] enables ce_threshold to be used in the Internet, not just in data
>>> centres.
>>>
>>> Because ce_threshold is in time units, it can cause poor utilization at
>>> low link rates when it represents <1 packet.
>>> E.g., if link rate <12Mb/s ce_threshold=1ms is <1500B packet.
>>>
>>> So, suppress ECN marking unless the backlog is also > 1 MTU.
>>>
>>> A similar patch to [1] was tested on an earlier kernel, and a similar
>>> one-packet check prevented poor utilization at low link rates [2].
>>>
>>> [1] commit dfcb63ce1de6 ("fq_codel: generalise ce_threshold marking for subset of traffic")
>>>
>>> [2] See right hand column of plots at the end of:
>>> https://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf
>>>
>>> Signed-off-by: Asad Sajjad Ahmed <asadsa@....uio.no>
>>> Signed-off-by: Olga Albisser <olga@...isser.org>
>>> ---
>>>   include/net/codel_impl.h | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/codel_impl.h b/include/net/codel_impl.h
>>> index 137d40d8cbeb..4e3e8473e776 100644
>>> --- a/include/net/codel_impl.h
>>> +++ b/include/net/codel_impl.h
>>> @@ -248,7 +248,8 @@ static struct sk_buff *codel_dequeue(void *ctx,
>>>                                                      vars->rec_inv_sqrt);
>>>          }
>>>   end:
>>> -       if (skb && codel_time_after(vars->ldelay, params->ce_threshold)) {
>>> +       if (skb && codel_time_after(vars->ldelay, params->ce_threshold) &&
>>> +           *backlog > params->mtu) {
> I think this idea would apply to codel quite well.  (This helper is
> common to codel and fq_codel)
>
> But with fq_codel my thoughts are:
>
> *backlog is the backlog of the qdisc, not the backlog for the flow,

[BB] Ah. Hadn't appreciated that. Thanks.

We were modelling this on a check of (*backlog <= params->mtu) in 
codel_should_drop(), which I thought was similarly checking for low link 
rate in fq_codel and codel. But didn't do our homework properly...

> and it includes the packet currently being removed from the queue.
>
> Setting ce_threshold to 1ms while the link rate is 12Mbs sounds
> misconfiguration to me.

[BB] The idea was meant to be that you don't have to know the drain rate 
of the queue. This additional check was meant to suppress marking
     if (ldelay > ce_threshold) && !(qlen > 1).

ce_threshold = 1ms was only an example, nonetheless we had tested that 
config down to 4Mb/s with two flows. We AND'd ce_threshold with a check 
for qlen >1 packet and we still got >95% link utilization, as shown in 
the plots referenced via [2], e.g. Fig 4. But checking qlen would have 
disrupted the code somewhat, so we had hoped to be able to add a check 
of the /queue/'s backlog instead, thinking it was conveniently already 
available.

[2] See right hand column of plots at the end of:
https://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf

We were conscious that this could have suppressed marking of a queue of 
more than one small packet, as long as ldelay also exceeded 
ce_threshold, but we figured that would do no great harm.

> Even if this flow has to transmit one tiny packet every minute, it
> will get CE mark
> just because at least one packet from an elephant flow is currently
> being sent to the wire.
>
> BQL won't prevent that at least one packet is being processed while
> the tiny packet
> is coming into fq_codel qdisc.

[BB] Yes, now we understand it's the backlog of the whole qdisc, this 
isn't the behaviour we intended.

> vars->ldelay = now - skb_time_func(skb);
>
> For tight ce_threshold, vars->ldelay would need to be replaced by
>
> now - (time of first codel_dequeue() after this skb has been queued).
> This seems a bit hard to implement cheaply.

[BB] We'll think whether we can do the qlen check less disruptively.


Bob

>
>
>
>
>>>                  bool set_ce = true;
>>>
>>>                  if (params->ce_threshold_mask) {
>>> --
>> Sounds like a good idea, and looks good to me.
>>
>> Acked-by: Neal Cardwell <ncardwell@...gle.com>
>>
>> Eric, what do you think?
>>
>> neal

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ