lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1515524652.131759.11.camel@gmail.com>
Date:   Tue, 09 Jan 2018 11:04:12 -0800
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Natale Patriciello <natale.patriciello@...il.com>
Cc:     netdev@...r.kernel.org, carloaugusto.grazia@...more.it
Subject: Re: [RFC PATCH 3/3] tcp: Add tunable parameters for TSQ

On Sat, 2018-01-06 at 20:22 +0100, Natale Patriciello wrote:
> Thank you, Eric and David, for the time spent in reviewing our work.
> Some comments inline:
> 
> On 05/01/18 at 03:53am, Eric Dumazet wrote:
> 
> > I do not want to add yet another condition in fast path.
> > Just put an arbitrary large value in the existing sysctl, no need for
> > extra code.
> 
> Due to the minimum statement at the line 2202, the algorithm will ignore
> the arbitrarily large value and will use ~1 ms of data at the current
> rate or 2 segments instead. Therefore, right now there is not the
> possibility to completely disable TSQ, while there was in the first
> version of it.

We do not want to disable TSQ, or even allow someone to disable it.

limit = max(2 * skb->truesize,
            sk->sk_pacing_rate >> sk->sk_pacing_shift);
limit = min_t(u32, limit,
             sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);

Meaning that if you set sk_pacing_shift to 0, you allow to fill the NIC
with 1 second of TCP data per flow. Which is insanely high.

> 
> > You provide dubious reasons, and no real tests done on various
> > hardwares.
> 
> We did perform some test internally on a 4.13 kernel for an academic
> submission. By varying the parameters, we were able to double the
> throughput reachable by any congestion {avoidance, control} algorithm on
> top of 2.4GHz networks with a channel of 40 MHz, and to reduce latency
> (maybe there is some kind of data waiting that is done at
> driver/firmware/hardware level).

Right, we know that bufferbloat helps to get high throughput on a
single flow in a controlled environment.


>  Then we saw the patch, and we became
> aware of the community interest in the topic and decided to ask for
> feedback on a revised version.

No driver uses yet the infra, yet you want to revise it ?
This makes no sense.

> 
> We will for sure increase the number of test cases (including CPU usage)
> and report as soon as the academic world allows us. We are happily using
> Flent for the testing phase.
> 
> > A linux host can have one 10Gbit NIC and a wifi adapter, they require
> > different tunings.
> 
> This is an excellent example that we did not consider while developing
> the patch. Thanks.
> 
> > This is why we added sk_pacing_shift_update(). If this needs
> > refinement, lets talk.
> 
> We believe that it is fundamental to give the user the runtime control
> of the algorithm, which right now starts with latency-saving default
> values but is tailored for a specific kind of network. As you pointed
> out, these should be tuned per-interface. In the following days, we will
> perform more focused testing, trying to assess if there are cases in
> which a fine-tuning is preferable to a logarithmic one.

fine-tuning looks overkill here, especially if you need an extra divide
or more socket space...

> 
> Meanwhile, what would be the best way to expose sk_pacing_shift to the
> userspace?
> 

sk_pacing_shift is not yet part of the ABI. We do not know if it will
be enough or not.

There is no point exposing it, unless we want to keep it for the years
to come (ABI is in stone) and believe user should get it.



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ