lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <3251fc35-ef83-26fd-4b71-7d5d50945096@gmail.com>
Date:   Thu, 3 Jun 2021 16:37:11 +0300
From:   Leonard Crestez <cdleonard@...il.com>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     Matt Mathis <mattmathis@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Willem de Bruijn <willemb@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        John Heffner <johnwheffner@...il.com>,
        Leonard Crestez <lcrestez@...venets.com>,
        Soheil Hassas Yeganeh <soheil@...gle.com>,
        Roopa Prabhu <roopa@...ulusnetworks.com>,
        Netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFCv2 1/3] tcp: Use smaller mtu probes if RACK is enabled



On 5/26/21 3:11 PM, Neal Cardwell wrote:
> On Wed, May 26, 2021 at 6:38 AM Leonard Crestez <cdleonard@...il.com> wrote:
>>
>> RACK allows detecting a loss in rtt + min_rtt / 4 based on just one
>> extra packet. If enabled use this instead of relying of fast retransmit.
> 
> IMHO it would be worth adding some more text to motivate the change,
> to justify the added complexity and risk from the change. The
> substance of the change seems to be decreasing the requirement for
> PMTU probing from needing roughly 5 packets worth of data to needing
> roughly 3 packets worth of data. It's not clear to me as a reader of
> this patch by itself that there are lots of applications that very
> often only have 3-4 packets worth of data to send and yet can benefit
> greatly from PMTU discovery.
> 
>> Suggested-by: Neal Cardwell <ncardwell@...gle.com>
>> Signed-off-by: Leonard Crestez <cdleonard@...il.com>
>> ---
>>   Documentation/networking/ip-sysctl.rst |  5 +++++
>>   include/net/netns/ipv4.h               |  1 +
>>   net/ipv4/sysctl_net_ipv4.c             |  7 +++++++
>>   net/ipv4/tcp_ipv4.c                    |  1 +
>>   net/ipv4/tcp_output.c                  | 26 +++++++++++++++++++++++++-
>>   5 files changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
>> index a5c250044500..7ab52a105a5d 100644
>> --- a/Documentation/networking/ip-sysctl.rst
>> +++ b/Documentation/networking/ip-sysctl.rst
>> @@ -349,10 +349,15 @@ tcp_mtu_probe_floor - INTEGER
>>          If MTU probing is enabled this caps the minimum MSS used for search_low
>>          for the connection.
>>
>>          Default : 48
>>
>> +tcp_mtu_probe_rack - BOOLEAN
>> +       Try to use shorter probes if RACK is also enabled
>> +
>> +       Default: 1
> 
> I  would vote to not have a sysctl for this. If we think it's a good
> idea to allow MTU probing with a smaller amount of data if RACK is
> enabled (which seems true to me), then this is a low-risk enough
> change that we should just change the behavior.
> 
>>   tcp_min_snd_mss - INTEGER
>>          TCP SYN and SYNACK messages usually advertise an ADVMSS option,
>>          as described in RFC 1122 and RFC 6691.
>>
>>          If this ADVMSS option is smaller than tcp_min_snd_mss,
>> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
>> index 746c80cd4257..b4ff12f25a7f 100644
>> --- a/include/net/netns/ipv4.h
>> +++ b/include/net/netns/ipv4.h
>> @@ -112,10 +112,11 @@ struct netns_ipv4 {
>>   #ifdef CONFIG_NET_L3_MASTER_DEV
>>          u8 sysctl_tcp_l3mdev_accept;
>>   #endif
>>          u8 sysctl_tcp_mtu_probing;
>>          int sysctl_tcp_mtu_probe_floor;
>> +       int sysctl_tcp_mtu_probe_rack;
>>          int sysctl_tcp_base_mss;
>>          int sysctl_tcp_min_snd_mss;
>>          int sysctl_tcp_probe_threshold;
>>          u32 sysctl_tcp_probe_interval;
>>
>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
>> index 4fa77f182dcb..275c91fb9cf8 100644
>> --- a/net/ipv4/sysctl_net_ipv4.c
>> +++ b/net/ipv4/sysctl_net_ipv4.c
>> @@ -847,10 +847,17 @@ static struct ctl_table ipv4_net_table[] = {
>>                  .mode           = 0644,
>>                  .proc_handler   = proc_dointvec_minmax,
>>                  .extra1         = &tcp_min_snd_mss_min,
>>                  .extra2         = &tcp_min_snd_mss_max,
>>          },
>> +       {
>> +               .procname       = "tcp_mtu_probe_rack",
>> +               .data           = &init_net.ipv4.sysctl_tcp_mtu_probe_rack,
>> +               .maxlen         = sizeof(int),
>> +               .mode           = 0644,
>> +               .proc_handler   = proc_dointvec,
>> +       },
>>          {
>>                  .procname       = "tcp_probe_threshold",
>>                  .data           = &init_net.ipv4.sysctl_tcp_probe_threshold,
>>                  .maxlen         = sizeof(int),
>>                  .mode           = 0644,
>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>> index 4f5b68a90be9..ed8af4a7325b 100644
>> --- a/net/ipv4/tcp_ipv4.c
>> +++ b/net/ipv4/tcp_ipv4.c
>> @@ -2892,10 +2892,11 @@ static int __net_init tcp_sk_init(struct net *net)
>>          net->ipv4.sysctl_tcp_base_mss = TCP_BASE_MSS;
>>          net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
>>          net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
>>          net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
>>          net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS;
>> +       net->ipv4.sysctl_tcp_mtu_probe_rack = 1;
>>
>>          net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
>>          net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES;
>>          net->ipv4.sysctl_tcp_keepalive_intvl = TCP_KEEPALIVE_INTVL;
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index bde781f46b41..9691f435477b 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -2311,10 +2311,19 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len)
>>          }
>>
>>          return true;
>>   }
>>
>> +/* Check if rack is supported for current connection */
>> +static int tcp_mtu_probe_is_rack(const struct sock *sk)
>> +{
>> +       struct net *net = sock_net(sk);
>> +
>> +       return (net->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION &&
>> +                       net->ipv4.sysctl_tcp_mtu_probe_rack);
>> +}
> 
> You may want to use the existing helper, tcp_is_rack(), by moving it
> to include/net/tcp.h

OK, for this and other comments.

Initially I though that maybe a more elaborate check is required but it 
seems to be only up to the sender to keep individual timeouts.

--
Regards,
Leonard

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ