netdev - Re: [PATCH net-next] tcp: optimise receiver buffer autotuning initialisation for high latency connections

Message-ID: <10868573-9303-49FE-BC8E-EDD8544FFB50@amazon.com>
Date:   Tue, 8 Dec 2020 16:30:45 +0000
From:   "Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>
To:     Eric Dumazet <edumazet@...gle.com>
CC:     Neal Cardwell <ncardwell@...gle.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "ycheng@...gle.com" <ycheng@...gle.com>,
        "weiwan@...gle.com" <weiwan@...gle.com>,
        "Strohman, Andy" <astroh@...zon.com>,
        "Herrenschmidt, Benjamin" <benh@...zon.com>
Subject: Re: [PATCH net-next] tcp: optimise receiver buffer autotuning initialisation
 for high latency connections

Feel free to ignore this message, as I sent it before seeing your newly submitted patch (

Thank you.

Hazem



On 08/12/2020, 16:28, "Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com> wrote:

        >Please try again, with a fixed tcp_rmem[1] on receiver, taking into
        >account bigger memory requirement for MTU 9000

        >Rationale : TCP should be ready to receive 10 full frames before
        >autotuning takes place (these 10 MSS are typically in a single GRO
        >packet)

        >At 9000 MTU, one frame typically consumes 12KB (or 16KB on some arches/drivers)

        >TCP uses a 50% factor rule, accounting 18000 bytes of kernel memory per MSS.

        ->

        >echo "4096 180000 15728640" >/proc/sys/net/ipv4/tcp_rmem



    >diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
    >index 9e8a6c1aa0190cc248b3b99b073a4c6e45884cf5..81b5d9375860ae583e08045fb25b089c456c60ab 100644
    >--- a/net/ipv4/tcp_input.c
    >+++ b/net/ipv4/tcp_input.c
    >@@ -534,6 +534,7 @@ static void tcp_init_buffer_space(struct sock *sk)
    >
    >        tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp);
    >        tp->snd_cwnd_stamp = tcp_jiffies32;
    >+       tp->rcvq_space.space = min(tp->rcv_ssthresh, tp->rcvq_space.space);
    >}
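
The effect of the added line in the hunk above can be sketched in isolation. This is only an illustration under stated assumptions: `struct rx_state`, `min_u32` and `clamp_rcvq_space` are hypothetical stand-ins, not kernel definitions, and the field `rcvq_space` here stands in for `tp->rcvq_space.space`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct tcp_sock fields the hunk uses. */
struct rx_state {
    uint32_t rcv_ssthresh; /* stands in for tp->rcv_ssthresh */
    uint32_t rcvq_space;   /* stands in for tp->rcvq_space.space */
};

/* Smaller of two unsigned 32-bit values. */
uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Mirrors the added line: rcvq_space may not exceed rcv_ssthresh. */
void clamp_rcvq_space(struct rx_state *rx)
{
    assert(rx);
    rx->rcvq_space = min_u32(rx->rcv_ssthresh, rx->rcvq_space);
}
```

With made-up values, a state of rcv_ssthresh = 65535 and rcvq_space = 89480 ends up with rcvq_space = 65535 after the call; a state where rcvq_space is already the smaller value is left unchanged.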

    Yes, this worked, and it looks like echo "4096 140000 15728640" >/proc/sys/net/ipv4/tcp_rmem is actually enough to trigger TCP autotuning. If the current default tcp_rmem[1] doesn't work well with 9000 MTU, I am curious to know whether there is a specific reason for having 131072 as tcp_rmem[1]. I think the number itself has to be divisible by the page size (4K) and by 16KB, given what you said about each jumbo frame consuming up to 16KB.
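
The divisibility question above can be checked with a few lines of C. This is a sketch only: the helper names are hypothetical, and whether tcp_rmem[1] actually needs to be a multiple of the page size or of 16KB is the supposition stated in this message, not something the thread confirms.

```c
#include <assert.h>

/* Returns 1 when value is an exact multiple of unit, 0 otherwise. */
int is_multiple_of(unsigned long value, unsigned long unit)
{
    assert(unit > 0);
    return value % unit == 0;
}

/* Rounds value up to the next multiple of unit. */
unsigned long round_up_to(unsigned long value, unsigned long unit)
{
    assert(unit > 0);
    return ((value + unit - 1) / unit) * unit;
}
```

Under these definitions, 131072 is a multiple of both 4096 and 16384, while 140000 and 180000 are multiples of neither; rounding them up to a 16384-byte boundary gives 147456 and 180224 respectively.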

    If the patch I proposed would be risky for users who have an MTU of 1500 because of its higher memory footprint, then in my opinion we should get the patch you proposed merged instead of asking admins to do the manual work.

    Thank you.

    Hazem

    On 07/12/2020, 17:28, "Eric Dumazet" <edumazet@...gle.com> wrote:

        CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



        On Mon, Dec 7, 2020 at 6:17 PM Mohamed Abuelfotoh, Hazem
        <abuehaze@...zon.com> wrote:
        >
        >     >Thanks for testing this, Eric. Would you be able to share the MTU
        >     >config commands you used, and the tcpdump traces you get? I'm
        >     >surprised that receive buffer autotuning would work for advmss of
        >     >around 6500 or higher.
        >
        > Packet capture before applying the proposed patch
        >
        > https://tcpautotuningpcaps.s3.eu-west-1.amazonaws.com/sender-bbr-bad-unpatched.pcap?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJNMP5ZZ3I4FAQGAQ%2F20201207%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20201207T170123Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=a599a0e0e6632a957e5619007ba5ce4f63c8e8535ea24470b7093fef440a8300
        >
        > Packet capture after applying the proposed patch
        >
        > https://tcpautotuningpcaps.s3.eu-west-1.amazonaws.com/sender-bbr-good-patched.pcap?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJNMP5ZZ3I4FAQGAQ%2F20201207%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20201207T165831Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=f18ec7246107590e8ac35c24322af699e4c2a73d174067c51cf6b0a06bbbca77
        >
        > The kernel version, MTU, and configuration from my receiver and sender are attached to this e-mail. Please be aware that EC2 is doing MSS clamping, so you need to configure the MTU as 1500 on the sender side if you don’t have any MSS clamping between sender and receiver.
        >
        > Thank you.
        >
        > Hazem

        Please try again, with a fixed tcp_rmem[1] on receiver, taking into
        account bigger memory requirement for MTU 9000

        Rationale : TCP should be ready to receive 10 full frames before
        autotuning takes place (these 10 MSS are typically in a single GRO
        packet)

        At 9000 MTU, one frame typically consumes 12KB (or 16KB on some arches/drivers)

        TCP uses a 50% factor rule, accounting 18000 bytes of kernel memory per MSS.

        ->

        echo "4096 180000 15728640" >/proc/sys/net/ipv4/tcp_rmem
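
The arithmetic behind the 180000 figure above can be sketched as follows. This is a minimal illustration rather than kernel code: the function names are hypothetical, and it assumes the 50% factor rule means that only half of the receive buffer counts as advertised window, so each MSS is budgeted at twice the 9000-byte MTU.

```c
#include <assert.h>

/* Assumption: with a 50% factor, buffer memory per MSS is twice the MTU. */
unsigned long rmem_per_mss(unsigned long mtu)
{
    return 2UL * mtu;
}

/* Hypothetical helper: memory needed to hold `frames` full frames
 * before autotuning takes place. */
unsigned long rmem_default_for(unsigned long mtu, unsigned long frames)
{
    assert(frames > 0);
    return frames * rmem_per_mss(mtu);
}
```

For an MTU of 9000 this gives 18000 bytes per MSS, and 10 frames give 180000, which matches the middle value written to tcp_rmem above.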



        >
        >
        > On 07/12/2020, 16:34, "Neal Cardwell" <ncardwell@...gle.com> wrote:
        >
        >
        >
        >
        >     On Mon, Dec 7, 2020 at 11:23 AM Eric Dumazet <edumazet@...gle.com> wrote:
        >     >
        >     > On Mon, Dec 7, 2020 at 5:09 PM Mohamed Abuelfotoh, Hazem
        >     > <abuehaze@...zon.com> wrote:
        >     > >
        >     > >     >Since I can not reproduce this problem with another NIC on x86, I
        >     > >     >really wonder if this is not an issue with ENA driver on PowerPC
        >     > >     >perhaps ?
        >     > >
        >     > >
        >     > > I am able to reproduce it on x86-based EC2 instances using the ENA, Xen netfront, or Intel ixgbevf driver on the receiver, so it's not specific to ENA. We were also able to reproduce it easily between 2 VMs running in VirtualBox on the same physical host, given the environment requirements I mentioned in my first e-mail.
        >     > >
        >     > > What's the RTT between the sender & receiver in your reproduction? Are you using bbr on the sender side?
        >     >
        >     >
        >     > 100ms RTT
        >     >
        >     > Which exact version of linux kernel are you using ?
        >
        >     Thanks for testing this, Eric. Would you be able to share the MTU
        >     config commands you used, and the tcpdump traces you get? I'm
        >     surprised that receive buffer autotuning would work for advmss of
        >     around 6500 or higher.
        >
        >     thanks,
        >     neal
        >
        >
        >
        >
        > Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
        >
        > Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
        >
        >




