netdev - Re: [PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9304A999-0697-4278-B18F-9922862151FB@fb.com>
Date:   Mon, 9 Jul 2018 18:47:21 +0000
From:   Lawrence Brakmo <brakmo@...com>
To:     Yuchung Cheng <ycheng@...gle.com>,
        Neal Cardwell <ncardwell@...gle.com>
CC:     David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        Blake Matheny <bmatheny@...com>,
        "Alexei Starovoitov" <ast@...com>,
        Steve Ibanez <sibanez@...nford.edu>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Yousuk Seung <ysseung@...gle.com>
Subject: Re: [PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

On 7/9/18, 12:32 PM, "Yuchung Cheng" <ycheng@...gle.com> wrote:

    On Sat, Jul 7, 2018 at 7:07 AM, Neal Cardwell <ncardwell@...gle.com> wrote:
    > On Sat, Jul 7, 2018 at 7:15 AM David Miller <davem@...emloft.net> wrote:
    >>
    >> From: Lawrence Brakmo <brakmo@...com>
    >> Date: Tue, 3 Jul 2018 09:26:13 -0700
    >>
    >> > When have observed high tail latencies when using DCTCP for RPCs as
    >> > compared to using Cubic. For example, in one setup there are 2 hosts
    >> > sending to a 3rd one, with each sender having 3 flows (1 stream,
    >> > 1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
    >> > table shows the 99% and 99.9% latencies for both Cubic and dctcp:
    >> >
    >> >            Cubic 99%  Cubic 99.9%   dctcp 99%    dctcp 99.9%
    >> > 1MB RPCs    2.6ms       5.5ms         43ms          208ms
    >> > 10KB RPCs    1.1ms       1.3ms         53ms          212ms
    >>  ...
    >> > v2: Removed call to tcp_ca_event from tcp_send_ack since I added one in
    >> >     tcp_event_ack_sent. Based on Neal Cardwell <ncardwell@...gle.com>
    >> >     feedback.
    >> >     Modified tcp_ecn_check_ce (and renamed it tcp_ecn_check) instead of modifying
    >> >     tcp_ack_send_check to insure an ACK when cwr is received.
    >> > v3: Handling cwr in tcp_ecn_accept_cwr instead of in tcp_ecn_check.
    >> >
    >> > [PATCH net-next v3 1/2] tcp: notify when a delayed ack is sent
    >> > [PATCH net-next v3 2/2] tcp: ack immediately when a cwr packet
    >>
    >> Neal and co., what are your thoughts right now about this patch series?
    >>
    >> Thank you.
    >
    > IMHO these patches are a definite improvement over what we have now.
    >
    > That said, in chatting with Yuchung before the July 4th break, I think
    > Yuchung and I agreed that we would ideally like to see something like
    > the following:
    >
    > (1) refactor the DCTCP code to check for pending delayed ACKs directly
    > using existing state (inet_csk(sk)->icsk_ack.pending &
    > ICSK_ACK_TIMER), and remove the ca->delayed_ack_reserved DCTCP field
    > and the CA_EVENT_DELAYED_ACK and CA_EVENT_NON_DELAYED_ACK callbacks
    > added for DCTCP (which Larry determined had at least one bug).

I agree that getting rid of the callbacks would be an improvement, but that is more about optimizing the code. This could be done after we fix the current bugs. My concern is that it may be more complicated that we think and the current bug would continue to exist. Yes, I realize that it has been there for a while; but not because no one found it before, but because it was hard to pinpoint. 

    > (2) fix the bug with the DCTCP call to tcp_send_ack(sk) causing
    > delayed ACKs to be incorrectly dropped/forgotten (not yet addressed by
    > this patch series)

Good idea, but as I mentioned earlier, I would rather fix the really bad bugs first and then deal with this one. As far as I can see from my testing of DC-TCP, I have not seen any bad consequences from this bug so far.

    > (3) then with fixes (1) and (2) in place, re-run tests and see if we
    > still need Larry's heuristic (in patch 2) to fire an ACK immediately
    > if a receiver receives a CWR packet (I suspect this is still very
    > useful, but I think Yuchung is reluctant to add this complexity unless
    > we have verified it's still needed after (1) and (2))

I fail to understand how (1) and (2) have anything to do with ACKing immediately when we receive a CWR packet. It has nothing to do with a current delayed ACK, it has to do with the cwnd closing to 1 when an TCP ECE marked packet is received at the end of an RPC and the current TCP delay ACK logic choosing to delay the ACK. The issue happens right after the receiver has sent its reply to the RPC, so at that stage there are no active delayed ACKs (the first patch fixed the issue where DC-TCP thought there was an active delayed ACK). 

    >
    > Our team may be able to help out with some proposed patches for (1) and (2).
    >
    > In any case, I would love to have Yuchung and Eric weigh in (perhaps
    > Monday) before we merge this patch series.
    Thanks Neal. Sorry for not reflecting these timely before I took off
    for July 4 holidays. I was going to post the same comment - Larry: I
    could provide draft patches if that helps.

Yuchung: go ahead and send me the drafts. But as I already mentioned, I would like to fix the bad bug first and then make it pretty.
    
    >
    > Thanks,
    > neal