Message-ID: <0b3bb4c5-49bb-11a7-eac8-5515ef72851e@gmail.com>
Date:   Mon, 15 Oct 2018 09:23:03 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>,
        Gasper Zejn <zelo.zejn@...il.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>, Kevin Yang <yyd@...gle.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: BBR and TCP internal pacing causing interrupt storm with
 pfifo_fast



On 10/15/2018 07:50 AM, Eric Dumazet wrote:
> On Mon, Oct 15, 2018 at 3:26 AM Gasper Zejn <zelo.zejn@...il.com> wrote:
>>
>>
>> I've tried to isolate the issue as best I could. There seems to be an
>> issue if the TCP socket has keepalive set and send queue is not empty
>> and the route goes away.
>>
>> https://github.com/zejn/bbr_pfifo_interrupts_issue
>>
>> Hope this helps,
>> Gasper
> 
> This is awesome Gasper, I will take a look thanks.
> 
> Note that we are about to send a patch series (targeting net-next) to
> polish the EDT patch series that was merged last month for linux-4.20.
> TCP internal pacing is going to be much better performance-wise.
> 

Yeah, I believe that commit c092dd5f4a7f4e4dbbcc8cf2e50b516bf07e432f
("tcp: switch tcp_internal_pacing() to tcp_wstamp_ns")
has incidentally fixed the issue.

That is because it calls tcp_internal_pacing() from
tcp_update_skb_after_send(), which is called only if the packet was
correctly sent by the IP layer.

Before this patch, tcp_internal_pacing() was called from
__tcp_transmit_skb() before we attempted to send the clone,
and the clone could be dropped right away in the IP layer
(lack of route, for example).

So when the packet was not sent because of a route problem, the
high resolution timer would still fire soon after and the TCP xmit path
would be entered again, arming the timer once more and triggering this loop.
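
To make the ordering difference concrete, here is a toy model of the two
call orders. It is only a sketch and not kernel code: simulate(),
arm_before_send, route_up and ticks are hypothetical names made up for
illustration, and the "timer" is just a flag checked in a loop.

```python
def simulate(arm_before_send, route_up, ticks=5):
    """Count pacing-timer fires while one skb sits in the send queue.

    arm_before_send=True models the old ordering (timer armed before the
    clone is handed to the IP layer); False models the new ordering
    (timer armed only after a successful send). Toy model only.
    """
    timer_armed = False
    timer_fires = 0
    queued = 1  # one skb waiting in the send queue

    def transmit():
        nonlocal timer_armed, queued
        if arm_before_send:
            timer_armed = True       # old: arm first, then try to send
        sent = route_up              # IP layer drops the clone without a route
        if sent:
            queued -= 1
            if not arm_before_send:
                timer_armed = True   # new: arm only once the send succeeded
        return sent

    transmit()
    for _ in range(ticks):
        if not timer_armed:
            break                    # nothing scheduled, no more wakeups
        timer_armed = False
        timer_fires += 1             # timer fires and re-enters the xmit path
        if queued:
            transmit()
    return timer_fires
```

With no route, the old ordering fires on every tick for as long as the
model runs, while the new ordering never arms the timer at all. With a
working route, both orderings fire once.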

I am going to send the 2nd round of EDT patches, so that you can try
David Miller's net-next tree with all the patches we believe are needed
for 4.20. Once proven to work, we might have to backport the series to
4.18 and 4.19.

Thanks!
