Message-ID: <1988670d-0d87-2b9e-24cb-a9a610ea33fa@gmail.com>
Date: Tue, 9 Oct 2018 19:22:53 +0200
From: Gasper Zejn <zelo.zejn@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>, Kevin Yang <yyd@...gle.com>,
Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org
Subject: Re: BBR and TCP internal pacing causing interrupt storm with
pfifo_fast
On 09. 10. 2018 19:00, Eric Dumazet wrote:
>
> On 10/09/2018 09:38 AM, Gasper Zejn wrote:
>> Hello,
>>
>> I am seeing interrupt storms of 100k-900k local timer interrupts when
>> changing between network devices or networks while there are open TCP
>> connections and sch_fq is not in use (I was using pfifo_fast). Using
>> sch_fq makes the interrupt storm go away.
>>
> That is for what kind of traffic?
>
> If your TCP flows send 100k-3M packets per second, then yes, the pacing timers
> could be set up in the 100k-900k range.
>
Traffic is nowhere near that range; think of having a few browser tabs
of JavaScript-heavy web pages open, mostly idle, for example Slack,
Gmail or Tweetdeck. No significant packet rate is needed, just open
connections.
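
For what it is worth, this is roughly how I watch the local timer
interrupt rate while reproducing this; it is just a small userspace
sketch that diffs the per-CPU counters on the LOC: row of
/proc/interrupts once per second, so any similar tool will show the
same numbers:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

static unsigned long long read_loc_total(void)
{
	char line[4096];
	unsigned long long total = 0;
	FILE *f = fopen("/proc/interrupts", "r");

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		char *p = strstr(line, "LOC:");
		unsigned long long v;
		int n;

		if (!p)
			continue;
		p += 4;
		/* Sum the per-CPU counts that follow the row label. */
		while (sscanf(p, "%llu%n", &v, &n) == 1) {
			total += v;
			p += n;
		}
		break;
	}
	fclose(f);
	return total;
}

int main(void)
{
	unsigned long long prev = read_loc_total();

	while (1) {
		unsigned long long cur;

		sleep(1);
		cur = read_loc_total();
		printf("local timer interrupts/sec: %llu\n", cur - prev);
		prev = cur;
	}
	return 0;
}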
>> The interrupts all called tcp_pace_kick (according to perf), which
>> seems to return HRTIMER_NORESTART, but apparently some other code path
>> calls a function that does restart the timer.
>>
>> The bug is fairly easy to reproduce. Congestion control needs to be BBR,
>> the network scheduler pfifo_fast, and there need to be open TCP
>> connections when changing networks in such a way that the TCP
>> connections cannot continue to work (eg. different client IP addresses).
>> The more connections, the more interrupts. The connection handling code
>> will cause an interrupt storm, which eventually dies down as the
>> connections time out. It is a bit annoying as the high interrupt rate
>> does not show up as load. I successfully reproduced this with 4.18.12,
>> but this has been happening for some time, with previous kernel versions too.
>>
>>
>> I'd like to thank you for the comment regarding the use of sch_fq with
>> BBR above the tcp_needs_internal_pacing function. It pointed me in the
>> right direction to find the workaround.
>>
> Well, BBR has been very clear about sch_fq being the best packet scheduler
>
> net/ipv4/tcp_bbr.c currently says :
>
> /* ...
> *
> * NOTE: BBR might be used with the fq qdisc ("man tc-fq") with pacing enabled,
> * otherwise TCP stack falls back to an internal pacing using one high
> * resolution timer per TCP socket and may use more resources.
> */
>
I am not disputing FQ being the best packet scheduler; it does seem,
however, that some effort has been made to make BBR work without FQ too.
Using more resources in that case is perfectly fine. But going from
roughly a thousand interrupts to a few hundred thousand interrupts (and
consuming most of the CPU in the process) seems to indicate that a
corner case was somehow hit, as this happens the moment the network is
changed and not before.
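
To make the HRTIMER_NORESTART observation above concrete, here is a
minimal, self-contained sketch of the general pattern as I understand
it (plain hrtimer API in a toy module; this is NOT the actual
tcp_output.c/tcp_timer.c code, and the names and the 10us interval are
made up): the callback itself returns HRTIMER_NORESTART, yet the timer
keeps firing because a separate code path keeps re-arming it with
hrtimer_start(). With internal pacing that separate path presumably
runs for every paced chunk, which would explain the interrupt rate
tracking something other than the callback's return value.

/* Toy module illustrating the pattern only; not the real TCP pacing code. */
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer pacing_timer;

/* Like tcp_pace_kick() as seen in perf: the callback does its work and
 * tells the hrtimer core not to restart the timer from here.
 */
static enum hrtimer_restart pace_kick(struct hrtimer *timer)
{
	return HRTIMER_NORESTART;
}

/* Called from elsewhere (for TCP internal pacing, the transmit path);
 * this is what actually re-arms the timer, over and over.
 */
static void rearm_for_next_chunk(u64 delay_ns)
{
	hrtimer_start(&pacing_timer, ns_to_ktime(delay_ns), HRTIMER_MODE_REL);
}

static int __init pacing_demo_init(void)
{
	hrtimer_init(&pacing_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	pacing_timer.function = pace_kick;
	rearm_for_next_chunk(10 * 1000ULL);	/* arbitrary 10us delay */
	return 0;
}

static void __exit pacing_demo_exit(void)
{
	hrtimer_cancel(&pacing_timer);
}

module_init(pacing_demo_init);
module_exit(pacing_demo_exit);
MODULE_LICENSE("GPL");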