lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <183b7fe0-0757-cf63-555c-925ea840c67f@gmail.com>
Date:   Tue, 9 Oct 2018 18:38:17 +0200
From:   Gasper Zejn <zelo.zejn@...il.com>
To:     Kevin Yang <yyd@...gle.com>, Eric Dumazet <edumazet@...gle.com>,
        netdev@...r.kernel.org
Subject: BBR and TCP internal pacing causing interrupt storm with pfifo_fast

Hello,

I am seeing interrupt storms of over 100k-900k local timer interrupts
when changing between network devices or networks with open TCP
connections when not using sch_fq (I was using pfifo_fast). Using sch_fq
makes the bug with interrupt storm go away.

The interrupts all called tcp_pace_kick (according to perf), which seems
to return HRTIMER_NORESTART, but apparently somewhere calls another
function, that does restart the timer.

The bug is fairly easy to reproduce. Congestion control needs to be BBR,
network scheduler was pfifo_fast, and there need to be open TCP
connections when changing network in such a way that TCP connections
cannot continue to work (eg. different client IP addresses). The more
connections the more interrupts. The connection handling code will cause
interrupt storm, which eventually sets down as the connections time out.
It is a bit annoying as high interrupt rate does not show as load. I
successfully reproduced this with 4.18.12, but this has been happening
for some time, with previous versions of kernel too.


I'd like to thank you for the comment regarding use of sch_fq with BBR
above the tcp_needs_internal_pacing function. It has pointed me in the
direction to find the workaround.


Kind regards,

Gasper Zejn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ