Message-ID: <CABWYdi1kiu1g1mAq6DpQWczg78tMzaVFnytNMemZATFHqYSqYw@mail.gmail.com>
Date: Thu, 19 Oct 2023 15:35:01 -0700
From: Ivan Babrou <ivan@...udflare.com>
To: Linux Kernel Network Developers <netdev@...r.kernel.org>
Cc: kernel-team <kernel-team@...udflare.com>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: wait_for_unix_gc can cause CPU overload for well behaved programs

Hello,

We have observed this issue twice (2019 and 2023): a well-behaved
service that doesn't pass any file descriptors around starts to spend
a ton of CPU time in wait_for_unix_gc.

The cause of this is that the unix send path unconditionally calls
wait_for_unix_gc, which triggers a global garbage collection. If any
misbehaving program exists on a system, it can force extra work for
well-behaved programs.
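
To make the mechanism above concrete, here is a toy user-space model,
not kernel code. The class ToyUnixGc, its fields, and the constant
value 16000 are assumptions made for illustration; only the idea of a
global in-flight counter checked on every send comes from the
description above.

```python
# Toy model of a global GC check on the unix send path.
# All names here are hypothetical; this is not the kernel implementation.

UNIX_INFLIGHT_TRIGGER_GC = 16000  # assumed value of the "16k threshold"


class ToyUnixGc:
    def __init__(self):
        self.tot_inflight = 0  # system-wide count of fds queued via SCM_RIGHTS
        self.gc_runs = 0       # how many times a sender paid for a GC pass

    def wait_for_unix_gc(self):
        # Global check: it does not matter who queued the in-flight fds.
        # The GC cannot free fds that are still reachable, so the counter
        # stays above the threshold and the next sender pays again.
        if self.tot_inflight > UNIX_INFLIGHT_TRIGGER_GC:
            self.gc_runs += 1

    def sendmsg(self, passed_fds=0):
        # Every send runs the check first, even when it passes no fds.
        self.wait_for_unix_gc()
        self.tot_inflight += passed_fds


def demo():
    toy = ToyUnixGc()
    for _ in range(100):
        toy.sendmsg()  # well-behaved sends on a quiet system
    runs_before = toy.gc_runs
    # An offender queues fds that nobody ever receives.
    toy.sendmsg(passed_fds=UNIX_INFLIGHT_TRIGGER_GC + 1)
    for _ in range(100):
        toy.sendmsg()  # the same well-behaved sends now each trigger a GC pass
    return runs_before, toy.gc_runs
```

In this model, demo() returns (0, 100): the well-behaved sender pays
nothing until the offender crosses the threshold, and pays on every
send afterwards.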

This behavior is not new: 9915672d4127 ("af_unix: limit
unix_tot_inflight") is from 2010.

I managed to come up with a repro for this behavior:

* https://gist.github.com/bobrik/82e5722261920c9f23d9402b88a0bb27

It also includes a flamegraph illustrating the issue. It's all in one
program for convenience, but in reality the offender not picking up
SCM_RIGHTS messages and the suffering program just minding its own
business are separate.
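
For readers who don't want to open the gist, the two roles can be
sketched roughly as follows. This is not the code from the gist: the
function names park_fds and time_plain_sends are made up for this
illustration, and the counts you would need to actually cross the
threshold depend on socket buffer sizes and RLIMIT_NOFILE, which this
sketch does not try to handle.

```python
import array
import socket
import time


def park_fds(messages, fds_per_message=1):
    """Offender: queue SCM_RIGHTS messages that are never received.

    Returns both sockets (keep them open so the fds stay in flight)
    and the number of fds that were queued.
    """
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender.setblocking(False)
    # Pass a unix socket fd; here the receiver's own fd is reused.
    fds = array.array("i", [receiver.fileno()] * fds_per_message)
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)]
    queued = 0
    for _ in range(messages):
        try:
            sender.sendmsg([b"x"], ancillary)
        except OSError:
            break  # buffer full or per-user limit reached; stop queueing
        queued += fds_per_message
    return sender, receiver, queued


def time_plain_sends(count):
    """Victim: time plain one-byte sends that pass no file descriptors."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    start = time.perf_counter()
    for _ in range(count):
        a.send(b"x")
        b.recv(1)  # drain so the send buffer never fills
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    return elapsed
```

Comparing time_plain_sends() before and after many park_fds() calls
(enough to exceed the threshold) is one way to observe the slowdown on
an affected kernel.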

It is also non-trivial to find the offender when this happens as it
can be completely idle while wreaking havoc for the rest of the
system.

I don't think it's fair to penalize every unix_stream_sendmsg like
this. The 16k threshold also doesn't feel very flexible. Surely
computers are bigger these days and can handle more.
