Message-ID: <CABWYdi2GG3qi6ucxtyk3=Bu1eXi0N9Dow42F4gzi9DUUc3XhLw@mail.gmail.com>
Date: Tue, 10 Dec 2019 13:32:21 -0800
From: Ivan Babrou <ivan@...udflare.com>
To: linux-kernel <linux-kernel@...r.kernel.org>
Cc: "David S. Miller" <davem@...emloft.net>, hare@...e.com,
axboe@...nel.dk, allison@...utok.net, tglx@...utronix.de,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Lock contention around unix_gc_lock
Hello,
We're seeing very high contention on unix_gc_lock when a bug in an
application makes it stop reading incoming messages carrying inflight
unix sockets. Our system churns through a lot of unix sockets and has
96 logical CPUs, so the spinlock gets very hot.
I was able to halve overall system throughput with 1024 inflight unix
sockets, which is the default RLIMIT_NOFILE. This doesn't sound good
for isolation: one user should not be able to affect the system that
much. One might even consider this a DoS vector.
A lot of time is spent in _raw_spin_unlock_irqrestore, triggered by
wait_for_unix_gc, which in turn is unconditionally called from
unix_stream_sendmsg:
ffffffff9f64f3ea _raw_spin_unlock_irqrestore+0xa
ffffffff9eea6ab0 prepare_to_wait_event+0x70
ffffffff9f5a4ac6 wait_for_unix_gc+0x76
ffffffff9f5a182c unix_stream_sendmsg+0x3c
ffffffff9f4bb7f9 sock_sendmsg+0x39
* https://elixir.bootlin.com/linux/v4.19.80/source/net/unix/af_unix.c#L1849
Even more time is spent waiting on the spinlock because of the call to
unix_gc from unix_release_sock, where the condition is having any
inflight sockets whatsoever:
ffffffff9eeb1758 queued_spin_lock_slowpath+0x158
ffffffff9f5a4718 unix_gc+0x38
ffffffff9f5a28f3 unix_release_sock+0x2b3
ffffffff9f5a2929 unix_release+0x19
ffffffff9f4b902d __sock_release+0x3d
ffffffff9f4b90a1 sock_close+0x11
* https://elixir.bootlin.com/linux/v4.19.80/source/net/unix/af_unix.c#L586
Should this condition take the number of inflight sockets into
account, just like unix_stream_sendmsg does via wait_for_unix_gc?
The static number of inflight sockets that triggers a GC from
wait_for_unix_gc may also be something that scales with system size,
rather than a hardcoded value.
I know that our case is a pathological one, but it sounds like the
scalability of garbage collection could be better, especially on
systems with a large number of CPUs.