Message-ID: <416869220.9190.1582753869096.JavaMail.zimbra@efficios.com>
Date: Wed, 26 Feb 2020 16:51:09 -0500 (EST)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: "Joel Fernandes, Google" <joel@...lfernandes.org>,
Chris Kennelly <ckennelly@...gle.com>
Cc: Paul Turner <pjt@...gle.com>, Florian Weimer <fweimer@...hat.com>,
Carlos O'Donell <codonell@...hat.com>,
libc-alpha <libc-alpha@...rceware.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
paulmck <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Brian Geffon <bgeffon@...gle.com>
Subject: Re: Rseq registration: Google tcmalloc vs glibc
----- On Feb 25, 2020, at 10:24 PM, Joel Fernandes, Google joel@...lfernandes.org wrote:
[..]
>
> Chris, Brian, is there any other concern to upgrading of tcmalloc
> version in ChromeOS? I believe there was some concern about detection
> of rseq kernel support. A quick look at tcmalloc shows it does not do
> such detection, but I can stand corrected. One more thing, currently
> tcmalloc does not use rseq on ARM. If I recall, ARM does have rseq
> support as well. So we ought to enable it for that arch as well if
> possible. Why not enable it on all arches and then dynamically detect
> at runtime if needed support is available?
Please allow me to raise a concern with respect to the implementation
of the SlowFence() function in tcmalloc/internal/percpu.cc. It uses
sched_setaffinity to move the thread around to each CPU in the
cpu mask.
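
For readers who have not looked at that function, here is a minimal sketch of the general technique, assuming a Linux/glibc environment. It is not tcmalloc's code: the name affinity_hop_fence() and the choice to walk every configured CPU are assumptions made for illustration. It pins the calling thread to each CPU in turn and counts the CPUs it could not reach, which are exactly the ones such a fence would silently skip.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper (not tcmalloc's SlowFence): try to run the calling
 * thread on every configured CPU, one at a time, by narrowing its affinity
 * mask to a single CPU.
 *
 * Returns the number of CPUs that could not be visited, or -1 if the
 * original affinity mask could not be read or restored.
 */
static int affinity_hop_fence(void)
{
    cpu_set_t saved, target;
    long ncpus = sysconf(_SC_NPROCESSORS_CONF);
    int skipped = 0;

    /* A pid of 0 refers to the calling thread. */
    if (ncpus <= 0 || sched_getaffinity(0, sizeof(saved), &saved) != 0)
        return -1;
    if (ncpus > CPU_SETSIZE)
        ncpus = CPU_SETSIZE;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        /*
         * Failure (for example EINVAL when the CPU is offline or excluded
         * by a cpuset) only tells us that *this thread* cannot run there.
         * It says nothing about sibling threads with other masks.
         */
        if (sched_setaffinity(0, sizeof(target), &target) != 0)
            skipped++;
    }

    /* Put the thread back where it was allowed to run before. */
    if (sched_setaffinity(0, sizeof(saved), &saved) != 0)
        return -1;
    return skipped;
}
```

A non-zero return value from this sketch means at least one CPU was never visited, so treating the call as a complete fence would not be safe.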
There are a couple of corner-cases where I think it can malfunction:
- Interaction with concurrent sched_setaffinity invoked by an external
manager process: If an external manager process attempts to limit this
thread's ability to run on specific CPUs, either before the thread
starts or concurrently while the thread executes, I suspect the
SlowFence() algorithm will simply handle errors while trying to set
affinity by skipping CPUs, which results in a skipped rseq fence,
which in turn can cause corruption.
The comments in this function state:
// If we can't pin ourselves there, then no one else can run there, so
// that's fine.
But AFAIU the thread's cpu affinity is a per-thread attribute, so saying
that no other thread from the same process can run there seems wrong. What
am I missing? Maybe it is a difference between cpusets and sched_setaffinity?
The code below that comment, which opens /proc/self/cpuset to deal with
concurrent affinity updates by cpuset, seems to rely on CONFIG_CPUSETS=y and
does not seem to take into account CPU affinity changes made through
sched_setaffinity.
Moreover, reading through the comments there, depending on internal kernel
synchronization implementation details for dealing with concurrent cpuset
updates seems very fragile. Those details about internal locking of
cpuset.cpus within the kernel should not be expected to be ABI.
- Interaction with CPU hotplug. If a target CPU is unplugged and plugged
again (offline, then online) concurrently, this algorithm may skip that
CPU and thus skip an rseq fence, which can also cause corruption.
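
To make the per-thread point from the first corner-case concrete, here is a small sketch, assuming Linux with glibc and POSIX threads. The names sibling_mask_unaffected() and struct sibling are hypothetical and exist only for this illustration. The calling thread pins itself to a single CPU while a sibling thread, created beforehand, reads its own mask afterwards and finds it unchanged.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical state shared with the sibling thread. */
struct sibling {
    pthread_mutex_t gate;  /* held by the caller until it has pinned itself */
    cpu_set_t mask;        /* the sibling's own affinity mask, read late */
    int err;
};

static void *sibling_main(void *arg)
{
    struct sibling *s = arg;

    /* Block until the caller has changed its own affinity. */
    pthread_mutex_lock(&s->gate);
    s->err = sched_getaffinity(0, sizeof(s->mask), &s->mask);
    pthread_mutex_unlock(&s->gate);
    return NULL;
}

/*
 * Returns 1 if the sibling's mask still equals the original mask after the
 * caller pinned itself to one CPU, 0 if it changed, -1 on error.
 */
static int sibling_mask_unaffected(void)
{
    struct sibling s = { .gate = PTHREAD_MUTEX_INITIALIZER };
    cpu_set_t orig, one;
    pthread_t tid;
    int cpu, rc;

    if (sched_getaffinity(0, sizeof(orig), &orig) != 0)
        return -1;

    /* Pick the first CPU this thread is currently allowed to run on. */
    for (cpu = 0; cpu < CPU_SETSIZE && !CPU_ISSET(cpu, &orig); cpu++)
        ;
    if (cpu == CPU_SETSIZE)
        return -1;

    /* The sibling inherits the original mask at creation time. */
    pthread_mutex_lock(&s.gate);
    if (pthread_create(&tid, NULL, sibling_main, &s) != 0) {
        pthread_mutex_unlock(&s.gate);
        return -1;
    }

    /* pid 0: only the calling thread is pinned, not the whole process. */
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    rc = sched_setaffinity(0, sizeof(one), &one);

    pthread_mutex_unlock(&s.gate);
    pthread_join(tid, NULL);

    /* Restore the caller's original mask before reporting. */
    if (sched_setaffinity(0, sizeof(orig), &orig) != 0 || rc != 0 || s.err != 0)
        return -1;
    return CPU_EQUAL(&orig, &s.mask) ? 1 : 0;
}
```

The same reasoning applies in the other direction: a thread failing to pin itself to a CPU does not imply that its siblings are unable to run there.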
Those limitations of sched_setaffinity() are the reasons why I have
proposed a new "pin_on_cpu()" system call [1]. Feedback in that area
is very welcome.
Thanks,
Mathieu
[1] https://lore.kernel.org/r/20200121160312.26545-1-mathieu.desnoyers@efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com