linux-kernel - Re: Rseq registration: Google tcmalloc vs glibc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <416869220.9190.1582753869096.JavaMail.zimbra@efficios.com>
Date:   Wed, 26 Feb 2020 16:51:09 -0500 (EST)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     "Joel Fernandes, Google" <joel@...lfernandes.org>,
        Chris Kennelly <ckennelly@...gle.com>
Cc:     Paul Turner <pjt@...gle.com>, Florian Weimer <fweimer@...hat.com>,
        Carlos O'Donell <codonell@...hat.com>,
        libc-alpha <libc-alpha@...rceware.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        paulmck <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
        Brian Geffon <bgeffon@...gle.com>
Subject: Re: Rseq registration: Google tcmalloc vs glibc

----- On Feb 25, 2020, at 10:24 PM, Joel Fernandes, Google joel@...lfernandes.org wrote:
[..]
> 
> Chris, Brian, is there any other concern to upgrading of tcmalloc
> version in ChromeOS? I believe there was some concern about detection
> of rseq kernel support. A quick look at tcmalloc shows it does not do
> such detection, but I can stand corrected. One more thing, currently
> tcmalloc does not use rseq on ARM. If I recall, ARM does have rseq
> support as well. So we ought to enable it for that arch as well if
> possible. Why not enable it on all arches and then dynamically detect
> at runtime if needed support is available?

Please allow me to raise a concern with respect to the implementation
of the SlowFence() function in tcmalloc/internal/percpu.cc. It uses
sched_setaffinity to move the thread around to each CPU part of the
cpu mask.

There are a couple of corner-cases where I think it can malfunction:

- Interaction with concurrent sched_setaffinity invoked by an external
  manager process: If an external manager process attempts to limit this
  thread's ability to run onto specific CPUs, either before the thread
  starts or concurrently while the thread executes, I suspect the
  SlowFence() algorithm will simply handle errors while trying to set
  affinity by skipping CPUs, which results in a skipped rseq fence,
  which in turn can cause corruption.

  The comments in this function state:

    // If we can't pin ourselves there, then no one else can run there, so
    // that's fine.

  But AFAIU the thread's cpu affinity is a per-thread attribute, so saying
  that no other thread from the same process can run there seems wrong. What
  am I missing ? Maybe it is a difference between cpusets and sched_setaffinity ?

  The code below opens /proc/self/cpuset to deal with concurrent affinity
  updates by cpuset seems to rely on CONFIG_CPUSETS=y, and does not seem to
  take into account CPU affinity changes through sched_setaffinity.

  Moreover, reading through the comments there, depending on internal kernel
  synchronization implementation details for dealing with concurrent cpuset
  updates seems very fragile. Those details about internal locking of
  cpuset.cpus within the kernel should not be expected to be ABI.

- Interaction with CPU hotplug. If a target CPU is unplugged and plugged
  again (offline, then online) concurrently, this algorithm may skip that
  CPU and thus skip a rseq fence, which can also cause corruption.

Those limitations of sched_setaffinity() are the reasons why I have
proposed a new "pin_on_cpu()" system call [1]. Feedback in that area
is very welcome.

Thanks,

Mathieu

[1] https://lore.kernel.org/r/20200121160312.26545-1-mathieu.desnoyers@efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com