Date:   Sun, 17 Feb 2019 16:34:45 -0500 (EST)
From:   Mathieu Desnoyers <>
To:     Rich Felker <>
Cc:     linux-kernel <>,
        "Paul E. McKenney" <>,
        Peter Zijlstra <>,
        Ingo Molnar <>,
        Alexander Viro <>
Subject: Re: Regression in SYS_membarrier expedited

----- On Feb 17, 2019, at 1:48 PM, Rich Felker wrote:

> commit a961e40917fb14614d368d8bc9782ca4d6a8cd11 made it so that the
> MEMBARRIER_CMD_PRIVATE_EXPEDITED command cannot be used without first
> registering intent to use it. However, registration is an expensive
> operation since commit 3ccfebedd8cf54e291c809c838d8ad5cc00f5688, which
> added synchronize_sched() to it; this means it's no longer possible to
> lazily register intent at first use, and it's unreasonably expensive
> to preemptively register intent for possibly extremely-short-lived
> processes that will never use it. (My usage case is in libc (musl),
> where I can't know if the process will be short- or long-lived;
> unnecessary and potentially expensive syscalls can't be made
> preemptively, only lazily at first use.)
> Can we restore the functionality of MEMBARRIER_CMD_PRIVATE_EXPEDITED
> to work even without registration? The motivation of requiring
> registration seems to be:
>    "Registering at this time removes the need to interrupt each and
>    every thread in that process at the first expedited
>    sys_membarrier() system call."
> but interrupting every thread in the process is exactly what I expect,
> and is not a problem. What does seem like a big problem is waiting for
> synchronize_sched() to synchronize with an unboundedly large number of
> cores (vs only a few threads in the process), especially in the
> presence of full_nohz, where it seems like latency would be at least a
> few ms and possibly unbounded.
> Short of a working SYS_membarrier that doesn't require expensive
> pre-registration, I'm stuck just implementing it in userspace with
> signals...

Hi Rich,

Let me try to understand the scenario first.

For musl libc to use membarrier private expedited, it would first need to
register membarrier private expedited for the process at musl library init
(typically after exec). At that stage, the process is still single-threaded,
right ? So there is no reason to issue a synchronize_sched() (or now
synchronize_rcu() in newer kernels) at that point. The registration code
only issues it when the process is not single-threaded:

        if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
                /*
                 * Ensure all future scheduler executions will observe the
                 * new thread flag state for this process.
                 */
                synchronize_sched();
        }

So considering that pre-registration carefully done before the process
becomes multi-threaded just costs a system call (and not a synchronize_sched()),
does it make the pre-registration approach more acceptable ?
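
To make the proposal concrete, here is a minimal userspace sketch of
registering early and issuing the barrier later. It assumes a Linux system
with <linux/membarrier.h> available. The function names
register_membarrier_early() and process_wide_barrier() are hypothetical
and are not musl's actual code; a libc would call the first one from its
init path, before any thread is created.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw syscall wrapper; the trailing 0 is the cpu_id argument that newer
 * kernels accept and that is ignored when flags is 0. */
static int membarrier_call(int cmd, unsigned int flags)
{
    return (int)syscall(SYS_membarrier, cmd, flags, 0);
}

/* Hypothetical init-time hook: meant to run while the process is still
 * single-threaded, so that registration does not need to wait for a
 * grace period. Returns 0 on success, -1 if the command is unavailable. */
static int register_membarrier_early(void)
{
    /* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
    int mask = membarrier_call(MEMBARRIER_CMD_QUERY, 0);
    if (mask < 0 || !(mask & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED))
        return -1;
    if (membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) < 0)
        return -1;
    return 0;
}

/* Later use, possibly after threads exist. Fails with errno == EPERM
 * if the process never registered. Returns 0 on success, -1 on error. */
static int process_wide_barrier(void)
{
    return membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) < 0 ? -1 : 0;
}
```

If register_membarrier_early() returns -1, the caller would have to fall
back to another mechanism (such as the signal-based approach mentioned
above) whenever process_wide_barrier() is not usable.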



Mathieu Desnoyers
EfficiOS Inc.
