[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez2oNKbSvNavKLEe2iYm1Cj+OaaaF45FA8cqkY+-7DuJTw@mail.gmail.com>
Date: Mon, 13 Dec 2021 20:27:41 +0100
From: Jann Horn <jannh@...gle.com>
To: Florian Weimer <fweimer@...hat.com>
Cc: linux-api@...r.kernel.org,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
libc-alpha@...rceware.org, linux-kernel@...r.kernel.org
Subject: Re: rseq + membarrier programming model
On Mon, Dec 13, 2021 at 7:48 PM Florian Weimer <fweimer@...hat.com> wrote:
> I've been studying Jann Horn's biased locking example:
>
> Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
> <https://lore.kernel.org/linux-api/CAG48ez02UDn_yeLuLF4c=kX0=h2Qq8Fdb0cer1yN8atbXSNjkQ@mail.gmail.com/>
>
> It uses MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ as part of the biased lock
> revocation.
>
> How does the this code know that the process has called
> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ? Could it fall back to
> MEMBARRIER_CMD_GLOBAL instead?
AFAIK no - MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ specifically
forces targeted processes to go through an RSEQ preemption. That only
happens when this special membarrier command is used and when an
actual task switch happens; other membarrier flavors don't guarantee
that.
Also, MEMBARRIER_CMD_GLOBAL can take really long in terms of wall
clock time - it's basically just synchronize_rcu(), and as the
documentation at
https://www.kernel.org/doc/html/latest/RCU/Design/Requirements/Requirements.html
says:
"The synchronize_rcu() grace-period-wait primitive is optimized for
throughput. It may therefore incur several milliseconds of latency in
addition to the duration of the longest RCU read-side critical
section."
You can see that synchronize_rcu() indeed takes quite long in terms of
wall clock time (but not in terms of CPU time - as the documentation
says, it's optimized for throughput in a parallel context) with a
simple test program:
jannh@...top:~/test/rcu$ cat rcu_membarrier.c
#define _GNU_SOURCE
#include <stdio.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>
#include <err.h>
int main(void) {
for (int i=0; i<20; i++) {
struct timespec ts1;
if (clock_gettime(CLOCK_MONOTONIC, &ts1))
err(1, "time");
if (syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0))
err(1, "membarrier");
struct timespec ts2;
if (clock_gettime(CLOCK_MONOTONIC, &ts2))
err(1, "time");
unsigned long delta_ns = (ts2.tv_nsec - ts1.tv_nsec) +
(1000UL*1000*1000) * (ts2.tv_sec - ts1.tv_sec);
printf("MEMBARRIER_CMD_GLOBAL took %lu nanoseconds\n", delta_ns);
}
}
jannh@...top:~/test/rcu$ gcc -o rcu_membarrier rcu_membarrier.c -Wall
jannh@...top:~/test/rcu$ time ./rcu_membarrier
MEMBARRIER_CMD_GLOBAL took 17155142 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19207001 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16087350 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15963711 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16336149 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15931331 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16020315 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15873814 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15945667 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23815452 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23626444 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19911435 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23967343 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15943147 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23914809 nanoseconds
MEMBARRIER_CMD_GLOBAL took 32498986 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19450932 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16281308 nanoseconds
MEMBARRIER_CMD_GLOBAL took 24045168 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15406698 nanoseconds
real 0m0.458s
user 0m0.058s
sys 0m0.031s
jannh@...top:~/test/rcu$
Every invocation of MEMBARRIER_CMD_GLOBAL on my laptop took >10 ms.
Powered by blists - more mailing lists