[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8d3242ec-1e56-4a51-a950-20a185a5cc3b@paulmck-laptop>
Date: Thu, 16 Jan 2025 13:55:16 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: rcu@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Kernel Team <kernel-team@...a.com>,
Steven Rostedt <rostedt@...dmis.org>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Kent Overstreet <kent.overstreet@...ux.dev>,
bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH rcu 13/17] srcu: Add SRCU-fast readers
On Thu, Jan 16, 2025 at 01:00:24PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 16, 2025 at 12:21 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > +/*
> > + * Counts the new reader in the appropriate per-CPU element of the
> > + * srcu_struct. Returns a pointer that must be passed to the matching
> > + * srcu_read_unlock_fast().
> > + *
> > + * Note that this_cpu_inc() is an RCU read-side critical section either
> > + * because it disables interrupts, because it is a single instruction,
> > + * or because it is a read-modify-write atomic operation, depending on
> > + * the whims of the architecture.
> > + */
> > +static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp)
> > +{
> > + struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
> > +
> > + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast().");
> > + this_cpu_inc(scp->srcu_locks.counter); /* Y */
> > + barrier(); /* Avoid leaking the critical section. */
> > + return scp;
> > +}
>
> This doesn't look fast.
> If I'm reading this correctly,
> even with debugs off RCU_LOCKDEP_WARN() will still call
> rcu_is_watching() and this doesn't look cheap or fast.
Here is the CONFIG_PROVE_RCU=n definition:
#define RCU_LOCKDEP_WARN(c, s) do { } while (0 && (c))
The "0" in the "0 && (c)" should prevent that call to rcu_is_watching().
But why not see what the compiler thinks? I added the following function
to kernel/rcu/srcutree.c:
struct srcu_ctr __percpu *test_srcu_read_lock_fast(struct srcu_struct *ssp)
{
struct srcu_ctr __percpu *p;
p = srcu_read_lock_fast(ssp);
return p;
}
This function compiles to the following code:
Dump of assembler code for function test_srcu_read_lock_fast:
0xffffffff811220c0 <+0>: endbr64
0xffffffff811220c4 <+4>: sub $0x8,%rsp
0xffffffff811220c8 <+8>: mov 0x8(%rdi),%rax
0xffffffff811220cc <+12>: add %gs:0x7eef3944(%rip),%rax # 0x15a18 <this_cpu_off>
0xffffffff811220d4 <+20>: mov 0x20(%rax),%eax
0xffffffff811220d7 <+23>: test $0x8,%al
0xffffffff811220d9 <+25>: je 0xffffffff811220eb <test_srcu_read_lock_fast+43>
0xffffffff811220db <+27>: mov (%rdi),%rax
0xffffffff811220de <+30>: incq %gs:(%rax)
0xffffffff811220e2 <+34>: add $0x8,%rsp
0xffffffff811220e6 <+38>: jmp 0xffffffff81f5fe60 <__x86_return_thunk>
0xffffffff811220eb <+43>: mov $0x8,%esi
0xffffffff811220f0 <+48>: mov %rdi,(%rsp)
0xffffffff811220f4 <+52>: call 0xffffffff8111fb90 <__srcu_check_read_flavor>
0xffffffff811220f9 <+57>: mov (%rsp),%rdi
0xffffffff811220fd <+61>: jmp 0xffffffff811220db <test_srcu_read_lock_fast+27>
The first call to srcu_read_lock_fast() invokes __srcu_check_read_flavor(),
but after that the "je" instruction will fall through. So the common-case
code path executes only the part of this function up to and including the
"jmp 0xffffffff81f5fe60 <__x86_return_thunk>".
Does that serve?
Thanx, Paul
Powered by blists - more mailing lists