linux-kernel - Re: [PATCH] kcsan: Add option to allow watcher interruptions

Message-ID: <20200725201013.GZ119549@hirez.programming.kicks-ass.net>
Date:   Sat, 25 Jul 2020 22:10:13 +0200
From:   peterz@...radead.org
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Marco Elver <elver@...gle.com>,
        Andrey Konovalov <andreyknvl@...gle.com>,
        Alexander Potapenko <glider@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        kasan-dev <kasan-dev@...glegroups.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] kcsan: Add option to allow watcher interruptions

On Sat, Jul 25, 2020 at 12:39:09PM -0700, Paul E. McKenney wrote:
> On Sat, Jul 25, 2020 at 07:44:30PM +0200, Peter Zijlstra wrote:

> > So the thing is, since RCU count is 0 per context (an IRQ must have an
> > equal amount of rcu_read_unlock() as it has rcu_read_lock()), interrupts
> > are not in fact a problem, even on load-store (RISC) architectures
> > (preempt_count has the same thing).
> 
> True enough!
> 
> > So the addition/subtraction in rcu_preempt_read_{enter,exit}() doesn't
> > need to be atomic vs interrupts. The only thing we really do need is
> > them being single-copy-atomic.
> > 
> > The problem with READ/WRITE_ONCE is that if we were to use it, we'd end
> > up with a load-store, even on x86, which is sub-optimal.
> 
> Agreed.
> 
> > I suppose the 'correct' code here would be something like:
> > 
> > 	*((volatile int *)&current->rcu_read_lock_nesting)++;
> > 
> > then the compiler can either do a single memop (x86 and the like) or a
> > load-store that is free from tearing.
> 
> Hah!!!  That is the original ACCESS_ONCE(), isn't it?  ;-)
> 
> 	ACCESS_ONCE(current->rcu_read_lock_nesting)++;

Indeed :-)

> But open-coding makes sense unless a lot of other places need something
> similar.  Besides, open-coding allows me to defer bikeshedding on the
> name, given that there are actually two accesses.  :-/

Yeah, ISTR that being one of the reasons we got rid of it.

> So:
> 	(*(volatile int *)&(current->rcu_read_lock_nesting))++;

Urgh, sorry for messing that up.
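
For reference, a rough userspace sketch of the corrected spelling next to the
old ACCESS_ONCE() form. The struct and function names are made up for
illustration and are not kernel code; only the cast expression and the
historical macro definition come from the thread and the kernel respectively.
The earlier spelling put the ++ on the cast itself, which is not an lvalue.

```c
#include <assert.h>

/* Hypothetical stand-in for the one task_struct field discussed here. */
struct demo_task {
	int rcu_read_lock_nesting;
};

/* Historical ACCESS_ONCE() spelling: a volatile-qualified lvalue for x. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Open-coded form: dereference the volatile pointer first, then increment. */
static inline void demo_nesting_inc(struct demo_task *t)
{
	(*(volatile int *)&t->rcu_read_lock_nesting)++;
}

/* Same pattern through the macro; this is still one load plus one store. */
static inline void demo_nesting_dec(struct demo_task *t)
{
	assert(t->rcu_read_lock_nesting > 0);	/* catch an unbalanced exit */
	ACCESS_ONCE(t->rcu_read_lock_nesting)--;
}
```

Either way the compiler is asked for a non-tearing load and a non-tearing
store, not for an atomic read-modify-write.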

> This gets me the following for __rcu_read_lock():
> 
> 00000000000000e0 <__rcu_read_lock>:
>       e0:	48 8b 14 25 00 00 00 	mov    0x0,%rdx
>       e7:	00 
>       e8:	8b 82 e0 02 00 00    	mov    0x2e0(%rdx),%eax
>       ee:	83 c0 01             	add    $0x1,%eax
>       f1:	89 82 e0 02 00 00    	mov    %eax,0x2e0(%rdx)
>       f7:	c3                   	retq   
>       f8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
>       ff:	00 
> 
> One might hope for an inc instruction, but this isn't bad.  We do lose
> a few instructions compared to the C-language case due to differences
> in address calculation:
> 
> 00000000000000e0 <__rcu_read_lock>:
>       e0:	48 8b 04 25 00 00 00 	mov    0x0,%rax
>       e7:	00 
>       e8:	83 80 e0 02 00 00 01 	addl   $0x1,0x2e0(%rax)
>       ef:	c3                   	retq   

Sheesh, that's daft... I think this is one of the cases where GCC is
perhaps overly cautious when presented with 'volatile'.

It has a history of generating excessively crap code around volatile,
and while it has improved somewhat, this seems to show there's still
room for improvement...

I suppose this is the point where we go bug a friendly compiler person.

Alternatively we can employ data_race() and trust the compiler not to be
daft about tearing... which we've been relying on with this code anyway.
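
Roughly what the data_race() variant might look like, as a userspace sketch.
The data_race() below is a simplified pass-through stand-in (the real kernel
macro also tells KCSAN to ignore the access), and the struct and function
names are hypothetical.

```c
#include <assert.h>

/*
 * Simplified stand-in, an assumption for this sketch: only passes the
 * expression through. The kernel version also suppresses KCSAN reports.
 */
#define data_race(expr) (expr)

/* Hypothetical stand-in for the one task_struct field discussed here. */
struct race_task {
	int rcu_read_lock_nesting;
};

/* Plain C increment, so the compiler is free to emit a single memop. */
static inline void race_nesting_enter(struct race_task *t)
{
	data_race(t->rcu_read_lock_nesting++);
}

/* Plain C decrement; returns the new nesting depth. */
static inline int race_nesting_exit(struct race_task *t)
{
	assert(t->rcu_read_lock_nesting > 0);	/* catch an unbalanced exit */
	return data_race(--t->rcu_read_lock_nesting);
}
```

Since there is no volatile here, nothing but trust in the compiler keeps the
accesses from tearing.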