linux-kernel - Re: [PATCH 2/3] task: RCU protect tasks on the runqueue


Message-ID: <87k1apqqgk.fsf@x220.int.ebiederm.org>
Date:   Tue, 03 Sep 2019 11:44:59 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Russell King - ARM Linux admin <linux@...linux.org.uk>,
        Chris Metcalf <cmetcalf@...hip.com>,
        Christoph Lameter <cl@...ux.com>,
        Kirill Tkhai <tkhai@...dex.ru>, Mike Galbraith <efault@....de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [PATCH 2/3] task: RCU protect tasks on the runqueue

Peter Zijlstra <peterz@...radead.org> writes:

> On Tue, Sep 03, 2019 at 09:41:17AM +0200, Peter Zijlstra wrote:
>> On Mon, Sep 02, 2019 at 11:52:01PM -0500, Eric W. Biederman wrote:
>> 
>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> > index 2b037f195473..802958407369 100644
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> 
>> > @@ -3857,7 +3857,7 @@ static void __sched notrace __schedule(bool preempt)
>> >  
>> >  	if (likely(prev != next)) {
>> >  		rq->nr_switches++;
>> > -		rq->curr = next;
>> > +		rcu_assign_pointer(rq->curr, next);
>> >  		/*
>> >  		 * The membarrier system call requires each architecture
>> >  		 * to have a full memory barrier after updating
>> 
>> This one is sad; it puts a (potentially) expensive barrier in here. And
>> I'm not sure I can explain the need for it. That is, we've not changed
>> @next before this and don't need to 'publish' it as such.
>> 
>> Can we use RCU_INIT_POINTER() or simply WRITE_ONCE(), here?
>
> That is, I'm thinking we qualify for point 3 (both a and b) of
> RCU_INIT_POINTER().

I don't think point (b) is a concern on any widely visible architecture.
After a quick skim through the users, it appears to me that we almost
definitely have changes to the task_struct since the last time another
cpu saw that structure (3 a), and that we have cases where reading stale
values in the task_struct will result in incorrect operation of the
code.

The concern of point (b) is the old alpha caching case where you could
dereference a pointer and get a stale copy of the data structure.  This
is a concern when you are following the pointer from another cpu.

From my quick skim, the cases I can see where point (b) might apply are
these.  In fair.c:task_numa_compare lots of fields in task_struct are
read, and reading a stale (old/wrong) value of cur->numa_group could
give very inexplicable and weird results.  Similarly, in the membarrier
code reading a pre-exec version of cur->mm could give completely
inexplicable results.  Finally, in rcuwait_wake_up reading a stale
version of the process cur->state could cause incorrect or missed
wake ups in wake_up_process.
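
To make the difference concrete, here is a minimal userspace sketch of
the two ways of storing the pointer, written with C11 atomics rather
than the kernel primitives.  The names struct task, struct runqueue,
publish_release, publish_relaxed and read_curr are hypothetical
stand-ins for illustration only, and the acquire load is a conservative
substitute for the dependency ordering that rcu_dereference relies on.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and rq; not the kernel types. */
struct task {
	int state;
};

struct runqueue {
	_Atomic(struct task *) curr;
};

/*
 * Rough analogue of rcu_assign_pointer(): a release store.  Writes made
 * to *next before this call are ordered before the pointer store, so a
 * reader that observes the new pointer also observes those writes.
 */
static void publish_release(struct runqueue *rq, struct task *next)
{
	atomic_store_explicit(&rq->curr, next, memory_order_release);
}

/*
 * Rough analogue of RCU_INIT_POINTER() or WRITE_ONCE(): a store with no
 * ordering.  Cheaper, but a reader on another cpu may observe the new
 * pointer and still read older contents of *next, unless some other
 * barrier already provides the ordering.
 */
static void publish_relaxed(struct runqueue *rq, struct task *next)
{
	atomic_store_explicit(&rq->curr, next, memory_order_relaxed);
}

/* Reader side; acquire is used here in place of consume ordering. */
static struct task *read_curr(struct runqueue *rq)
{
	return atomic_load_explicit(&rq->curr, memory_order_acquire);
}
```

With publish_relaxed, the correctness of a cross-cpu reader depends
entirely on barriers that exist elsewhere, which is exactly the part
that would need to be identified and documented.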

There might already be enough barriers in the scheduler that the barrier
in rcu_assign_pointer is redundant.  The comment about membarrier at
least suggests that for processes that return to userspace we have a
full memory barrier.

So with a big fat comment explaining why it is safe we could potentially
use RCU_INIT_POINTER.  I currently don't see where the appropriate
barriers are, so I cannot write that comment or, with a clear
conscience, write the code to use RCU_INIT_POINTER instead of
rcu_assign_pointer.

Eric