Message-ID: <87k1apqqgk.fsf@x220.int.ebiederm.org>
Date: Tue, 03 Sep 2019 11:44:59 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>,
Russell King - ARM Linux admin <linux@...linux.org.uk>,
Chris Metcalf <cmetcalf@...hip.com>,
Christoph Lameter <cl@...ux.com>,
Kirill Tkhai <tkhai@...dex.ru>, Mike Galbraith <efault@....de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [PATCH 2/3] task: RCU protect tasks on the runqueue
Peter Zijlstra <peterz@...radead.org> writes:
> On Tue, Sep 03, 2019 at 09:41:17AM +0200, Peter Zijlstra wrote:
>> On Mon, Sep 02, 2019 at 11:52:01PM -0500, Eric W. Biederman wrote:
>>
>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> > index 2b037f195473..802958407369 100644
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>>
>> > @@ -3857,7 +3857,7 @@ static void __sched notrace __schedule(bool preempt)
>> >
>> > if (likely(prev != next)) {
>> > rq->nr_switches++;
>> > - rq->curr = next;
>> > + rcu_assign_pointer(rq->curr, next);
>> > /*
>> > * The membarrier system call requires each architecture
>> > * to have a full memory barrier after updating
>>
>> This one is sad; it puts a (potentially) expensive barrier in here. And
>> I'm not sure I can explain the need for it. That is, we've not changed
>> @next before this and don't need to 'publish' it as such.
>>
>> Can we use RCU_INIT_POINTER() or simply WRITE_ONCE(), here?
>
> That is, I'm thinking we qualify for point 3 (both a and b) of
> RCU_INIT_POINTER().
I don't think point (b) is a concern on any widely visible architecture.
After taking a quick skim through the users it does appear to me that
we almost certainly have changes to the task_struct since the last time
another cpu saw that structure (3 a) and that we have cases where
reading stale values in the task_struct will result in incorrect
operation of the code.
The concern of point (b) is the old alpha caching case where you could
dereference a pointer and get a stale copy of the data structure. This
is a concern when you are following the pointer from another cpu.
From my quick skim, the cases I can see where point (b) might apply are
in fair.c:task_numa_compare, where lots of fields in task_struct are
read. It looks like reading a stale (old/wrong) value of cur->numa_group
could produce very inexplicable and weird behavior. Similarly, in the
membarrier code, reading a pre-exec version of cur->mm could give
completely inexplicable results. Finally, in rcuwait_wake_up, reading a
stale version of cur->state could cause incorrect or missed wake ups in
wake_up_process.
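
As a rough illustration of the stale-read concern, here is a userspace
sketch using C11 atomics and pthreads. It is not kernel code. The names
struct task, curr, writer, reader and run_publish_demo are hypothetical
stand-ins (curr plays the role of rq->curr), the release store models the
ordering that rcu_assign_pointer provides, and the acquire load is a
portable but stronger substitute for the dependency ordering of
rcu_dereference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only one field is modelled. */
struct task {
	int state;
};

static struct task next_task;
/* Stand-in for rq->curr; static storage starts out as a null pointer. */
static _Atomic(struct task *) curr;

/*
 * Writer: change the task, then publish it with release ordering, so
 * the change is visible to anyone who observes the new pointer.
 */
static void *writer(void *arg)
{
	(void)arg;
	next_task.state = 42;	/* change made before publication */
	atomic_store_explicit(&curr, &next_task, memory_order_release);
	return NULL;
}

/*
 * Reader: wait for the pointer, then follow it from another thread.
 * The acquire load pairs with the release store above.
 */
static void *reader(void *arg)
{
	struct task *t;

	while (!(t = atomic_load_explicit(&curr, memory_order_acquire)))
		;	/* spin until the writer publishes */
	*(int *)arg = t->state;
	return NULL;
}

/* Runs one writer and one reader; returns the state the reader saw. */
int run_publish_demo(void)
{
	pthread_t w, r;
	int seen = -1;

	atomic_store_explicit(&curr, NULL, memory_order_relaxed);
	next_task.state = 0;
	if (pthread_create(&r, NULL, reader, &seen) != 0)
		return -1;
	if (pthread_create(&w, NULL, writer, NULL) != 0)
		return -1;
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

With the release/acquire pairing, run_publish_demo() returns 42. If the
store in writer were relaxed instead, the language would no longer
guarantee that the reader sees state == 42, which is the kind of stale
read described above.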
There might already be enough barriers in the scheduler that the barrier
in rcu_assign_pointer is redundant. The comment about membarrier at
least suggests that for processes that return to userspace we have a
full memory barrier.
So with a big fat comment explaining why it is safe we could potentially
use RCU_INIT_POINTER. I currently don't see where the appropriate
barriers are, so I cannot write that comment or, with a clear conscience,
write the code to use RCU_INIT_POINTER instead of rcu_assign_pointer.
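
To make the condition concrete, here is a userspace sketch (C11 atomics
and pthreads, not kernel code) of the case where a barrier-free store
would be acceptable: a full barrier already sits between the last update
of the structure and the pointer store. The names demo_task, published,
fenced_writer, fenced_reader and run_fenced_demo are hypothetical, and
the explicit fence is an assumed stand-in for a barrier that the
surrounding code would have to provide. Whether __schedule has such a
barrier in the right place is the open question, and this sketch does
not answer it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct. */
struct demo_task {
	int state;
};

static struct demo_task demo_next;
/* Stand-in for rq->curr. */
static _Atomic(struct demo_task *) published;

static void *fenced_writer(void *arg)
{
	(void)arg;
	demo_next.state = 7;	/* last change before publication */

	/*
	 * ASSUMPTION: stand-in for a full barrier that the surrounding
	 * code already executes for other reasons. This is the fact a
	 * justifying comment would have to establish.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	/* Store with no ordering of its own, in the spirit of RCU_INIT_POINTER(). */
	atomic_store_explicit(&published, &demo_next, memory_order_relaxed);
	return NULL;
}

static void *fenced_reader(void *arg)
{
	struct demo_task *t;

	/* Acquire load pairs with the writer's earlier fence. */
	while (!(t = atomic_load_explicit(&published, memory_order_acquire)))
		;	/* spin until the pointer is published */
	*(int *)arg = t->state;
	return NULL;
}

/* Runs one writer and one reader; returns the state the reader saw. */
int run_fenced_demo(void)
{
	pthread_t w, r;
	int seen = -1;

	atomic_store_explicit(&published, NULL, memory_order_relaxed);
	demo_next.state = 0;
	if (pthread_create(&r, NULL, fenced_reader, &seen) != 0)
		return -1;
	if (pthread_create(&w, NULL, fenced_writer, NULL) != 0)
		return -1;
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Here run_fenced_demo() returns 7 only because the fence precedes the
relaxed store. Remove the fence and the guarantee goes away, which is
why the barrier-free variant needs the explanatory comment.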
Eric