Message-ID: <YjBO8yzxdmjTGNiy@linutronix.de>
Date: Tue, 15 Mar 2022 09:31:47 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Oleg Nesterov <oleg@...hat.com>
Cc: linux-kernel@...r.kernel.org, Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT.
On 2022-03-14 19:54:30 [+0100], Oleg Nesterov wrote:
> I never really understood ->saved_state logic. Will read this patch
> tomorrow, but at first glance this patch doesn't solve all problems.
Let me explain the ->saved_state logic:
On !RT, this
	set_current_state(TASK_UNINTERRUPTIBLE);	// 1
	spin_lock(&lock);				// 2
	spin_unlock(&lock);				// 3
	schedule();					// 4
will assign ->state, spin on &lock and then invoke schedule() while
->state is still TASK_UNINTERRUPTIBLE.
On RT however, spinlock_t becomes a sleeping lock: the task won't spin
on &lock but rather sleeps while waiting for it. While the task sleeps
waiting for the lock, ->state needs to be preserved, otherwise it is
lost on the wake-up with the lock acquired.
That means on RT this happens:
- 1 assigns ->state as with !RT
- 2 acquires &lock. If the lock is contended then, with
  current->pi_lock acquired (current_save_and_set_rtlock_wait_state):
	->saved_state = ->state (TASK_UNINTERRUPTIBLE)
	->state = TASK_RTLOCK_WAIT
  and the task sleeps until &lock is available.
  Once the lock is acquired, the task is woken up and its state is
  updated with ->pi_lock acquired (current_restore_rtlock_saved_state):
	->state = ->saved_state (TASK_UNINTERRUPTIBLE)
	->saved_state = TASK_RUNNING
- 3 unlocks &lock, ->state still TASK_UNINTERRUPTIBLE
- 4 invokes schedule with TASK_UNINTERRUPTIBLE.
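
The RT sequence above can be sketched as a small userspace model. This
is only an illustration, not the kernel code: struct task_model, the
MODEL_* values and the model_*() helpers are hypothetical stand-ins, and
->pi_lock is left out because the model is single-threaded.

```c
#include <assert.h>

/* Illustrative values only; these are not the kernel's state bits. */
enum model_state {
	MODEL_RUNNING = 0,
	MODEL_UNINTERRUPTIBLE,
	MODEL_RTLOCK_WAIT,
};

/* Hypothetical stand-in for the two task_struct fields discussed above. */
struct task_model {
	enum model_state state;
	enum model_state saved_state;
};

/* Contended lock: stash the caller's state, then mark the lock wait. */
void model_save_and_set_rtlock_wait(struct task_model *t)
{
	t->saved_state = t->state;
	t->state = MODEL_RTLOCK_WAIT;
}

/* Lock acquired: put the caller's state back and clear the stash. */
void model_restore_rtlock_saved(struct task_model *t)
{
	t->state = t->saved_state;
	t->saved_state = MODEL_RUNNING;
}

/* Steps 1-4 with a contended lock; returns the state schedule() would see. */
enum model_state model_contended_sequence(void)
{
	struct task_model t = { MODEL_RUNNING, MODEL_RUNNING };

	t.state = MODEL_UNINTERRUPTIBLE;	/* 1: set_current_state() */
	model_save_and_set_rtlock_wait(&t);	/* 2: lock is contended */
	assert(t.state == MODEL_RTLOCK_WAIT);	/*    asleep on the lock */
	model_restore_rtlock_saved(&t);		/* 2: lock acquired */
	/* 3: unlocking does not touch the state */
	return t.state;				/* 4: schedule() */
}
```

In this model the state handed to schedule() at step 4 is the one
assigned at step 1, which is the whole point of ->saved_state.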
The sleeping locks on RT are spinlock_t and rwlock_t.
Side note: if on !RT the task at step 2 spins on the lock, it may
receive a wake-up, at which point TASK_UNINTERRUPTIBLE becomes
TASK_RUNNING, and it would then invoke schedule() with TASK_RUNNING
(assuming the condition it is about to wait for becomes true early).
On RT this works as well, even though the task at step 2 may be asleep
or in transition to/from sleep: the wake-up (under ->pi_lock) looks at
->state and, if it is TASK_RTLOCK_WAIT, updates ->saved_state instead
(ttwu_state_match()).
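
A rough userspace sketch of that wake-up rule follows. It is a
simplified model and not the kernel's ttwu_state_match(): the MODEL_*
bit values, struct task_model and both functions are made up for
illustration, and the locking is omitted.

```c
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's definitions. */
enum {
	MODEL_RUNNING		= 0x0,
	MODEL_UNINTERRUPTIBLE	= 0x2,
	MODEL_RTLOCK_WAIT	= 0x1000,
};

/* Hypothetical stand-in for the two task_struct fields. */
struct task_model {
	unsigned int state;
	unsigned int saved_state;
};

/* Returns true if the task itself has to be made runnable. */
bool model_wake(struct task_model *t, unsigned int wake_mask)
{
	if (t->state & wake_mask) {
		t->state = MODEL_RUNNING;
		return true;
	}
	/* Blocked on a sleeping lock: record the wake-up for later. */
	if (t->state == MODEL_RTLOCK_WAIT && (t->saved_state & wake_mask))
		t->saved_state = MODEL_RUNNING;
	return false;
}

/*
 * A wake-up arrives while the task is blocked at step 2.
 * Returns the state schedule() would see at step 4.
 */
unsigned int model_wake_during_lock_wait(void)
{
	struct task_model t = { MODEL_RTLOCK_WAIT, MODEL_UNINTERRUPTIBLE };

	model_wake(&t, MODEL_UNINTERRUPTIBLE);	/* only saved_state changes */
	t.state = t.saved_state;		/* lock acquired: restore */
	t.saved_state = MODEL_RUNNING;
	return t.state;
}
```

In the model the task stays blocked on the lock, and the wake-up only
becomes visible once the saved state is restored, so step 4 runs with
the running state just like in the !RT side note.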
> On 03/02, Sebastian Andrzej Siewior wrote:
> >
> > +static inline bool __task_state_match_eq(struct task_struct *tsk, long state)
> > +{
> > + bool match = false;
> > +
> > + if (READ_ONCE(tsk->__state) == state)
> > + match = true;
> > + else if (tsk->saved_state == state)
> > + match = true;
> > + return match;
> > +}
>
> ...
>
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3239,7 +3239,8 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
> > * is actually now running somewhere else!
> > */
> > while (task_running(rq, p)) {
> > - if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> > + if (match_state &&
> > + unlikely(!task_state_match_eq(p, match_state)))
> > return 0;
>
> So wait_task_inactive() can return 0 but the task can run after that, right?
> This is not what we want...
Without checking both states you may never observe the requested state,
because ->state is set to TASK_RTLOCK_WAIT while the task waits for a
lock. Other than that, the task may run briefly because it tries to
acquire a lock or has just acquired one, and this shouldn't be different
from a task spinning on a lock.
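
To illustrate the first point with a userspace sketch: the names below
(struct task_model, the MODEL_* values, the model_*() functions) are
hypothetical and only mirror the shape of the quoted helper, without any
of the kernel's locking or READ_ONCE().

```c
#include <stdbool.h>

/* Illustrative values only; not the kernel's state bits. */
enum {
	MODEL_RUNNING		= 0x0,
	MODEL_TRACED		= 0x8,
	MODEL_RTLOCK_WAIT	= 0x1000,
};

/* Hypothetical stand-in for the two task_struct fields. */
struct task_model {
	unsigned int state;
	unsigned int saved_state;
};

/* The old check: looks at the current state only. */
bool model_state_only_match(const struct task_model *t, unsigned int state)
{
	return t->state == state;
}

/* The new check: the requested state may be parked in saved_state. */
bool model_state_match_eq(const struct task_model *t, unsigned int state)
{
	return t->state == state || t->saved_state == state;
}

/* A tracee that set MODEL_TRACED and then blocked on a contended lock. */
struct task_model model_tracee_in_lock_wait(void)
{
	struct task_model t = { MODEL_RTLOCK_WAIT, MODEL_TRACED };

	return t;
}
```

For the task returned by model_tracee_in_lock_wait(), the state-only
check does not match MODEL_TRACED while the two-field check does.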
> Oleg.
Sebastian