Message-ID: <20230510150415.6BXNs0I1@linutronix.de>
Date: Wed, 10 May 2023 17:04:15 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Ben Segall <bsegall@...gle.com>,
Boqun Feng <boqun.feng@...il.com>,
Crystal Wood <swood@...hat.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>,
John Stultz <jstultz@...gle.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Valentin Schneider <vschneid@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>
Subject: Re: [PATCH v2 1/4] sched/core: Provide sched_rtmutex() and expose
sched work helpers
On 2023-05-03 15:20:51 [+0200], Peter Zijlstra wrote:
> Urgh, so I really don't like this.
>
> The end result is something like:
>
>   rt_mutex_lock()
>     sched_submit_work();
>     // a nested rt_mutex_lock() here will not clobber
>     // ->pi_blocked_on because it's not set yet.
>
>     task_blocks_on_rt_mutex();
>       tsk->pi_blocked_on = waiter;
>       rt_mutex_enqueue(lock, waiter); <-- the real problem
>
>     rt_mutex_slowlock_block();
>       schedule_rtmutex();
>
>   sched_resume_work();
>
> And all of this is not just because of tsk->pi_blocked_on, but mostly
> because of task_blocks_on_rt_mutex() enqueueing the waiter. The whole
> enqueue thing is what makes the 'simple' solution of saving/restoring
> tsk->pi_blocked_on not work.
>
> Basically the pi_blocked_on corruption is a side effect, not the
> fundamental issue. One task having two waiters registered is the bigger
> issue.
Yes, one task blocks on two locks, but the blocking on the lock inside
sched_submit_work() is what needs to be solved first.
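Just to spell out the part that save/restore can't fix (a rough sketch
only, not code from the series, the helper name is made up):

	static void sched_submit_work_save_restore(struct task_struct *tsk)
	{
		struct rt_mutex_waiter *saved = tsk->pi_blocked_on;

		tsk->pi_blocked_on = NULL;
		blk_flush_plug(tsk->plug, true);	/* may block on a nested lock */
		tsk->pi_blocked_on = saved;

		/*
		 * The pointer survives, but the saved waiter is still
		 * enqueued on the outer lock's waiter tree while the nested
		 * lock enqueues a second one, i.e. one task with two waiters
		 * registered.
		 */
	}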
> Now, sched_submit_work() could also use a (regular) mutex -- after all
> it's a fully preemptible context. And then we're subject to the 'same'
> problem but with tsk->blocked_on (DEBUG_MUTEXES=y).
Not sure I follow. We only invoke sched_submit_work() if we block on a
lock which is also a sleeping lock on !RT (mutex_t, not spinlock_t). I
browsed a few of the sched_submit_work() callbacks and they all use
non-sleeping locks (on !RT).
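Roughly (quoting the current mainline entry points from memory, so take
the details with a grain of salt), that split already exists - the
spinlock_t blocking path on RT never runs the work hooks:

	asmlinkage __visible void __sched schedule(void)
	{
		struct task_struct *tsk = current;

		sched_submit_work(tsk);		/* blk_flush_plug() & friends */
		do {
			preempt_disable();
			__schedule(SM_NONE);
			sched_preempt_enable_no_resched();
		} while (need_resched());
		sched_update_worker(tsk);
	}

	#ifdef CONFIG_PREEMPT_RT
	void __sched notrace schedule_rtlock(void)	/* spinlock_t on RT */
	{
		do {
			preempt_disable();
			__schedule(SM_RTLOCK_WAIT);	/* no submit/resume work */
			sched_preempt_enable_no_resched();
		} while (need_resched());
	}
	#endif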
If a sched_submit_work() callback were to use a mutex_t lock then we
would recursively call blk_flush_plug() before setting tsk->blocked_on,
perform the same callback again and block on the very same lock
(again). This is no different on !RT, therefore you must not use a
sleeping lock (mutex_t) in the callback.
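To make that concrete, a (made up) callback doing it would look like
this and recurse right back into itself:

	static DEFINE_MUTEX(bad_lock);

	/* hypothetical sched_submit_work() callback taking a sleeping lock */
	static void bad_flush_callback(struct task_struct *tsk)
	{
		mutex_lock(&bad_lock);
		/*
		 * If bad_lock is contended, mutex_lock() blocks, schedule()
		 * runs sched_submit_work() again, which re-enters this
		 * callback and blocks on bad_lock once more - recursion /
		 * self-deadlock, on RT and !RT alike.
		 */
		mutex_unlock(&bad_lock);
	}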
> This means that strictly speaking we should litter mutex with the same
> thing :/
No need, see above logic.
> This all feels fragile to me. Too many things spread out in too many
> places. An alternative is something like:
>
> void __sched schedule_pi(void)
> {
> 	struct task_struct *tsk = current;
> 	void *waiter = tsk->pi_blocked_on;
>
> 	sched_submit_work(tsk);
> 	do {
> 		preempt_disable();
> 		if (rt_mutex_blocks(tsk, waiter))
> 			schedule();
> 		sched_preempt_enable_no_resched();
> 	} while (need_resched());
> 	sched_update_worker(tsk);
> }
>
> And then rt_mutex_blocks() will do the enqueue/boost/optimistic_spin
> thing. However, this is going to be a massive reorg of the rt_mutex code
> and I'm not entirely sure the end result will actually be better... it
> might just make a mess elsewhere :/
It might not be needed…
> > @@ -6723,8 +6720,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
> >  		blk_flush_plug(tsk->plug, true);
> >  }
>
> > +asmlinkage __visible void __sched schedule(void)
> > +{
> > +	if (!task_is_running(current))
> > +		sched_submit_work();
> > +	schedule_loop(SM_NONE);
> > +	sched_resume_work();
> > }
> > EXPORT_SYMBOL(schedule);
>
> pulling out task_is_running() like this is going to get into conflict
> with TJ's patches here:
>
> https://lkml.kernel.org/r/20230418205159.724789-1-tj@kernel.org
>
> That makes sched_submit_work() do things even if task_is_running().
Do I rebase my stuff on top of his then and we're good?
Sebastian