Date:   Wed, 10 May 2023 17:04:15 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ben Segall <bsegall@...gle.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Crystal Wood <swood@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...hat.com>,
        John Stultz <jstultz@...gle.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...e.de>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>
Subject: Re: [PATCH v2 1/4] sched/core: Provide sched_rtmutex() and expose
 sched work helpers

On 2023-05-03 15:20:51 [+0200], Peter Zijlstra wrote:
> Urgh, so I really don't like this.
> 
> The end result is something like:
> 
> 	rt_mutex_lock()
> 	  sched_submit_work();
> 	    // a nested rt_mutex_lock() here will not clobber
> 	    // ->pi_blocked_on because it's not set yet.
> 
> 	  task_blocks_on_rt_mutex();
> 	    tsk->pi_blocked_on = waiter;
> 	    rt_mutex_enqueue(lock, waiter); <-- the real problem
> 
> 	  rt_mutex_slowlock_block();
> 	    schedule_rtmutex();
> 
> 	  sched_resume_work();
> 
> And all of this is not just because of tsk->pi_blocked_on, but mostly
> because of task_blocks_on_rt_mutex() enqueueing the waiter. The whole
> enqueue thing is what makes the 'simple' solution of saving/restoring
> tsk->pi_blocked_on not work.
> 
> Basically the pi_blocked_on corruption is a side effect, not the
> fundamental issue. One task having two waiters registered is the bigger
> issue.

Yes, one task blocks on two locks, but the lock taken in
sched_submit_work() needs to be dealt with first.
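
To spell out why the "simple" save/restore is not enough, here is a rough
sketch (illustrative only; the function below is made up and not part of
the series):

	/*
	 * Hypothetical nested lock taken from a sched_submit_work()
	 * callback, with ->pi_blocked_on saved and restored around it.
	 * Restoring the pointer is the easy part; the waiter for the
	 * outer lock has already been enqueued on that lock's waiter
	 * tree by rt_mutex_enqueue(), so blocking here registers a
	 * second waiter for the same task, which the PI chain walk
	 * cannot represent.
	 */
	static void nested_lock_with_save_restore(struct rt_mutex *inner)
	{
		struct rt_mutex_waiter *saved = current->pi_blocked_on;

		current->pi_blocked_on = NULL;
		rt_mutex_lock(inner);	/* may enqueue a second waiter */
		rt_mutex_unlock(inner);
		current->pi_blocked_on = saved;
	}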

> Now, sched_submit_work() could also use (regular) mutex -- after all
> it's a fully preemptible context. And then we're subject to the 'same'
> problem but with tsk->blocked_on (DEBUG_MUTEXES=y).

Not sure I follow. We only invoke sched_submit_work() if we block on a
lock which is also a sleeping lock on !RT (mutex_t, not spinlock_t). I
browsed a few of the sched_submit_work() callbacks and they all use
non-sleeping locks (on !RT).
If a sched_submit_work() callback used a mutex_t lock, then we would
recursively call blk_flush_plug() before setting tsk->blocked_on,
perform the same callback, and block on the very same lock (again).
This is no different from !RT, therefore you must not use a sleeping
lock (mutex_t) in the callback.
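
As a sketch of that recursion (the callback and the mutex below are
hypothetical, just to illustrate the point):

	static DEFINE_MUTEX(broken_mutex);

	/*
	 * A sched_submit_work() callback taking a sleeping lock: once
	 * it blocks on broken_mutex it goes through schedule(), which
	 * invokes sched_submit_work() and therefore blk_flush_plug()
	 * again, which blocks on the very same mutex.  The recursion is
	 * the same on !RT, so a sleeping lock in the callback is
	 * forbidden independent of PREEMPT_RT.
	 */
	static void broken_plug_callback(struct blk_plug *plug)
	{
		mutex_lock(&broken_mutex);
		/* flush something */
		mutex_unlock(&broken_mutex);
	}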

> This means that strictly speaking we should litter mutex with the same
> thing :/

No need, see the reasoning above.

> This all feels fragile to me. Too many things spread out in too many
> places. An alternative is something like:
> 
> void __sched schedule_pi(void)
> {
> 	struct task_struct *tsk = current;
> 	void *waiter = tsk->pi_blocked_on;
> 
> 	sched_submit_work(tsk);
> 	do {
> 		preempt_disable();
> 		if (rt_mutex_blocks(tsk, waiter))
> 			schedule();
> 		sched_preempt_enable_no_resched();
> 	} while (need_resched());
> 	sched_update_worker(tsk);
> }
> 
> And then rt_mutex_blocks() will do the enqueue/boost/optimistic_spin
> thing. However, this is going to be a massive reorg of the rt_mutex code
> and I'm not entirely sure the end result will actually be better... it
> might just make a mess elsewhere :/

It might not be needed…

> > @@ -6723,8 +6720,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
> >  	blk_flush_plug(tsk->plug, true);
> >  }
> 
> > +asmlinkage __visible void __sched schedule(void)
> > +{
> > +	if (!task_is_running(current))
> > +		sched_submit_work();
> > +	schedule_loop(SM_NONE);
> > +	sched_resume_work();
> >  }
> >  EXPORT_SYMBOL(schedule);
> 
> pulling out task_is_running() like this is going to get into conflict
> with TJs patches here:
> 
>   https://lkml.kernel.org/r/20230418205159.724789-1-tj@kernel.org
> 
> That makes sched_submit_work() do things even if task_is_running().
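
If I read that right, the rebase would roughly have to look like this
(sketch based only on your description, not on TJ's actual patches):

	static inline void sched_submit_work(void)
	{
		struct task_struct *tsk = current;

		/* work that is wanted even if task_is_running() ... */
		if (task_is_running(tsk))
			return;
		/* ... existing blocked-task work, e.g. blk_flush_plug() */
	}

	asmlinkage __visible void __sched schedule(void)
	{
		sched_submit_work();	/* called unconditionally again */
		schedule_loop(SM_NONE);
		sched_resume_work();
	}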

Should I then rebase my series on top of his and we're good?

Sebastian
