linux-kernel - Re: [PATCH] sched/core: Schedule new worker even if PI-blocked

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190820155401.c5apbxjntdz5n2gk@linutronix.de>
Date:   Tue, 20 Aug 2019 17:54:01 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        tglx@...utronix.de
Subject: Re: [PATCH] sched/core: Schedule new worker even if PI-blocked

On 2019-08-20 17:20:25 [+0200], Peter Zijlstra wrote:
> > There isc RCU (boosting) and futex. I'm sceptical about the i2c users…
> 
> Well, yes, I too was/am sceptical, but it was tglx who twisted my arm
> and said the i2c people were right and rt_mutex is/should-be a generic
> usable interface.

I don't mind the generic interface I just find the use-case odd. So by
now rtmutex is used by i2c core and not a single driver like it the case
the last time I looked at it. But still, why is it (PI-boosting)
important for I2C to use it and not for other subsystems? Moving on…

> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -3945,7 +3945,7 @@ void __noreturn do_task_dead(void)
> > > >  
> > > >  static inline void sched_submit_work(struct task_struct *tsk)
> > > >  {
> > > > -	if (!tsk->state || tsk_is_pi_blocked(tsk))
> > > > +	if (!tsk->state)
> > > >  		return;
> > > >  
> > > >  	/*
> 
> So this part actually makes rt_mutex less special and is good.
> 
> > > > @@ -3961,6 +3961,9 @@ static inline void sched_submit_work(str
> > > >  		preempt_enable_no_resched();
> > > >  	}
> > > >  
> > > > +	if (tsk_is_pi_blocked(tsk))
> > > > +		return;
> > > > +
> > > >  	/*
> > > >  	 * If we are going to sleep and we have plugged IO queued,
> > > >  	 * make sure to submit it to avoid deadlocks.
> > > 
> > > What do we need that clause for? Why is pi_blocked special _at_all_?
> > 
> > so !RT the scheduler does nothing special if a task blocks on sleeping
> > lock. 
> > If I remember correctly then blk_schedule_flush_plug() is the problem.
> > It may require a lock which is held by the task. 
> > It may hold A and wait for B while another task has B and waits for A. 
> > If my memory does bot betray me then ext+jbd can lockup without this.
> 
> And am I right in thinking that that, again, is specific to the
> sleeping-spinlocks from PREEMPT_RT? Is there really nothing else that
> identifies those more specifically? It's been a while since I looked at
> them.

Not really. I hacked "int sleeping_lock" into task_struct which is
incremented each time a "sleeping lock" version of rtmutex is requested.
We have two users as of now:
- RCU, which checks if we schedule() while holding rcu_read_lock() which
  is okay if it is a sleeping lock.

- NOHZ's pending softirq detection while going to idle. It is possible
  that "ksoftirqd" and "current" are blocked on locks and the CPU goes
  to idle (because nothing else is runnable) with pending softirqs.

I wanted to let rtmutex invoke another schedule() function in case of a
sleeping lock to avoid the RCU warning. This would avoid incrementing
"sleeping_lock" in the fast path. But then I had no idea what to do with
the NOHZ thing.

> Also, I suppose it would be really good to put that in a comment.
So, what does that mean for that patch. According to my inbox it has
applied to an "urgent" branch. Do I resubmit the whole thing or just a
comment on top?

Sebastian