linux-kernel - Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 04 May 2018 23:38:37 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-kernel@...r.kernel.org, tglx@...utronix.de,
        Anna-Maria Gleixner <anna-maria@...utronix.de>
Subject: Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore

"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> writes:

> On Fri, May 04, 2018 at 03:08:40PM -0500, Eric W. Biederman wrote:
>> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> writes:
>> 
>> > On Fri, May 04, 2018 at 02:03:04PM -0500, Eric W. Biederman wrote:
>> >> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> writes:
>> >> 
>> >> > On Fri, May 04, 2018 at 12:17:20PM -0500, Eric W. Biederman wrote:
>> >> >> Sebastian Andrzej Siewior <bigeasy@...utronix.de> writes:
>> >> >> 
>> >> >> > On 2018-05-04 11:59:08 [-0500], Eric W. Biederman wrote:
>> >> >> >> Sebastian Andrzej Siewior <bigeasy@...utronix.de> writes:
>> >> >> >> > From: Anna-Maria Gleixner <anna-maria@...utronix.de>
>> >> >> > …
>> >> >> >> > This long-term fix has been made in commit 4abf91047cf ("rtmutex: Make >
>> >> >> >> > wait_lock irq safe") for different reason.
>> >> >> >> 
>> >> >> >> Which tree has this change been made in?  I am not finding the commit
>> >> >> >> you mention above in Linus's tree.
>> >> >> >
>> >> >> > I'm sorry, it should have been commit b4abf91047cf ("rtmutex: Make
>> >> >> > wait_lock irq safe").
>> >> >> 
>> >> >> Can you fix that in your patch description and can you also up the
>> >> >> description of rcu_read_unlock?
>> >> >> 
>> >> >> If we don't need to jump through hoops it looks very reasonable to
>> >> >> remove this unnecessary logic.  But we should fix the description
>> >> >> in rcu_read_unlock that still says we need these hoops.
>> >> >
>> >> > The hoops are still required for rcu_read_lock(), otherwise you
>> >> > get deadlocks between the scheduler and RCU in PREEMPT=y kernels.
>> >> > What happens with this patch (if I understand it correctly) is that the
>> >> > signal code now uses a different way of jumping through the hoops.
>> >> > But the hoops are still jumped through.
>> >> 
>> >> The patch changes:
>> >> 
>> >> local_irq_disable();
>> >> rcu_read_lock();
>> >> spin_lock();
>> >> rcu_read_unlock();
>> >> 
>> >> to:
>> >> 
>> >> rcu_read_lock();
>> >> spin_lock_irq();
>> >> rcu_read_unlock();
>> >> 
>> >> Now that I have a chance to relfect on it the fact that the patern
>> >> that is being restored does not work is scary.  As the failure has
>> >> nothing to do with lock ordering and people won't realize what is going
>> >> on.  Especially since the common rcu modes won't care.
>> >> 
>> >> So is it true that taking spin_lock_irq before calling rcu_read_unlock
>> >> is a problem because of rt_mutex_unlock()?  Or has b4abf91047cf ("rtmutex: Make
>> >> wait_lock irq safe") actually fixed that and we can correct the
>> >> documentation of rcu_read_unlock() ?  And fix __lock_task_sighand?
>> >
>> > The problem is that the thing taking the lock might be the scheduler,
>> > or one of the locks taken while the scheduler's pi and rq locks are
>> > held.  This occurs only with RCU-preempt.
>> >
>> > Here is what can happen:
>> >
>> > o	A task does rcu_read_lock().
>> >
>> > o	That task is preempted.
>> >
>> > o	That task stays preempted for a long time, and is therefore
>> > 	priority boosted.  This boosting involves a high-priority RCU
>> > 	kthread creating an rt_mutex, pretending that the preempted task
>> > 	already holds it, and then acquiring it.
>> >
>> > o	The task awakens, acquires the scheduler's rq lock, and
>> > 	then does rcu_read_unlock().
>> >
>> > o	Because the task has been priority boosted, __rcu_read_unlock()
>> > 	invokes the rcu_read_unlock_special() slowpath, which does
>> > 	(as you say) rt_mutex_unlock() to deboost.  The deboosting
>> > 	can cause the scheduler to acquire the rq and pi locks, which
>> > 	results in deadlock.
>> >
>> > In contrast, holding these scheduler locks across the entirety of the
>> > RCU-preempt read-side critical section is harmless because then the
>> > critical section cannot be preempted, which means that priority boosting
>> > cannot happen, which means that there will be no need to deboost at
>> > rcu_read_unlock() time.
>> >
>> > This restriction has not changed, and as far as I can see is inherent
>> > in the fact that RCU uses the scheduler and the scheduler uses RCU.
>> > There is going to be an odd corner case in there somewhere!
>> 
>> However if I read things correctly b4abf91047cf ("rtmutex: Make
>> wait_lock irq safe") did change this.
>> 
>> In particular it changed things so that it is only the scheduler locks
>> that matter, not any old lock that disabled interrupts.  This was done
>> by disabling disabling interrupts when taking the wait_lock.
>> 
>> The rcu_read_unlock documentation states:
>> 
>>  * In most situations, rcu_read_unlock() is immune from deadlock.
>>  * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock()
>>  * is responsible for deboosting, which it does via rt_mutex_unlock().
>>  * Unfortunately, this function acquires the scheduler's runqueue and
>>  * priority-inheritance spinlocks.  This means that deadlock could result
>>  * if the caller of rcu_read_unlock() already holds one of these locks or
>>  * any lock that is ever acquired while holding them; or any lock which
>>  * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
>>  * does not disable irqs while taking ->wait_lock.
>> 
>> So we can now remove the clause:
>>  * ; or any lock which
>>  * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
>>  * does not disable irqs while taking ->wait_lock.
>> 
>> Without the any lock that disabled interrupts restriction it is now safe
>> to not worry about the issues with the scheduler locks and the rt_mutex
>> Which does make it safe to not worry about these crazy complexities in
>> lock_task_sighand.
>> 
>> Paul do you agree or is the patch unsafe?
>
> Ah, I thought you were trying to get rid of all but the first line of
> that paragraph, not just the final clause.  Apologies for my confusion!
>
> It looks plausible, but the patch should be stress-tested on a preemptible
> kernel with priority boosting enabled.  Has that been done?
>
> (Me, I would run rcutorture scenario TREE03 for an extended time period
> on b4abf91047cf with your patch applied.  But what testing have you
> done already?)

Not my patch.  I was just the reviewer asking for some obvious cleanups.
So people won't get confused.  Sebastian?  Can you tell Paul what
testing you have done?

Eric