linux-kernel - Re: [PATCH] locking/qrwlock: Give priority to readers with irqs disabled to prevent deadlock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0092bc52-6536-4226-cdbf-8bdc97265c3f@virtuozzo.com>
Date:   Thu, 5 Apr 2018 12:43:35 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...hat.com, stable@...r.kernel.org,
        linux-kernel@...r.kernel.org, Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH] locking/qrwlock: Give priority to readers with irqs
 disabled to prevent deadlock

On 04.04.2018 19:18, Peter Zijlstra wrote:
> On Wed, Apr 04, 2018 at 06:51:08PM +0300, Kirill Tkhai wrote:
>> On 04.04.2018 18:35, Peter Zijlstra wrote:
>>> On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
>>>> The following situation leads to deadlock:
>>>>
>>>> [task 1]                          [task 2]                         [task 3]
>>>> kill_fasync()                     mm_update_next_owner()           copy_process()
>>>>  spin_lock_irqsave(&fa->fa_lock)   read_lock(&tasklist_lock)        write_lock_irq(&tasklist_lock)
>>>>   send_sigio()                    <IRQ>                             ...
>>>>    read_lock(&fown->lock)         kill_fasync()                     ...
>>>>     read_lock(&tasklist_lock)      spin_lock_irqsave(&fa->fa_lock)  ...
>>>>
>>>> Task 1 can't acquire read locked tasklist_lock, since there is
>>>> already task 3 expressed its wish to take the lock exclusive.
>>>> Task 2 holds the read locked lock, but it can't take the spin lock.
>>>>
>>>> The patch makes queued_read_lock_slowpath() to give task 1 the same
>>>> priority as it was an interrupt handler, and to take the lock
>>>
>>> That re-introduces starvation scenarios. And the above looks like a
>>> proper deadlock that should be sorted by fixing the locking order.
>>
>> We can move tasklist_lock out of send_sigio(), but I'm not sure
>> it's possible for read_lock(&fown->lock).
> 
> So the scenario is:
> 
> CPU0			CPU1				CPU2
> 
> spin_lock_irqsave(&fa->fa_lock);
> 			read_lock(&tasklist_lock)
> 							write_lock_irq(&tasklist_lock)
> read_lock(&tasklist_lock)
> 			<IRQ>
> 			  spin_lock_irqsave(&fa->fa_lock);
> 
> 
> Right? (where the row now signifies time)
> 
> That doesn't seem to include fown->lock, you're saying it has an
> identical issue?

There is also read_lock(), which can be taken from interrupt:

CPU0                                          CPU1                                     CPU2

f_getown()                                    kill_fasync()                               
  read_lock(&f_own->lock)                      spin_lock_irqsave(&fa->fa_lock, flags)    
  <IRQ>                                        send_sigio()                            write_lock_irq(&f_own->lock);
    kill_fasync()                                 read_lock(&fown->lock)
      spin_lock_irqsave(&fa->fa_lock, flags)

To prevent deadlock, this requires all &f_own->lock be taken via read_lock_irqsave().

This may be formalized as lockdep rule: if spinlock nests into read_lock(), and they
both can be called from interrupt, the rest of read_lock() always must disable interrupts.

>> Is there another solution? Is there reliable way to iterate do_each_pid_task()
>> with rcu_read_lock()?
> 
> Depends on what you call reliable :-), Yes you can use
> do_each_pid_task() with RCU, but as always you're prone to see tasks
> that are dead and miss tasks that just came in.
>
> If that is sufficient for the signal muck, dunno :/ Typically signals
> use sighand lock, not tasklist_lock.

The first thing is not a problem, while missing a newly moved task is not suitable.
So, it seems it's not reliable...

Kirill