Date:   Thu, 19 Apr 2018 01:40:45 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Jeff Layton <jlayton@...nel.org>,
        Matthew Wilcox <willy@...radead.org>
Cc:     bfields@...ldses.org, viro@...iv.linux.org.uk,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        boqun.feng@...il.com, longman@...hat.com, peterz@...radead.org,
        mingo@...hat.com
Subject: Re: [PATCH] fasync: Fix deadlock between task-context and
 interrupt-context kill_fasync()

On 18.04.2018 23:00, Jeff Layton wrote:
> On Tue, 2018-04-17 at 17:15 +0300, Kirill Tkhai wrote:
>> On 17.04.2018 17:01, Matthew Wilcox wrote:
>>> On Thu, Apr 05, 2018 at 02:58:06PM +0300, Kirill Tkhai wrote:
>>>> I observed the following deadlock between them:
>>>>
>>>> [task 1]                          [task 2]                         [task 3]
>>>> kill_fasync()                     mm_update_next_owner()           copy_process()
>>>>  spin_lock_irqsave(&fa->fa_lock)   read_lock(&tasklist_lock)        write_lock_irq(&tasklist_lock)
>>>>   send_sigio()                    <IRQ>                             ...
>>>>    read_lock(&fown->lock)         kill_fasync()                     ...
>>>>     read_lock(&tasklist_lock)      spin_lock_irqsave(&fa->fa_lock)  ...
>>>>
>>>> Task 1 can't take tasklist_lock for reading, since task 3 has already
>>>> expressed its wish to take the lock exclusively. Task 2 already holds
>>>> tasklist_lock for reading, but it can't take fa->fa_lock, which task 1 holds.
>>>
>>> I think the important question is to Peter ... why didn't lockdep catch this?
>>>
>>>> -		spin_lock_irq(&fa->fa_lock);
>>>> +		write_lock_irq(&fa->fa_lock);
>>>>  		fa->fa_file = NULL;
>>>> -		spin_unlock_irq(&fa->fa_lock);
>>>> +		write_unlock_irq(&fa->fa_lock);
>>>
>>> ...
>>>> -		spin_lock_irq(&fa->fa_lock);
>>>> +		write_lock_irq(&fa->fa_lock);
>>>>  		fa->fa_fd = fd;
>>>> -		spin_unlock_irq(&fa->fa_lock);
>>>> +		write_unlock_irq(&fa->fa_lock);
>>>
>>> Do we really need a lock here?  If we convert each of these into WRITE_ONCE,
>>
>> We want to pass a specific fd to send_sigio(), not a random one. Also, we want
>> to dereference a specific file in kill_fasync_rcu() without the danger that it
>> is freed in parallel. So, since there is no rcu_read_lock() or other protection
>> on the reader side of this data, we *can't* drop the lock.
>>
> 
> I think it's probably possible to do this with RCU.
> 
> You'd need to put the whole fasync structure under RCU, and we'd have
> to stop resetting fa_fd in fasync_insert_entry, and just insert a new
> one and delete the old when there was one like that. That'd be no big
> deal though since we always allocate one and pass it in anyway.
> 
> We could also turn the nasty singly-linked list into a standard hlist
> too, which would be nice.

I have no objection that it's possible, but is the read lock there a real
performance problem we are actually hitting? The write lock is taken for
a single operation, so it does not seem to be held for long. We lived
with an exclusive spinlock there for a long time, and even so nobody
complained. If it had been a problem, I assume someone would have
submitted a patch to fix it...

Anyway, if we have to make this code completely lockless, we can. But
I would not risk doing that in a fix which could be backported to
stable kernels. I agree with you that it is better done as development
work on top of the fix.
 
> Regardless...for now we should probably take this patch and do any
> further work on the code from that point. I'll plan to pick up the
> patch for now unless Al or Bruce want it.

Ok, thanks for clarifying, Jeff.
 
>>> then 
>>>
>>> ...
>>>> -		spin_lock_irqsave(&fa->fa_lock, flags);
>>>> +		read_lock(&fa->fa_lock);
>>>>  		if (fa->fa_file) {
>>>
>>> file = READ_ONCE(fa->fa_file)
>>>
>>> then we're not opening any new races, are we?
>>>
>>>>  			fown = &fa->fa_file->f_owner;
>>>>  			/* Don't send SIGURG to processes which have not set a
>>>> @@ -997,7 +996,7 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
>>>>  			if (!(sig == SIGURG && fown->signum == 0))
>>>>  				send_sigio(fown, fa->fa_fd, band);
>>>>  		}
>>>> -		spin_unlock_irqrestore(&fa->fa_lock, flags);
>>>> +		read_unlock(&fa->fa_lock);
>>>>  		fa = rcu_dereference(fa->fa_next);
>>>>  	}
>>>>  }
>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>> index c6baf767619e..297e2dcd9dd2 100644
>>>> --- a/include/linux/fs.h
>>>> +++ b/include/linux/fs.h
>>>> @@ -1250,7 +1250,7 @@ static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
>>>>  }
>>>>  
>>>>  struct fasync_struct {
>>>> -	spinlock_t		fa_lock;
>>>> +	rwlock_t		fa_lock;
>>>>  	int			magic;
>>>>  	int			fa_fd;
>>>>  	struct fasync_struct	*fa_next; /* singly linked list */
>>
>> Kirill

Kirill
