

Message-ID: <1490347486.2766.17.camel@sipsolutions.net>
Date:   Fri, 24 Mar 2017 10:24:46 +0100
From:   Johannes Berg <johannes@...solutions.net>
To:     linux-kernel <linux-kernel@...r.kernel.org>
Cc:     Nicolai Stange <nicstange@...il.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        gregkh <gregkh@...uxfoundation.org>, sharon.dvir@...el.com,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-wireless <linux-wireless@...r.kernel.org>
Subject: Re: deadlock in synchronize_srcu() in debugfs?

Hi,

On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote:
> On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote:
> > Isn't it possible for the following to happen?
> > 
> > CPU1					CPU2
> > 
> > mutex_lock(&M); // acquires mutex
> > 					full_proxy_xyz();
> > 					srcu_read_lock(&debugfs_srcu);
> > 					real_fops->xyz();
> > 					mutex_lock(&M); // waiting for mutex
> > debugfs_remove(F);
> > synchronize_srcu(&debugfs_srcu);

> So I'm pretty sure that this can happen. I'm not convinced that it's
> happening here, but still.

I'm a bit confused, in that SRCU, of course, doesn't wait until all the
readers are done - that'd be a regular reader/writer lock or something.

However, it does (have to) wait until all the currently active read-
side sections have terminated, which still leads to a deadlock in the
example above, I think?
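
To make the distinction concrete, here is a toy, single-threaded model
of that behaviour. It is only a sketch: the toy_* names are made up for
illustration, and the real SRCU implementation (per-CPU counters,
memory barriers, actual sleeping) is considerably more involved. The
model only shows that a grace period is held up by a reader that was
already inside its read-side section, but not by one that starts later.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an SRCU-like domain. NOT the kernel implementation. */
struct toy_srcu {
	int active_idx;   /* index that new readers register under */
	int readers[2];   /* readers currently in a read-side section */
};

/* Enter a read-side section; returns the index to pass to unlock. */
static int toy_read_lock(struct toy_srcu *s)
{
	int idx = s->active_idx;

	s->readers[idx]++;
	return idx;
}

/* Leave a read-side section entered under index idx. */
static void toy_read_unlock(struct toy_srcu *s, int idx)
{
	assert(s->readers[idx] > 0);
	s->readers[idx]--;
}

/*
 * Start a grace period: flip the index so that later readers no
 * longer count against it, and return the old index to wait on.
 */
static int toy_sync_begin(struct toy_srcu *s)
{
	int old = s->active_idx;

	s->active_idx = !old;
	return old;
}

/* The grace period can end once the pre-existing readers are gone. */
static bool toy_sync_done(const struct toy_srcu *s, int old)
{
	return s->readers[old] == 0;
}

/* True if A (early reader) blocks the grace period and B (late) does not. */
static bool demo_grace_period(void)
{
	struct toy_srcu s = { 0, { 0, 0 } };
	int a = toy_read_lock(&s);        /* reader A: before synchronize */
	int old = toy_sync_begin(&s);     /* "synchronize" starts */
	int b = toy_read_lock(&s);        /* reader B: after it started */
	bool blocked_by_a = !toy_sync_done(&s, old);
	bool done_despite_b;

	toy_read_unlock(&s, a);
	done_despite_b = toy_sync_done(&s, old);
	toy_read_unlock(&s, b);
	return blocked_by_a && done_despite_b;
}
```

In the scenario quoted above, reader A is CPU2 sitting in mutex_lock(&M)
inside its read-side section, so in this model the grace period started
by CPU1 could never finish while CPU1 keeps holding M.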

In his 2006 LWN article Paul wrote:

    The designer of a given subsystem is responsible for: (1) ensuring
    that SRCU read-side sleeping is bounded and (2) limiting the amount
    of memory waiting for synchronize_srcu(). [1]

In the case of debugfs files acquiring locks, (1) can't really be
guaranteed, especially if those locks can be held while calling
synchronize_srcu() [via debugfs_remove]. So I still think the lockdep
annotations need to change so that there is at least some annotation
at synchronize_srcu() time, which would let us detect this.
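
As a rough illustration of what such an annotation would buy us, here
is a small userspace sketch of a dependency tracker, loosely in the
spirit of lockdep. Everything in it (dep_graph, dep_add, the CLASS_*
names) is hypothetical and is not the kernel's lockdep API; it only
assumes that synchronize_srcu() would be recorded as "waiting on" the
SRCU domain while other locks are held.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { MAX_CLASSES = 8 };

/* Hypothetical lock classes for the scenario discussed above. */
enum { CLASS_MUTEX_M, CLASS_DEBUGFS_SRCU };

struct dep_graph {
	/* edge[a][b]: b was acquired (or waited on) while a was held */
	bool edge[MAX_CLASSES][MAX_CLASSES];
};

static void dep_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is "to" reachable from "from"? */
static bool dep_reachable(const struct dep_graph *g, int from, int to,
			  bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    dep_reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "wanted is taken while held is held". Returns false, without
 * recording the edge, if doing so would close a cycle.
 */
static bool dep_add(struct dep_graph *g, int held, int wanted)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(wanted >= 0 && wanted < MAX_CLASSES);
	if (dep_reachable(g, wanted, held, seen))
		return false;
	g->edge[held][wanted] = true;
	return true;
}

/* Returns true if the tracker flags the debugfs scenario. */
static bool scenario_is_flagged(bool annotate_synchronize)
{
	struct dep_graph g;
	bool ok;

	dep_init(&g);
	/* CPU2: srcu_read_lock(&debugfs_srcu); ...; mutex_lock(&M); */
	ok = dep_add(&g, CLASS_DEBUGFS_SRCU, CLASS_MUTEX_M);
	/* CPU1: mutex_lock(&M); debugfs_remove(F) -> synchronize_srcu() */
	if (annotate_synchronize)
		ok = dep_add(&g, CLASS_MUTEX_M, CLASS_DEBUGFS_SRCU) && ok;
	return !ok;
}
```

With annotate_synchronize set, the second dep_add() closes the cycle
M -> debugfs_srcu -> M and the scenario is flagged; without it, the
tracker never sees the M -> debugfs_srcu dependency and stays silent,
which is the gap described above.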

Now, I still suspect there's some other bug here in the case that I'm
seeing, because I don't actually see the "mutex_lock(&M); // waiting"
piece in the traces. I'll need to run this with some tracing on Monday
when the test guys are back from the weekend.

I'm also not sure how I can possibly fix this in debugfs in mac80211
and friends, but that's perhaps a different story. Clearly, this
debugfs patch is a good thing - the code will likely have had use-
after-free problems in this situation without it. But flagging the
potential deadlocks would make it a lot easier to find them.

johannes