Message-ID: <1490614617.3393.4.camel@sipsolutions.net>
Date:   Mon, 27 Mar 2017 13:36:57 +0200
From:   Johannes Berg <johannes@...solutions.net>
To:     Nicolai Stange <nicstange@...il.com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        "Paul E.McKenney" <paulmck@...ux.vnet.ibm.com>,
        gregkh <gregkh@...uxfoundation.org>
Subject: Re: deadlock in synchronize_srcu() in debugfs?

Hi,

> > Before I go hunting - has anyone seen a deadlock in
> > synchronize_srcu() in debugfs_remove() before?
> 
> Not yet. How reproducible is this?

So ... this turned out to be a livelock of sorts.

We have a debugfs file (not upstream, at least not yet, it seems) that
basically blocks on read until data is available.

At the time the system hung, a process was reading from that file, and
no data was being generated.

A second process, holding the RTNL, was trying to remove a completely
unrelated debugfs file (*) and was stuck in synchronize_srcu() waiting
for that reader.

A third process, and many others, were waiting to acquire the RTNL.
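A rough userspace model of that chain (a sketch, not kernel code) may make it easier to see. Everything in it is hypothetical: `toy_srcu` is a simplified stand-in for an SRCU domain shared by all debugfs files, loosely modeled on SRCU flipping between two reader-count slots; `rtnl` is a plain mutex standing in for the RTNL; and removing file B is reduced to its grace-period wait. The timeout exists only so the demo terminates, since the real synchronize_srcu() just keeps waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy SRCU shared by every "debugfs file": synchronize flips the
 * slot new readers register in, then waits for the old slot to drain. Meant
 * for one synchronizer; a timed-out call is not retried. */
struct toy_srcu {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    int idx, readers[2];
};
static struct toy_srcu debugfs_srcu = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, {0, 0}};
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER; /* RTNL stand-in */
static pthread_mutex_t file_a_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t file_a_wake = PTHREAD_COND_INITIALIZER;
static bool file_a_stop; /* set only by demo cleanup; no data ever arrives */

static int toy_read_lock(struct toy_srcu *s) {
    pthread_mutex_lock(&s->lock);
    int idx = s->idx;
    s->readers[idx]++;
    pthread_mutex_unlock(&s->lock);
    return idx;
}

static void toy_read_unlock(struct toy_srcu *s, int idx) {
    pthread_mutex_lock(&s->lock);
    if (--s->readers[idx] == 0)
        pthread_cond_broadcast(&s->drained);
    pthread_mutex_unlock(&s->lock);
}

static int toy_active_readers(struct toy_srcu *s) {
    pthread_mutex_lock(&s->lock);
    int n = s->readers[0] + s->readers[1];
    pthread_mutex_unlock(&s->lock);
    return n;
}

/* Waits for readers already inside when called; false on timeout. The real
 * synchronize_srcu() has no timeout; it is here so the demo terminates. */
static bool toy_synchronize_timed(struct toy_srcu *s, int timeout_ms) {
    struct timespec dl;
    clock_gettime(CLOCK_REALTIME, &dl);
    dl.tv_sec += timeout_ms / 1000;
    dl.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (dl.tv_nsec >= 1000000000L) { dl.tv_sec++; dl.tv_nsec -= 1000000000L; }
    pthread_mutex_lock(&s->lock);
    int old = s->idx;
    s->idx = !old; /* readers arriving from now on use the other slot */
    int rc = 0;
    while (s->readers[old] > 0 && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&s->drained, &s->lock, &dl);
    bool done = (s->readers[old] == 0);
    pthread_mutex_unlock(&s->lock);
    return done;
}

/* First process: read() on file A, blocking until data shows up. */
static void *blocking_reader(void *arg) {
    (void)arg;
    int idx = toy_read_lock(&debugfs_srcu);
    pthread_mutex_lock(&file_a_lock);
    while (!file_a_stop)
        pthread_cond_wait(&file_a_wake, &file_a_lock);
    pthread_mutex_unlock(&file_a_lock);
    toy_read_unlock(&debugfs_srcu, idx);
    return NULL;
}

/* Second process: holds RTNL while removing unrelated file B. Returns true
 * if the removal's grace period completed within timeout_ms. */
static bool remove_file_b_under_rtnl(int timeout_ms) {
    pthread_t reader;
    int rc = pthread_create(&reader, NULL, blocking_reader, NULL);
    assert(rc == 0); (void)rc;
    while (toy_active_readers(&debugfs_srcu) == 0)
        sched_yield(); /* wait until the reader is inside its read section */
    pthread_mutex_lock(&rtnl); /* anyone else wanting RTNL now queues up */
    bool removed = toy_synchronize_timed(&debugfs_srcu, timeout_ms);
    pthread_mutex_unlock(&rtnl);
    pthread_mutex_lock(&file_a_lock); /* demo cleanup: wake the reader */
    file_a_stop = true;
    pthread_cond_broadcast(&file_a_wake);
    pthread_mutex_unlock(&file_a_lock);
    pthread_join(reader, NULL);
    return removed;
}
```

Calling `remove_file_b_under_rtnl(200)` returns false: the removal of B is still waiting when the timeout fires, although B has nothing to do with the reader blocked on A, and the RTNL is held the whole time.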


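Obviously, in light of things like nfp_net_debugfs_tx_q_read(),
wil_write_file_reset(), lowpan_short_addr_get() and quite a few more,
which take the RTNL from within a debugfs file operation, nobody in the
whole system can now remove debugfs files while holding the RTNL: if one
of those handlers is running at the time, the removal and the handler
wait for each other forever. I'm not sure how many people that affects,
but IMHO it's a pretty major new restriction, and one that isn't flagged
anywhere at all.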
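That restriction is a classic ABBA ordering problem: such a handler is inside the debugfs SRCU read section and wants the RTNL, while the remover holds the RTNL and waits for the SRCU readers. A minimal pthreads sketch of the cycle, with all names hypothetical: `toy_srcu` is a simplified two-slot stand-in for the shared debugfs SRCU, `rtnl` is a plain mutex, and the timeout is added only so the demo can finish; in the real case nothing breaks the cycle.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy SRCU shared by every "debugfs file": synchronize flips the
 * slot new readers register in, then waits for the old slot to drain. */
struct toy_srcu {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    int idx, readers[2];
};
static struct toy_srcu debugfs_srcu = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, {0, 0}};
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER; /* RTNL stand-in */

static int toy_read_lock(struct toy_srcu *s) {
    pthread_mutex_lock(&s->lock);
    int idx = s->idx;
    s->readers[idx]++;
    pthread_mutex_unlock(&s->lock);
    return idx;
}

static void toy_read_unlock(struct toy_srcu *s, int idx) {
    pthread_mutex_lock(&s->lock);
    if (--s->readers[idx] == 0)
        pthread_cond_broadcast(&s->drained);
    pthread_mutex_unlock(&s->lock);
}

static int toy_active_readers(struct toy_srcu *s) {
    pthread_mutex_lock(&s->lock);
    int n = s->readers[0] + s->readers[1];
    pthread_mutex_unlock(&s->lock);
    return n;
}

/* Waits for readers already inside when called; false on timeout. */
static bool toy_synchronize_timed(struct toy_srcu *s, int timeout_ms) {
    struct timespec dl;
    clock_gettime(CLOCK_REALTIME, &dl);
    dl.tv_sec += timeout_ms / 1000;
    dl.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (dl.tv_nsec >= 1000000000L) { dl.tv_sec++; dl.tv_nsec -= 1000000000L; }
    pthread_mutex_lock(&s->lock);
    int old = s->idx;
    s->idx = !old; /* readers arriving from now on use the other slot */
    int rc = 0;
    while (s->readers[old] > 0 && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&s->drained, &s->lock, &dl);
    bool done = (s->readers[old] == 0);
    pthread_mutex_unlock(&s->lock);
    return done;
}

/* A handler that, like the ones named above, takes RTNL inside its debugfs
 * file operation, i.e. inside the SRCU read section. */
static void *rtnl_taking_handler(void *arg) {
    (void)arg;
    int idx = toy_read_lock(&debugfs_srcu);
    pthread_mutex_lock(&rtnl); /* blocks: the remover already holds RTNL */
    pthread_mutex_unlock(&rtnl);
    toy_read_unlock(&debugfs_srcu, idx);
    return NULL;
}

/* Remover: takes RTNL, then runs the toy debugfs_remove() grace-period wait.
 * Returns true if that wait completed within timeout_ms. */
static bool remove_while_holding_rtnl(int timeout_ms) {
    pthread_mutex_lock(&rtnl);
    pthread_t handler;
    int rc = pthread_create(&handler, NULL, rtnl_taking_handler, NULL);
    assert(rc == 0); (void)rc;
    while (toy_active_readers(&debugfs_srcu) == 0)
        sched_yield(); /* handler is now inside its read section */
    bool done = toy_synchronize_timed(&debugfs_srcu, timeout_ms);
    pthread_mutex_unlock(&rtnl); /* only the demo's timeout breaks the cycle */
    pthread_join(handler, NULL);
    return done;
}
```

Here `remove_while_holding_rtnl(200)` returns false, and the handler only gets the RTNL after the remover gives up waiting, which a real debugfs_remove() never does.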


Similarly, nobody should block in debugfs file operations, as we did in
ours; but smsdvb_stats_read() and crtc_crc_open(), for example, also
look like they could block for quite a while. Again, there's no warning
anywhere that blocking in a debugfs file can now indefinitely defer
completely unrelated debugfs_remove() calls in the entire system.

Overall, while I can solve this problem for our driver, possibly by
making the debugfs file return some dummy data periodically if no real
data exists (which may not easily be possible for all such files), I'm
not convinced that all of this is really the right restriction to
impose. Perhaps it could be per directory, or per some kind of subsystem?
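The dummy-data workaround can be sketched in userspace, with hypothetical names throughout: `toy_srcu` is a simplified two-slot stand-in for the shared debugfs SRCU, and each loop iteration of `polling_reader` models one read() that waits at most `poll_ms` for data before returning, which takes it out of the read section. A concurrent removal then only has to wait for the current read to time out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy SRCU: synchronize flips the slot new readers use, then
 * waits for readers in the old slot. Not the kernel's implementation. */
struct toy_srcu {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    int idx, readers[2];
};
static struct toy_srcu debugfs_srcu = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, {0, 0}};
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t file_wake = PTHREAD_COND_INITIALIZER;
static bool file_stop;        /* demo shutdown; real data never arrives */
static atomic_int reads_done; /* completed read() calls */

static struct timespec deadline_in(int ms) {
    struct timespec t;
    clock_gettime(CLOCK_REALTIME, &t);
    t.tv_sec += ms / 1000;
    t.tv_nsec += (long)(ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) { t.tv_sec++; t.tv_nsec -= 1000000000L; }
    return t;
}

static int toy_read_lock(struct toy_srcu *s) {
    pthread_mutex_lock(&s->lock);
    int idx = s->idx;
    s->readers[idx]++;
    pthread_mutex_unlock(&s->lock);
    return idx;
}

static void toy_read_unlock(struct toy_srcu *s, int idx) {
    pthread_mutex_lock(&s->lock);
    if (--s->readers[idx] == 0)
        pthread_cond_broadcast(&s->drained);
    pthread_mutex_unlock(&s->lock);
}

static bool toy_synchronize_timed(struct toy_srcu *s, int timeout_ms) {
    struct timespec dl = deadline_in(timeout_ms);
    pthread_mutex_lock(&s->lock);
    int old = s->idx;
    s->idx = !old; /* readers arriving from now on use the other slot */
    int rc = 0;
    while (s->readers[old] > 0 && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&s->drained, &s->lock, &dl);
    bool done = (s->readers[old] == 0);
    pthread_mutex_unlock(&s->lock);
    return done;
}

/* Each iteration is one read(): wait at most poll_ms for data, then return
 * (dummy data in the real workaround, nothing here), leaving the section. */
static void *polling_reader(void *arg) {
    int poll_ms = *(const int *)arg;
    for (;;) {
        int idx = toy_read_lock(&debugfs_srcu);
        struct timespec dl = deadline_in(poll_ms);
        int rc = 0;
        pthread_mutex_lock(&file_lock);
        while (!file_stop && rc != ETIMEDOUT)
            rc = pthread_cond_timedwait(&file_wake, &file_lock, &dl);
        bool stop = file_stop;
        pthread_mutex_unlock(&file_lock);
        toy_read_unlock(&debugfs_srcu, idx);
        atomic_fetch_add(&reads_done, 1);
        if (stop)
            return NULL;
    }
}

/* Removal while a reader keeps polling; true if it completed in time. */
static bool remove_with_polling_reader(int poll_ms, int timeout_ms) {
    pthread_t reader;
    int rc = pthread_create(&reader, NULL, polling_reader, &poll_ms);
    assert(rc == 0); (void)rc;
    while (atomic_load(&reads_done) == 0)
        sched_yield(); /* reader has been through at least one read() */
    bool removed = toy_synchronize_timed(&debugfs_srcu, timeout_ms);
    pthread_mutex_lock(&file_lock); /* demo cleanup: stop the reader */
    file_stop = true;
    pthread_cond_broadcast(&file_wake);
    pthread_mutex_unlock(&file_lock);
    pthread_join(reader, NULL);
    return removed;
}
```

With `remove_with_polling_reader(20, 2000)` the removal completes, after roughly one polling interval at most. The price is that userspace sees periodic empty or dummy reads, which, as said, not every such file can easily tolerate.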

johannes

(*) before removing the first file, we'd obviously wake up (and thereby
more or less terminate) its readers
