Message-Id: <20170418151700.GU3956@linux.vnet.ibm.com>
Date:   Tue, 18 Apr 2017 08:17:00 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Johannes Berg <johannes@...solutions.net>
Cc:     Nicolai Stange <nicstange@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances

On Tue, Apr 18, 2017 at 03:40:32PM +0200, Johannes Berg wrote:
> On Tue, 2017-04-18 at 06:31 -0700, Paul E. McKenney wrote:
> > On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> > > On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> > > 
> > > > If you have not already done so, please run this with debug
> > > > enabled,
> > > > especially CONFIG_PROVE_LOCKING=y (which implies
> > > > CONFIG_PROVE_RCU=y).
> > > > This is important because there are configurations for which the
> > > > deadlocks you saw with SRCU turn into silent failure, including
> > > > memory corruption.
> > > > CONFIG_PROVE_RCU=y will catch many of those situations.
> > > 
> > > Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> > > enabled in the builds where we saw the problem, but I'm not sure.
> > 
> > CONFIG_PROVE_RCU=y will reliably catch things like this:
> > 
> > 1.	rcu_read_lock();
> > 	synchronize_rcu();
> > 	rcu_read_unlock();
> 
> Ok, that's not something that happens here either.
> 
> > 2.	rcu_read_lock();
> > 	schedule_timeout_interruptible(HZ);
> > 	rcu_read_unlock();
> 
> Neither is this happening.
> 
> > There are more, but this should get you the flavor of the types
> > of bugs CONFIG_PROVE_RCU=y can locate for you.
> 
> Makes sense. However, the issue at hand is what we (you and I)
> discussed earlier wrt. lockdep -- from SRCU's point of view everything
> is actually OK, except that the one thread is waiting for something and
> we can never finish the grace period, and thus synchronize_srcu() will
> never return. But there's no real SRCU bug here.
> 
> > > Nicolai probably never even ran into this problem, though it should
> > > be easy to reproduce.
> > 
> > I am just worried that the situation resulting in the earlier SRCU
> > deadlocks might be hiding behind CONFIG_PROVE_RCU=n,
> > CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n.  Or some other bug
> > hiding behind some other set of Kconfig options.
> 
> There's no SRCU deadlock though. I know exactly why it happens, in my
> case, which is the following:
> 
> Thread 1:
> userspace: read(debugfs_file_1)
> srcu_read_lock(&debugfs_srcu); // in debugfs bowels
> wait_event_interruptible(...); // in my driver's debugfs read method
> 
> Thread 2:
> debugfs_remove(debugfs_file_2);
> synchronize_srcu(&debugfs_srcu); // in debugfs bowels
> 
> 
> This is the live-lock. The deadlock is something I posited but never
> ran into:
> 
> CPU 1				CPU 2
> srcu_read_lock(&debugfs_srcu);
> 				rtnl_lock();
> rtnl_lock();
> 				synchronize_srcu(&debugfs_srcu);
> 
> Again, no (S)RCU abuse here, just an ABBA deadlock.

OK, please accept my apologies for failing to follow the thread.

I nevertheless reiterate my advice to run at least some tests with
CONFIG_PROVE_RCU=y.  And yes, it would be good to upgrade lockdep
to find the above theoretical deadlock.
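
For concreteness, here is a rough userspace sketch of the dependency such
a check would have to notice.  It is not lockdep code: the names
(toy_acquire, toy_synchronize, order_graph, and so on) are hypothetical,
and it assumes that an SRCU read-side critical section can be modeled as
an ordinary lock and that waiting for a grace period counts as briefly
acquiring that same "lock".  Real SRCU readers nest and never block one
another, so this toy would also flag some reader-only orderings that are
harmless in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for the scenario quoted above. */
enum { LOCK_SRCU_DEBUGFS, LOCK_RTNL, NUM_LOCKS };

/* edge[a][b] is set once b has been acquired while a was held. */
struct order_graph {
	bool edge[NUM_LOCKS][NUM_LOCKS];
};

/* Per-task record of which lock classes are currently held. */
struct task_ctx {
	bool held[NUM_LOCKS];
};

/* Depth-first search: is "to" reachable from "from" via recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to,
		      bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NUM_LOCKS; n++)
		if (g->edge[from][n] && !seen[n] && reachable(g, n, to, seen))
			return true;
	return false;
}

/*
 * Record that "lock" is taken while the task's current locks are held.
 * Returns true if this contradicts an ordering seen earlier.
 */
static bool toy_acquire(struct order_graph *g, struct task_ctx *t, int lock)
{
	bool inversion = false;

	assert(lock >= 0 && lock < NUM_LOCKS);
	for (int h = 0; h < NUM_LOCKS; h++) {
		bool seen[NUM_LOCKS] = { false };

		if (!t->held[h] || h == lock)
			continue;
		if (reachable(g, lock, h, seen))
			inversion = true;	/* lock -> ... -> h already recorded */
		g->edge[h][lock] = true;
	}
	t->held[lock] = true;
	return inversion;
}

static void toy_release(struct task_ctx *t, int lock)
{
	t->held[lock] = false;
}

/* Assumption: a grace-period wait is an acquire plus release of the domain. */
static bool toy_synchronize(struct order_graph *g, struct task_ctx *t, int srcu)
{
	bool inversion = toy_acquire(g, t, srcu);

	toy_release(t, srcu);
	return inversion;
}

/* Replays the two-CPU sequence from the quoted text; true means "flagged". */
static bool demo_abba(void)
{
	struct order_graph g;
	struct task_ctx cpu1, cpu2;
	bool bad = false;

	memset(&g, 0, sizeof(g));
	memset(&cpu1, 0, sizeof(cpu1));
	memset(&cpu2, 0, sizeof(cpu2));

	/* CPU 1: srcu_read_lock(&debugfs_srcu); rtnl_lock(); */
	bad |= toy_acquire(&g, &cpu1, LOCK_SRCU_DEBUGFS);
	bad |= toy_acquire(&g, &cpu1, LOCK_RTNL);
	toy_release(&cpu1, LOCK_RTNL);
	toy_release(&cpu1, LOCK_SRCU_DEBUGFS);

	/* CPU 2: rtnl_lock(); synchronize_srcu(&debugfs_srcu); */
	bad |= toy_acquire(&g, &cpu2, LOCK_RTNL);
	bad |= toy_synchronize(&g, &cpu2, LOCK_SRCU_DEBUGFS);
	toy_release(&cpu2, LOCK_RTNL);

	return bad;
}
```

Calling demo_abba() from a test harness reports the inversion even though
the two sequences run one after the other and nothing ever blocks, which
is the property that makes this style of checking useful for deadlocks
that have been posited but not yet hit.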

							Thanx, Paul
