lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210816144622.tgslast6sbblclda@skbuf>
Date:   Mon, 16 Aug 2021 17:46:22 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Frank Rowand <frowand.list@...il.com>
Cc:     Rob Herring <robh+dt@...nel.org>,
        Sascha Hauer <s.hauer@...gutronix.de>,
        devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: of_node_put() usage is buggy all over drivers/of/base.c?!

Hi Frank,

On Mon, Aug 16, 2021 at 09:33:03AM -0500, Frank Rowand wrote:
> Hi Vladimir,
> 
> On 8/13/21 8:01 PM, Vladimir Oltean wrote:
> > Hi,
> > 
> > I was debugging an RCU stall which happened during the probing of a
> > driver. Activating lock debugging, I see:
> 
> I took a quick look at sja1105_mdiobus_register() in v5.14-rc1 and v5.14-rc6.
> 
> Looking at the following stack trace, I did not see any calls to
> of_find_compatible_node() in sja1105_mdiobus_register().  I am
> guessing that maybe there is an inlined function that calls
> of_find_compatible_node().  This would likely be either
> sja1105_mdiobus_base_tx_register() or sja1105_mdioux_base_t1_register().

Yes, it is sja1105_mdiobus_base_t1_register which is inlined.

> > 
> > [  101.710694] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
> > [  101.719119] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1534, name: sh
> > [  101.726763] INFO: lockdep is turned off.
> > [  101.730674] irq event stamp: 0
> > [  101.733716] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > [  101.739973] hardirqs last disabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
> > [  101.748146] softirqs last  enabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
> > [  101.756313] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [  101.762569] CPU: 4 PID: 1534 Comm: sh Not tainted 5.14.0-rc5+ #272
> > [  101.774558] Call trace:
> > [  101.794734]  __might_sleep+0x50/0x88
> > [  101.798297]  __mutex_lock+0x60/0x938
> > [  101.801863]  mutex_lock_nested+0x38/0x50
> > [  101.805775]  kernfs_remove+0x2c/0x50             <---- this takes mutex_lock(&kernfs_mutex);
> > [  101.809341]  sysfs_remove_dir+0x54/0x70
> 
> The __kobject_del() occurs only if the refcount on the node
> becomes zero.  This should never be true when of_find_compatible_node()
> calls of_node_put() unless a "from" node is passed to of_find_compatible_node().

I figured that was the assumption, that the of_node_put would never
trigger a sysfs file / kobject deletion from there.

> In both sja1105_mdiobus_base_tx_register() and sja1105_mdioux_base_t1_register()
> a from node ("mdio") is passed to of_find_compatible_node() without first doing an
> of_node_get(mdio).  If you add the of_node_get() calls the problem should be fixed.

The answer seems simple enough, but stupid question, but why does
of_find_compatible_node call of_node_put on "from" in the first place?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ