Date:   Tue, 4 May 2021 17:19:46 +0300
From:   Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
To:     Frieder Schrempf <frieder.schrempf@...tron.de>
Cc:     Timo Schlüßler <schluessler@...use.de>,
        Marc Kleine-Budde <mkl@...gutronix.de>,
        linux-can@...r.kernel.org, Wolfgang Grandegger <wg@...ndegger.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Liam Girdwood <lgirdwood@...il.com>,
        Mark Brown <broonie@...nel.org>,
        Vincent Mailhol <mailhol.vincent@...adoo.fr>,
        Oliver Hartkopp <socketcan@...tkopp.net>,
        Tim Harvey <tharvey@...eworks.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: Null pointer dereference in mcp251x driver when resuming from
 sleep

On Tue, May 04, 2021 at 03:54:00PM +0200, Frieder Schrempf wrote:
> On 03.05.21 15:54, Andy Shevchenko wrote:
> > On Mon, May 03, 2021 at 04:48:10PM +0300, Andy Shevchenko wrote:
> > > On Mon, May 03, 2021 at 04:44:24PM +0300, Andy Shevchenko wrote:
> > > > On Mon, May 03, 2021 at 03:11:40PM +0200, Frieder Schrempf wrote:
> > > > > Hi,
> > > > > 
> > > > > with kernel 5.10.x and 5.12.x I'm getting a null pointer dereference
> > > > > exception from the mcp251x driver when I resume from sleep (see trace
> > > > > below).
> > > > > 
> > > > > As far as I can tell this was working fine with 5.4. As I currently don't
> > > > > have the time to do further debugging/bisecting, for now I want to at least
> > > > > report this here.
> > > > > 
> > > > > Maybe there is someone around who could already give a wild guess for what
> > > > > might cause this just by looking at the trace/code!?
> > > > 
> > > > Does reverting c7299fea6769 ("spi: Fix spi device unregister flow") help?
> > > 
> > > Other than that, bisecting will take no more than 3-4 iterations:
> > > % git log --oneline v5.4..v5.10.34 -- drivers/net/can/spi/mcp251x.c
> > > 3292c4fc9ce2 can: mcp251x: fix support for half duplex SPI host controllers
> > > e0e25001d088 can: mcp251x: add support for half duplex controllers
> > > 74fa565b63dc can: mcp251x: Use readx_poll_timeout() helper
> > > 2d52dabbef60 can: mcp251x: add GPIO support
> > > cfc24a0aa7a1 can: mcp251x: sort include files alphabetically
> > > df561f6688fe treewide: Use fallthrough pseudo-keyword
> > 
> > > 8ce8c0abcba3 can: mcp251x: only reset hardware as required
> > 
> > And the only smoking gun from analyzing the code is the above. So, as a first
> > step I would simply check before that commit and immediately after (15-30
> > minutes of work). (I would do it myself if I had the hardware at hand...)
> 
> Thanks for pointing that out. Indeed when I revert this commit it works fine
> again.
> 
> When I look at the change I see that queue_work(priv->wq,
> &priv->restart_work) is called in two cases, when the interface is brought
> up after resume and now also when the device is only powered up after resume
> but the interface stays down.
> 
> The latter is a problem if the device was never brought up before, as the
> workqueue is only allocated and initialized in mcp251x_open().
> 
> To me it looks like a proper fix would be to just move the workqueue init to
> the probe function to make sure it is available when resuming even if the
> interface was never up before.
> 
> I will try this and send a patch if it looks good.

Sounds like a plan!
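
For illustration only, here is a minimal user-space sketch of the ordering
problem described above. It is not the mcp251x code: every type and function
name in it (fake_wq, fake_priv, fake_queue_work() and so on) is a hypothetical
stand-in, and the NULL check exists only so the sketch can report the failure
where the kernel would dereference the pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue; not the kernel's workqueue_struct. */
struct fake_wq {
	int queued;		/* number of work items queued so far */
};

/* Hypothetical stand-in for the driver's private data. */
struct fake_priv {
	struct fake_wq *wq;	/* stays NULL until something allocates it */
};

/* Reports failure where the real queue_work() would dereference NULL. */
bool fake_queue_work(struct fake_wq *wq)
{
	if (wq == NULL)
		return false;
	wq->queued++;
	return true;
}

/* Problematic ordering: probe leaves wq unset, only open allocates it. */
void probe_without_wq(struct fake_priv *priv)
{
	priv->wq = NULL;
}

int open_allocates_wq(struct fake_priv *priv)
{
	priv->wq = calloc(1, sizeof(*priv->wq));
	return priv->wq ? 0 : -1;
}

/* Proposed ordering: allocate in probe so wq exists before any resume. */
int probe_with_wq(struct fake_priv *priv)
{
	priv->wq = calloc(1, sizeof(*priv->wq));
	return priv->wq ? 0 : -1;
}

/* Resume queues restart work even if the interface was never opened. */
bool fake_resume(struct fake_priv *priv)
{
	return fake_queue_work(priv->wq);
}

/* Release the queue again, mirroring what a remove path would do. */
void fake_remove(struct fake_priv *priv)
{
	free(priv->wq);
	priv->wq = NULL;
}
```

With probe_without_wq() followed directly by fake_resume(), the queue pointer
is still NULL, which corresponds to the reported crash. With probe_with_wq()
the same resume call succeeds regardless of whether the interface was opened.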

-- 
With Best Regards,
Andy Shevchenko

