netdev - Re: bug: mutex_lock() in interrupt context via phy_stop() in gianfar

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20080722075426.GA4302@pengutronix.de>
Date:	Tue, 22 Jul 2008 09:54:26 +0200
From:	Wolfram Sang <w.sang@...gutronix.de>
To:	Sebastian Siewior <netdev@...breakpoint.cc>
Cc:	Andy Fleming <afleming@...escale.com>,
	Nate Case <ncase@...-inc.com>, netdev@...r.kernel.org,
	linuxppc-dev@...abs.org, Vitaly Bordug <vbordug@...mvista.com>,
	Li Yang <leoli@...escale.com>
Subject: Re: bug: mutex_lock() in interrupt context via phy_stop() in
	gianfar

Hi,

On Fri, Jul 18, 2008 at 02:10:08PM +0200, Sebastian Siewior wrote:
> Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
> changed phydev->lock from a spinlock to a mutex. Now, the following
> code path got triggered while NFS was unavailable:
[...]
> I found out that the same code path may be triggered in
> - drivers/net/ucc_geth.c
> - drivers/net/fec_mpc52xx.c

Recently I described what I think is a similar problem:
(http://ozlabs.org/pipermail/linuxppc-dev/2008-July/059686.html)

===

Hello,

Today I was debugging a kernel crash on a board with an MPC5200B running
2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
{
[...]
	/* on fifo error, soft-reset fec */
	if (ievent & (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {

		if (net_ratelimit() && (ievent & FEC_IEVENT_RFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_RFIFO_ERROR\n");
		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");

		mpc52xx_fec_reset(dev);

		netif_wake_queue(dev);
		return IRQ_HANDLED;
	}
[...]
}

Calling mpc52xx_fec_reset() from interrupt context is wrong for at least
two reasons:

a) it calls phy_write(), which contains BUG_ON(in_interrupt());
b) it calls mpc52xx_fec_hw_init(), which busy-waits in a delay loop
   (1 to 50 us) to check whether the reset succeeded.

I assume the proper fix is to set a flag in the ISR and perform the soft
reset later, outside interrupt context. As I have not worked with the
network core or its drivers before, I am not sure where the soft reset
should be done. Rather than risk making things worse, I hope someone with
more insight into the networking code can propose a suitable solution.

All the best,

   Wolfram

===

-- 
  Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry
