netdev - Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

Message-ID: <m1iqggr8yt.fsf@fess.ebiederm.org>
Date:	Fri, 21 Aug 2009 17:24:58 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	David Dillow <dave@...dillows.org>
Cc:	Michael Riepe <michael.riepe@...glemail.com>,
	Michael Buesch <mb@...sch.de>,
	Francois Romieu <romieu@...zoreil.com>,
	Rui Santos <rsantos@...popie.com>,
	Michael Büker <m.bueker@...lin.de>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

David Dillow <dave@...dillows.org> writes:

> On Fri, 2009-08-21 at 18:59 -0400, David Dillow wrote:
>> On Fri, 2009-08-21 at 13:57 -0700, Eric W. Biederman wrote:
>> > David Dillow <dave@...dillows.org> writes:
>> > I have what at first glance looks like a problem caused by this
>> > patch.  For the last month since upgrading one of my machines from
>> > 2.6.28 to 2.6.30 it has been becoming inaccessible from the
>> > network and I have a few:
>> > 
>> > NETDEV WATCHDOG: eth0 (r8169): transmit timed out
>> > 
>> > in my logs and a lot of soft lockups that always have rtl8169_interrupt
>> > as the thing that is running.   I suspect your patch has introduced
>> > a near infinite loop in the interrupt handler and is causing these
>> > soft lockups.
>> > 
>> > Any ideas?
>> 
>> I would be surprised, but I suppose it is not out of the realm of
>> possibility. Can you send me a full dmesg, please?
>
> Re-looking at the code, I'd guess that some IRQ status line is getting
> stuck high, but I don't see why -- we should acknowledge all outstanding
> interrupts each time through the loop, whether we care about them or
> not.
>
> Could you reproduce the problem with the following patch applied, and
> send the full dmesg, please?

Will do.  This looks like a good way to test my hypothesis, thanks.
I can't quite reproduce this problem so it may be a few days before
I know.

Eric


> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index b82780d..46cb05a 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	void __iomem *ioaddr = tp->mmio_addr;
>  	int handled = 0;
>  	int status;
> +	int count = 0;
>  
>  	/* loop handling interrupts until we have no new ones or
>  	 * we hit a invalid/hotplug case.
> @@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	while (status && status != 0xffff) {
>  		handled = 1;
>  
> +		if (count++ > 100) {
> +			printk_once("r8169 screaming irq status %08x "
> +				"mask %08x event %08x napi %08x\n",
> +				status, tp->intr_mask, tp->intr_event,
> +				tp->napi_event);
> +			break;
> +		}
> +
> +
>  		/* Handle all of the error cases first. These will reset
>  		 * the chip, so just exit the loop.
>  		 */
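
The loop guard in the patch above can be tried outside the kernel. The
following is a minimal user-space sketch, not the driver code: struct
fake_nic, stuck_bits, read_status(), ack_status(), service_irq() and
MAX_IRQ_LOOPS are all hypothetical names, and the "stuck" status bit is
simulated purely to stand in for the condition guessed at in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the chip's interrupt status register. */
struct fake_nic {
	uint16_t status;     /* pending event bits */
	uint16_t stuck_bits; /* bits an acknowledge fails to clear (simulated fault) */
};

#define MAX_IRQ_LOOPS 100 /* same threshold the debug patch uses */

static uint16_t read_status(const struct fake_nic *nic)
{
	return nic->status;
}

/* Acknowledge events: clears every requested bit except the stuck ones. */
static void ack_status(struct fake_nic *nic, uint16_t bits)
{
	uint16_t clearable = bits & (uint16_t)~nic->stuck_bits;

	nic->status &= (uint16_t)~clearable;
}

/*
 * Loop until no events are pending, the device reads back 0xffff
 * (treated as invalid/hotplug, as in the driver), or the guard trips.
 * Returns the value of the loop counter; *gave_up is set to 1 when the
 * guard broke out of the loop.
 */
static int service_irq(struct fake_nic *nic, int *gave_up)
{
	int count = 0;
	uint16_t status;

	assert(nic != NULL && gave_up != NULL);
	*gave_up = 0;

	status = read_status(nic);
	while (status && status != 0xffff) {
		if (count++ > MAX_IRQ_LOOPS) {
			fprintf(stderr, "screaming irq status %04x\n",
				(unsigned int)status);
			*gave_up = 1;
			break;
		}

		/* Acknowledge everything we saw, then look again. */
		ack_status(nic, status);
		status = read_status(nic);
	}

	return count;
}
```

With stuck_bits set to 0 the loop exits after a single pass. With a bit
that never clears, the guard ends the loop instead of letting it spin,
which is the behaviour the printk_once() in the patch is meant to expose.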