[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m1r5v0ogvw.fsf@fess.ebiederm.org>
Date: Mon, 24 Aug 2009 17:51:15 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: David Dillow <dave@...dillows.org>
Cc: Michael Riepe <michael.riepe@...glemail.com>,
Michael Buesch <mb@...sch.de>,
Francois Romieu <romieu@...zoreil.com>,
Rui Santos <rsantos@...popie.com>,
Michael Büker <m.bueker@...lin.de>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
David Dillow <dave@...dillows.org> writes:
> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>> ebiederm@...ssion.com (Eric W. Biederman) writes:
>>
>> > David Dillow <dave@...dillows.org> writes:
>> >
>> >>
>> >> Re-looking at the code, I'd guess that some IRQ status line is getting
>> >> stuck high, but I don't see why -- we should acknowledge all outstanding
>> >> interrupts each time through the loop, whether we care about them or
>> >> not.
>> >>
>> >> Could reproduce a problem with the following patch applied, and send the
>> >> full dmesg, please?
>> >
>> > Here is what I get.
>> >
>> > r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>>
>> And now that the machine has come out of it, that was followed by:
>> Looks like the soft lockup did not manage to trigger in this case.
>
> I need some more context, please. What is the network load through this
> NIC when you have the issues? Light, heavy? Can you give me more details
> about the machine? A full dmesg from boot until this happens would help
> quite a bit. At a minimum it would help answer which version of the chip
> we're dealing with and what the machine it is in looks like.
>
> Can you reproduce this with pci=nomsi? I'm assuming it the chip running
> in MSI mode.
>
> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
> isn't jumping out at me.
>
> Also, I'm having connectivity troubles this weekend, so my response may
> be spotty. :(
When I decode the bits in status they are TxOK, RxOK and TxDescUnavail so it looks
there is some bidirectional communication going on.
Do we really want to loop when those bits are set?
Perhaps we want to remove them from rtl_cfg_infos for the part?
Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists