Message-ID: <m1r5v0ogvw.fsf@fess.ebiederm.org>
Date:	Mon, 24 Aug 2009 17:51:15 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	David Dillow <dave@...dillows.org>
Cc:	Michael Riepe <michael.riepe@...glemail.com>,
	Michael Buesch <mb@...sch.de>,
	Francois Romieu <romieu@...zoreil.com>,
	Rui Santos <rsantos@...popie.com>,
	Michael Büker <m.bueker@...lin.de>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

David Dillow <dave@...dillows.org> writes:

> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>> ebiederm@...ssion.com (Eric W. Biederman) writes:
>> 
>> > David Dillow <dave@...dillows.org> writes:
>> >
>> >>
>> >> Re-looking at the code, I'd guess that some IRQ status line is getting
>> >> stuck high, but I don't see why -- we should acknowledge all outstanding
>> >> interrupts each time through the loop, whether we care about them or
>> >> not.
>> >>
>> >> Could you reproduce the problem with the following patch applied, and send the
>> >> full dmesg, please?
>> >
>> > Here is what I get.
>> >
>> > r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>> 
>> And now that the machine has come out of it, that was followed by:
>> Looks like the soft lockup did not manage to trigger in this case.
>
> I need some more context, please. What is the network load through this
> NIC when you have the issues? Light, heavy? Can you give me more details
> about the machine? A full dmesg from boot until this happens would help
> quite a bit. At a minimum it would help answer which version of the chip
> we're dealing with and what the machine it is in looks like.
>
> Can you reproduce this with pci=nomsi? I'm assuming it's the chip running
> in MSI mode.
>
> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
> isn't jumping out at me.
>
> Also, I'm having connectivity troubles this weekend, so my response may
> be spotty. :(

When I decode the bits in status they are TxOK, RxOK and TxDescUnavail, so it
looks like there is some bidirectional communication going on.
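
For reference, that decoding can be sketched as a small standalone C helper.
The bit values are assumed to match the interrupt status enum in r8169.c and
should be checked against the driver source; decode_intr_status() and the
intr_bits table are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Interrupt status bits. ASSUMPTION: values mirror the enum in r8169.c;
 * verify against the driver source before relying on them. */
enum {
	RxOK          = 0x0001,
	RxErr         = 0x0002,
	TxOK          = 0x0004,
	TxErr         = 0x0008,
	RxOverflow    = 0x0010,
	LinkChg       = 0x0020,
	RxFIFOOver    = 0x0040,
	TxDescUnavail = 0x0080,
	SWInt         = 0x0100,
	PCSTimeout    = 0x4000,
	SYSErr        = 0x8000,
};

struct bit_name {
	unsigned int bit;
	const char *name;
};

static const struct bit_name intr_bits[] = {
	{ RxOK, "RxOK" },             { RxErr, "RxErr" },
	{ TxOK, "TxOK" },             { TxErr, "TxErr" },
	{ RxOverflow, "RxOverflow" }, { LinkChg, "LinkChg" },
	{ RxFIFOOver, "RxFIFOOver" }, { TxDescUnavail, "TxDescUnavail" },
	{ SWInt, "SWInt" },           { PCSTimeout, "PCSTimeout" },
	{ SYSErr, "SYSErr" },
};

/* Hypothetical helper (not part of the driver): write the names of all
 * bits set in 'status' into 'buf', separated by single spaces, lowest
 * bit first. Output is truncated if 'buf' is too small. */
static void decode_intr_status(unsigned int status, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(intr_bits) / sizeof(intr_bits[0]); i++) {
		int n;

		if (!(status & intr_bits[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", intr_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* error or truncated: stop here */
		used += (size_t)n;
	}
}
```

Called with the status value from the log line (0x00000085), the helper
reports the same three bits: RxOK, TxOK and TxDescUnavail.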

Do we really want to loop when those bits are set?

Perhaps we want to remove them from rtl_cfg_infos for the part?
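
To make the question concrete, here is a rough sketch of the two operations
involved: finding status bits that pass the mask but are not in the event set,
and clearing bits from a per-chip entry. struct cfg_sketch is a hypothetical,
simplified stand-in for an rtl_cfg_infos entry, not the driver's actual
structure, and how the interrupt loop really uses these values is not modeled
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL, simplified per-chip entry; the field names are chosen to
 * match the "event" and "napi" values printed in the debug line. */
struct cfg_sketch {
	uint16_t intr_event;
	uint16_t napi_event;
};

/* Bits that are set in 'status' and allowed by 'mask', but that are not
 * part of 'event'. */
static uint16_t bits_outside_event(uint16_t status, uint16_t mask,
				   uint16_t event)
{
	return (uint16_t)(status & mask & ~event);
}

/* Clear 'bits' from both event sets of a per-chip entry. */
static void drop_events(struct cfg_sketch *cfg, uint16_t bits)
{
	assert(cfg != NULL);
	cfg->intr_event = (uint16_t)(cfg->intr_event & ~bits);
	cfg->napi_event = (uint16_t)(cfg->napi_event & ~bits);
}
```

With the values from the log line (status 0x0085, mask 0xffff, event 0x803f),
bits_outside_event() returns 0x0080, so one of the three set bits is not in
the event set at all.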

Eric

