Date:	Sun, 23 Aug 2009 19:43:52 +0200
From:	Michal Soltys <soltys@....info>
To:	Jarek Poplawski <jarkao2@...il.com>
Cc:	David Dillow <dave@...dillows.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Michael Riepe <michael.riepe@...glemail.com>,
	Michael Buesch <mb@...sch.de>,
	Francois Romieu <romieu@...zoreil.com>,
	Rui Santos <rsantos@...popie.com>,
	Michael Büker <m.bueker@...lin.de>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

Jarek Poplawski wrote:
> David Dillow wrote, On 08/22/2009 10:43 PM:
> 
>> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>>> ebiederm@...ssion.com (Eric W. Biederman) writes:
>>>
>>>> David Dillow <dave@...dillows.org> writes:
>>>>
>>>>> Re-looking at the code, I'd guess that some IRQ status line is getting
>>>>> stuck high, but I don't see why -- we should acknowledge all outstanding
>>>>> interrupts each time through the loop, whether we care about them or
>>>>> not.
>>>>>
>>>>> Could reproduce a problem with the following patch applied, and send the
>>>>> full dmesg, please?
>>>> Here is what I get.
>>>>
>>>> r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>>> And now that the machine has come out of it, that was followed by:
>>> Looks like the soft lockup did not manage to trigger in this case.
>> 
>> I need some more context, please. What is the network load through this
>> NIC when you have the issues? Light, heavy? Can you give me more details
>> about the machine? A full dmesg from boot until this happens would help
>> quite a bit. At a minimum it would help answer which version of the chip
>> we're dealing with and what the machine it is in looks like.
>> 
>> Can you reproduce this with pci=nomsi? I'm assuming it the chip running
>> in MSI mode.
>> 
>> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
>> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
>> isn't jumping out at me.
>> 
>> Also, I'm having connectivity troubles this weekend, so my response may
>> be spotty. :(
>> 
> 
> 
> BTW, FYI, it seems Michal stopped tracking this problem, but he
> found this commit problematic as well.
> 
> From: Michal Soltys <soltys@....info>
> Subject: Re: r8169 (+others ?) and note_interrupt performance hit on 2.6.30.x
> Date: Wed, 05 Aug 2009 20:54:47 +0200
> http://marc.info/?l=linux-netdev&m=124949848110710&w=2
> 

Well, I haven't really stopped, but I'm not sure what to look at before that
particular commit: CPU load in my tests had already increased rather
significantly before it (and after 2.6.29), though that increase doesn't
seem to be related to the driver. And I was away for over a week...

As for the changes that commit introduced, here is a link to the mail
with the oprofile I did back then:

http://www.spinics.net/lists/netdev/msg102709.html

I'm happy to assist any way I can.
