Message-ID: <201201181211.45011.tim.sander@hbm.com>
Date:	Wed, 18 Jan 2012 12:11:44 +0100
From:	"Tim Sander" <tim.sander@....com>
To:	"Mike Galbraith" <efault@....de>
Cc:	"Tim Sander" <tstone@....tu-darmstadt.de>,
	"Steven Rostedt" <rostedt@...dmis.org>,
	"LKML" <linux-kernel@...r.kernel.org>,
	"RT" <linux-rt-users@...r.kernel.org>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"Clark Williams" <williams@...hat.com>,
	"John Kacur" <jkacur@...hat.com>
Subject: Re: [ANNOUNCE] 3.0.14-rt31 - ksoftirq running wild - FEC ethernet driver to blame? Yep

Hi Mike and others

Thanks for your reply Mike.

Am Dienstag, 17. Januar 2012, 18:40:11 schrieb Mike Galbraith:
> I have a patchlet lying about that will show the likely culprit, but if
> ksoftirqd is eating CPU, someone has to raising softirqs at a frightful
> rate, and the culprit it shows would almost certainly be ksoftirqd.  I
> mean, what else is running during boot that is RT other than kernel
> threads.  Nada.
Thanks for your patch. It didn't apply cleanly because some lines had moved,
but nothing too serious. I now have a machine where top shows me the culprit
directly:
sirq-net-tx/0 

It seems to be triggered less often than with the mainline rt kernel, though.
But after some starts and stops of "connmand" and "ifconfig eth0 down" I got
this erroneous behaviour back. The only question is what to do next. I have
some more observations which might help to nail down this bug:
* ifconfig does not return when sirq-net-tx/0 eats all cpu
* sometimes sirq-net-tx/0 sits on the cpu for a couple of seconds and goes
away, sometimes it just stays there when "ifconfig eth0 up" is issued.
* There are suspicious "FEC: MDIO read timeout" kernel log messages from the 
ethernet driver.
* The ethernet phy uses polling, since I do not know how to set the phy irq in
the board definition. I tried using "phy_register_fixup_for_uid" and then
setting phy_dev->irq in the fixup routine, but that seems to be too late: when
the network device is shut down, the interrupt is deregistered although it was
never registered.
I also didn't find an example in the source, and the phy.txt documentation
says nothing about it. So input on how to set the phy irq in the board config
of the pcm043 would be really nice.
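
To put a number on how hard sirq-net-tx/0 is being driven, one option is to sample the NET_TX row of /proc/softirqs twice and compare the totals. The following is only a minimal userspace sketch under that assumption; the helper names softirq_row_total() and softirq_total() are made up for illustration and are not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/softirqs row, for example
 * "      NET_TX:       1234         56".
 * Returns 0 and stores the sum in *total if the row label equals `name`,
 * otherwise returns -1 and leaves *total untouched.
 * (Hypothetical helper, not a kernel or libc function.)
 */
int softirq_row_total(const char *row, const char *name,
                      unsigned long long *total)
{
    assert(row && name && total);

    const char *p = row;
    size_t len = strlen(name);

    /* Skip the leading padding in front of the label. */
    while (*p == ' ' || *p == '\t')
        p++;
    if (strncmp(p, name, len) != 0 || p[len] != ':')
        return -1;
    p += len + 1;

    /* Add up every counter column until no more numbers are found. */
    unsigned long long sum = 0;
    for (;;) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p)
            break;
        sum += v;
        p = end;
    }
    *total = sum;
    return 0;
}

/* Read /proc/softirqs and fetch the total for `name`; -1 on failure. */
int softirq_total(const char *name, unsigned long long *total)
{
    char row[4096];
    FILE *f = fopen("/proc/softirqs", "r");
    int ret = -1;

    if (!f)
        return -1;
    while (fgets(row, sizeof(row), f)) {
        if (softirq_row_total(row, name, total) == 0) {
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

Calling softirq_total("NET_TX", &a), sleeping for a second, calling it again into b and printing b - a gives a rough raise rate per second, which could be compared between the "good" and the "stuck" state.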

> You can find out easy easy enough, just edit kernel/softirq.c, comment
> out ksoftirqd_set_sched_params() in run_ksoftirqd().  If the throttle
> doesn't kick in (because ksoftirqd is now not RT), box boots but
> ksoftirqd still chewing up a CPU, you have the same info the throttle
> hacklet would show.
> 
> If that's it, you can apply the below, do the same edit, and see which
> thread is grinding away.  From there, I'd set a trap.  Let sirq threads
> detect that they are being awakened too fast (hey, I can't go to sleep,
> the sirq I just processed is busy again, N times in a row) and leave a
> note for wakeup_softirqd().  There, WARN_ON(ksoftirqd[i].help_me) or
> such, to see who is flogging which softirq mercilessly.
I didn't use these tricks, since top was already doing its job well enough :-).
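
For reference, the trap Mike describes (a sirq thread noticing that the softirq it just processed is pending again N times in a row, and leaving a note) could be modelled roughly like this. This is a plain userspace model of the counting logic only, not kernel code and not Mike's patch; struct sirq_trap and both functions are hypothetical names, and only the help_me flag is taken from the quoted mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-softirq bookkeeping for the trap (hypothetical structure). */
struct sirq_trap {
    unsigned int threshold; /* N: busy-again passes in a row before flagging */
    unsigned int streak;    /* current run of busy-again passes */
    bool help_me;           /* the "note" a wakeup path could WARN_ON() */
};

void sirq_trap_init(struct sirq_trap *t, unsigned int threshold)
{
    assert(t && threshold > 0);
    t->threshold = threshold;
    t->streak = 0;
    t->help_me = false;
}

/*
 * Call once after each processing pass. `pending_again` says whether the
 * softirq was already raised again before the thread could go to sleep.
 * Returns true once that has happened `threshold` times in a row.
 */
bool sirq_trap_note(struct sirq_trap *t, bool pending_again)
{
    assert(t);

    if (!pending_again) {
        /* The thread got to sleep: the streak is broken, clear the note. */
        t->streak = 0;
        t->help_me = false;
        return false;
    }
    if (t->streak < t->threshold)
        t->streak++;
    if (t->streak >= t->threshold)
        t->help_me = true;
    return t->help_me;
}
```

In the kernel the flag would be checked where the softirq thread is woken, so that the warning's backtrace points at whoever keeps raising the softirq; the model above only shows when the flag would be set and cleared.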

Best regards
Tim
