netdev - Re: NMI lockup, 2.6.26 release

Date:	Wed, 13 Aug 2008 08:49:31 +0000
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Denys Fedoryshchenko <denys@...p.net.lb>
Cc:	netdev@...r.kernel.org
Subject: Re: NMI lockup, 2.6.26 release

On Wed, Aug 13, 2008 at 11:02:34AM +0300, Denys Fedoryshchenko wrote:
> As long as the kernel reboots itself, it won't hurt me much.
> With the NMI watchdog I noticed that a panic was missing, so nmi_watchdog was
> showing the message but not rebooting. It is fixed in the next kernel and I have
> patched my kernel, so I guess it will not crash and freeze anymore and I will
> not need to run to the power switch at night.
> 
> It may be related to another problem (some corruption) which is not fixed yet,
> so it would be preferable to show the timer guys the exact location of the problem.
> 
> Maybe you can make some patch like:
> 
> +	if (q->next_watchdog < q->now || next_event <=
> +	     q->next_watchdog - PSCHED_TICKS_PER_SEC / (10 * HZ)) {
> +		qdisc_watchdog_schedule(&q->watchdog, next_event);
> +		q->next_watchdog = next_event;
> +	} else {
> something like BUG()
>          }
> ?

I don't think it's right: there could probably be some small time
differences between CPUs on SMP, or even some inaccuracy related to
hardware, but I don't think this is the right place or method to verify
that. And, e.g., re-scheduling with the same time shouldn't be wrong either.
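
To illustrate the point about re-scheduling with the same time, here is a
minimal user-space sketch of the condition from the proposed patch. It is
not kernel code: the function name, the typedef and the two constants are
hypothetical stand-ins, and the values of SKETCH_TICKS_PER_SEC and SKETCH_HZ
are chosen only for illustration (the real ones depend on the kernel version
and .config).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only; not taken from any real .config. */
#define SKETCH_TICKS_PER_SEC 1000000LL
#define SKETCH_HZ            1000LL

typedef int64_t sketch_time_t;

/*
 * Mirrors the "if" condition of the proposed patch.
 * Returns true when the watchdog would be (re)armed, and false when
 * control would reach the "else" branch, i.e. the suggested BUG().
 */
static bool proposed_check_rearms(sketch_time_t now,
                                  sketch_time_t next_watchdog,
                                  sketch_time_t next_event)
{
	/* Slack of one tenth of a jiffy, as in the patch. */
	const sketch_time_t slack = SKETCH_TICKS_PER_SEC / (10 * SKETCH_HZ);

	return next_watchdog < now || next_event <= next_watchdog - slack;
}
```

With these assumed values the slack is 100 ticks, so a call such as
proposed_check_rearms(1000, 2000, 2000), where the next event equals the
already scheduled watchdog time, returns false and would end up in the
BUG() branch, although nothing is actually wrong in that case.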

Anyway, narrowing down the problem with such tests should give us a better
understanding of what the real problem could be here. BTW, could you
"remind" us of the .config on this box (especially the various *HZ*, *TIME*
and *TIMERS* settings)?

> I will probably also try to migrate to "rc" versions of the kernel to see if
> the problem still exists there; a lot of changes were made there... Has the HTB
> corruption problem finally been tracked down completely? I have seen some
> discussions about it recently...

I doubt the current rc versions are stable enough for any production use. HTB
is waiting for one fix, but it's nothing critical if it hasn't bothered you
until now. There could still be some problems around schedulers
generally, after the last big changes.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html