lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Date:	Wed, 31 Jan 2007 01:39:26 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	<luigi.genoni@...elli.com>
Cc:	<akpm@...l.org>, <linux-kernel@...r.kernel.org>
Subject: Re: System crash after "No irq handler for vector" linux 2.6.19

<luigi.genoni@...elli.com> writes:

> I have in interesting update, at less I suppose I have.

It was, at least as another data point.

> I do not know very well what happens with irq stuff migrating shared irq, but I
> suppose this has something to do with this crash.

The fact the irq was shared should have no bearing on this crash scenario.
A shared irq is not at all helpful in a performance sense, but this problem
is low enough that a shared irq should have made not difference at all, except
for the frequency the interrupt fired, and was migrated.  And a high interrupt
frequency, and a high migration rate tend to cause this problem.

> Right now I stopped irqbalance and puff! load average is back to normal, and
> under the same workload notthing similar is happening for the moment.

Yes.  That sounds like a good work around until this problem is sorted out.

> Lesson number one I learnt: avoid shared IRQ on this systems (but to reconfigure
> HW cabling right now is not so easy).

Right.   Because the only sharing should be because the traces on the
motherboard are shared.

> I hope this helps

It has all helped.

I have been tracking some easier problems, keeping this one on my back burner.
The good/bad news is that by restricting my set of vectors I can choose from
in the kernel, and running ping -f to another machine.  And migrating the
single irq for my NIC.  I have been able to reproduce this in about 5 minutes.

I haven't root caused it yet, but the fact I can reproduce this on dual socket
motherboard suggest the reason it took you an hour to reproduce the problem is
simply because you had so few irqs on your system, and that the extreme
latency of 8-socket Opterons is not to blame.

Hopefully now that I can reproduce this I will be able to root cause
this and then fix this bug.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ