Message-ID: <1287851745.2658.364.camel@edumazet-laptop>
Date: Sat, 23 Oct 2010 18:35:45 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Joe Buehler <aspam@....net>
Cc: netdev@...r.kernel.org
Subject: Re: kernel panic in fib_rules_lookup [2.6.27.7 vendor-patched]
On Saturday, 23 October 2010 at 11:33 -0400, Joe Buehler wrote:
> It is always possible that there is some issue with the Octeon memory
> barrier stuff, but I would think that the system would be much more
> unstable than it is -- we're really beating on a dual CPU LINUX instance
> that has Java and C++ apps running and also doing some network I/O.
>
> My strategy at this point is logging events to memory and dumping the
> log to the console at the time of the panic. I might be able to figure
> out the sequence of events causing the crash.
>
> The load test that causes the panic is using several dozen TAP
> interfaces, ifconfig'd up/down every 10 seconds or so, with
> source-routes, DNAT and SNAT being set up and taken down also.
With a normal workload on a dual-CPU machine, a missing memory barrier
can go unnoticed for quite a long time. The race window is so small
that the probability of hitting the bug might be 0.0000001 % or something
like that :(
You could try running a dual-threaded test program to reproduce the
problem in userland, which should be faster...
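Something along these lines might serve as a starting point. It is only a
rough sketch, not the kernel code path from the panic: it assumes a C11
compiler with <stdatomic.h> and POSIX threads, and every name in it
(mp_test, mp_writer, mp_reader, mp_run) is made up for illustration. One
thread writes a payload and then publishes a sequence number; the other
reads the sequence number and then the payload, and counts every case
where the payload is older than the sequence number it just saw.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test harness; none of these names come from the kernel. */
struct mp_test {
    atomic_ulong data;        /* payload, written first */
    atomic_ulong seq;         /* publication counter, written second */
    unsigned long iterations; /* how many values the writer publishes */
    bool use_barriers;        /* true: release/acquire, false: relaxed */
    long violations;          /* reader saw seq ahead of data */
};

static void *mp_writer(void *arg)
{
    struct mp_test *t = arg;

    for (unsigned long i = 1; i <= t->iterations; i++) {
        atomic_store_explicit(&t->data, i, memory_order_relaxed);
        if (t->use_barriers)  /* orders the data store before the seq store */
            atomic_store_explicit(&t->seq, i, memory_order_release);
        else                  /* the "missing barrier" case */
            atomic_store_explicit(&t->seq, i, memory_order_relaxed);
    }
    return NULL;
}

static void *mp_reader(void *arg)
{
    struct mp_test *t = arg;
    unsigned long seen, d;

    do {
        if (t->use_barriers)  /* pairs with the writer's release store */
            seen = atomic_load_explicit(&t->seq, memory_order_acquire);
        else
            seen = atomic_load_explicit(&t->seq, memory_order_relaxed);
        d = atomic_load_explicit(&t->data, memory_order_relaxed);
        if (d < seen)         /* payload older than the published counter */
            t->violations++;
    } while (seen < t->iterations);
    return NULL;
}

/* Returns the number of ordering violations seen, or -1 on thread failure. */
static long mp_run(unsigned long iterations, bool use_barriers)
{
    struct mp_test t;
    pthread_t w, r;

    atomic_init(&t.data, 0);
    atomic_init(&t.seq, 0);
    t.iterations = iterations;
    t.use_barriers = use_barriers;
    t.violations = 0;

    if (pthread_create(&r, NULL, mp_reader, &t) != 0)
        return -1;
    if (pthread_create(&w, NULL, mp_writer, &t) != 0) {
        /* Let the reader terminate before reporting the failure. */
        atomic_store(&t.seq, iterations);
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return t.violations;
}
```

Call mp_run(N, false) and mp_run(N, true) from a small main() and build
with something like "cc -O2 -pthread". With use_barriers set, the count
should always be 0. Without them, a weakly ordered CPU might report a
nonzero count after enough iterations, while a strongly ordered one such
as x86 will probably report 0 however long it runs, so the test only
tells you something when run on the target hardware.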