Message-ID: <4CC30055.5040509@cox.net>
Date: Sat, 23 Oct 2010 11:33:41 -0400
From: Joe Buehler <aspam@....net>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: netdev@...r.kernel.org
Subject: Re: kernel panic in fib_rules_lookup [2.6.27.7 vendor-patched]
Eric Dumazet wrote:
>
> Did that... Hmm...
>
> I am wondering if smp_rcu_assign_pointer() (or more precisely smp_wmb())
> is correctly implemented on octeon platform.
>
> Try to add in fib_nl_newrule() right after the kzalloc block:
>
>       rule = kzalloc(ops->rule_size, GFP_KERNEL);
>       if (rule == NULL) {
>               err = -ENOMEM;
>               goto errout;
>       }
> +     rule->list.next = LIST_POISON1;
> +     rule->list.prev = LIST_POISON2;
>
>
> So that we can actually see if the NULL dereference bug you hit becomes
> a "LIST_POISON1" dereference bug...
>
>
>
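The reasoning behind the suggested patch can be sketched in user space. kzalloc() returns zeroed memory, so a reader that reached the new rule before its list pointers were stored would see NULL in ->next; with the poison in place it would see a distinctive non-NULL value instead, which would point at a store-ordering problem. The sketch below is only an illustration under stated assumptions: every demo_* and DEMO_* name is hypothetical, calloc() stands in for kzalloc(), the poison constants are stand-ins for LIST_POISON1 and LIST_POISON2, and no real concurrency is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

/* Hypothetical stand-ins for LIST_POISON1/LIST_POISON2: distinctive
 * non-NULL values that are never the address of a real node. */
#define DEMO_POISON1 ((struct demo_list_head *)(uintptr_t)0x00100100)
#define DEMO_POISON2 ((struct demo_list_head *)(uintptr_t)0x00200200)

/* Hypothetical, cut-down rule object: just the list linkage and one field. */
struct demo_rule {
    struct demo_list_head list;
    int pref;
};

/* Mirrors the allocation path from the quoted patch: zeroed memory first
 * (calloc here, kzalloc in the kernel), then the optional poisoning. */
static struct demo_rule *demo_rule_alloc(int poison)
{
    struct demo_rule *rule = calloc(1, sizeof(*rule));

    if (rule == NULL)
        return NULL;
    if (poison) {
        rule->list.next = DEMO_POISON1;
        rule->list.prev = DEMO_POISON2;
    }
    return rule;
}

enum demo_next_state { DEMO_NEXT_NULL, DEMO_NEXT_POISON, DEMO_NEXT_LINKED };

/* Classifies what a reader would find in ->next if it reached the rule
 * before the rule had been linked into a list. */
static enum demo_next_state demo_classify_next(const struct demo_rule *rule)
{
    if (rule->list.next == NULL)
        return DEMO_NEXT_NULL;
    if (rule->list.next == DEMO_POISON1)
        return DEMO_NEXT_POISON;
    return DEMO_NEXT_LINKED;
}
```

With poisoning off, a not-yet-linked rule classifies as DEMO_NEXT_NULL; with it on, the same rule classifies as DEMO_NEXT_POISON. If the real panic kept showing a NULL dereference after the patch, the NULL would presumably be coming from somewhere other than a freshly allocated rule.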
Thanks -- I'll try it when I'm back in the office Tuesday.

It is always possible that there is some issue with the Octeon memory
barrier stuff, but if so I would expect the system to be much more
unstable than it is -- we're really beating on a dual-CPU Linux instance
that has Java and C++ apps running and is also doing some network I/O.

My strategy at this point is logging events to memory and dumping the
log to the console at the time of the panic. I might be able to figure
out the sequence of events causing the crash.
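The logging strategy described above can be sketched as a small ring buffer that keeps only the most recent events and is dumped oldest-first when the failure is detected. This is a user-space illustration, not the actual instrumentation: all evlog_* names and the buffer sizes are hypothetical, the code is single-threaded, and a kernel version would additionally have to cope with concurrent writers on both CPUs and print from the panic path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EVLOG_SLOTS   8    /* hypothetical capacity: only the newest 8 events survive */
#define EVLOG_MSG_LEN 48   /* hypothetical per-event message size */

struct evlog_entry {
    unsigned long seq;           /* global sequence number of the event */
    char msg[EVLOG_MSG_LEN];     /* short description, always NUL-terminated */
};

static struct evlog_entry evlog_buf[EVLOG_SLOTS];
static unsigned long evlog_next; /* total number of events recorded so far */

/* Record one event, overwriting the oldest slot once the buffer is full. */
static void evlog_record(const char *msg)
{
    struct evlog_entry *e = &evlog_buf[evlog_next % EVLOG_SLOTS];

    e->seq = evlog_next++;
    snprintf(e->msg, sizeof(e->msg), "%s", msg); /* truncates long messages */
}

/* Write the retained events to 'out', oldest first.
 * Returns the number of entries written. */
static unsigned long evlog_dump(FILE *out)
{
    unsigned long count = evlog_next < EVLOG_SLOTS ? evlog_next : EVLOG_SLOTS;
    unsigned long i;

    for (i = evlog_next - count; i < evlog_next; i++) {
        const struct evlog_entry *e = &evlog_buf[i % EVLOG_SLOTS];

        fprintf(out, "[%lu] %s\n", e->seq, e->msg);
    }
    return count;
}
```

Calling evlog_record() at each point of interest (rule added, rule deleted, interface brought up or down) and evlog_dump() from the failure path would give the tail of the event sequence leading up to the crash, at the cost of losing anything older than the buffer capacity.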
The load test that causes the panic uses several dozen TAP interfaces
that are brought up and down with ifconfig every 10 seconds or so, with
source routes, DNAT and SNAT being set up and taken down as well.

Joe Buehler