netdev - Re: kernel panic in fib_rules_lookup [2.6.27.7 vendor-patched]


Message-ID: <1287863065.2658.533.camel@edumazet-laptop>
Date:	Sat, 23 Oct 2010 21:44:25 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	"\"Oleg A. Arkhangelsky\"" <sysoleg@...dex.ru>,
	David Miller <davem@...emloft.net>
Cc:	netdev@...r.kernel.org, Joe Buehler <aspam@....net>
Subject: Re: kernel panic in fib_rules_lookup [2.6.27.7 vendor-patched]

On Saturday, 23 October 2010 at 21:37 +0400, "Oleg A. Arkhangelsky"
wrote:
> 23.10.2010, 20:36, "Eric Dumazet" <eric.dumazet@...il.com>:
> 
> > With a normal workload, on a dual cpu machine, a missing memory barrier
> > can stay un-noticed for quite a long time. The race window is so small
> > that probability for the bug might be 0.0000001 % or something like
> > that :(
> 
> Eric, I'd like to remind you that I've faced a similar problem on simple x86.
> 
> See http://kerneltrap.org/mailarchive/linux-netdev/2010/3/9/6271568
> 
> Two main differences for our case:
> 
> 1) There is no userspace workload (except for bgpd), no changes in interfaces
> 2) We are not using multiple routing tables
> 
> This panic was pretty rare in our case (not more than 2 times per month).
> 
> Currently we're running fine with disabled CONFIG_IP_MULTIPLE_TABLES.
> 

Okay ;)

I believe I found a bug, but I really can't understand how it can trigger
on your workload (and Joe's, of course).

Here is a patch against net-next-2.6 for testing; it can probably be
backported to older kernels.

Thanks

[PATCH] fib: fix fib_nl_newrule()

Some panic reports in fib_rules_lookup() show a rule could have a NULL
pointer as a next pointer in the rules_list.

This can actually happen because of a bug in fib_nl_newrule(): it
checks whether the current rule is the destination of unresolved gotos
(other rules have gotos to this about-to-be-inserted rule).

The problem is that it resolves the gotos before the rule is
inserted in the rules_list (and has a valid next pointer).

Fix this by moving the rules_list insertion before the changes on gotos.

A lockless reader can no longer follow a ctarget pointer unless the
destination is ready (has a valid next pointer).

Reported-by: Oleg A. Arkhangelsky <sysoleg@...dex.ru>
Reported-by: Joe Buehler <aspam@....net>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
---
 net/core/fib_rules.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 1bc3f25..12b43cc 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -373,6 +373,11 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 
 	fib_rule_get(rule);
 
+	if (last)
+		list_add_rcu(&rule->list, &last->list);
+	else
+		list_add_rcu(&rule->list, &ops->rules_list);
+
 	if (ops->unresolved_rules) {
 		/*
 		 * There are unresolved goto rules in the list, check if
@@ -395,11 +400,6 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	if (unresolved)
 		ops->unresolved_rules++;
 
-	if (last)
-		list_add_rcu(&rule->list, &last->list);
-	else
-		list_add_rcu(&rule->list, &ops->rules_list);
-
 	notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).pid);
 	flush_route_cache(ops);
 	rules_ops_put(ops);
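
The ordering issue described in the changelog can be illustrated with a
small userspace sketch. This is not kernel code: struct rule,
link_after(), resolve_gotos() and demo() are hypothetical names made up
for illustration, and the plain stores only stand in for list_add_rcu()
and the ctarget assignment, without any of their memory barriers. The
sketch is single-threaded and only records whether the goto target
already had a valid next pointer at the moment it became reachable
through ctarget.

```c
#include <stddef.h>

/* Simplified stand-in for struct fib_rule (assumption, not the kernel layout). */
struct rule {
	struct rule *next;	/* circular list link, NULL until linked */
	struct rule *ctarget;	/* resolved goto target, NULL if unresolved */
	int pref;		/* rule priority */
	int target;		/* pref of the goto target, 0 if not a goto rule */
};

/* Link 'r' right after 'prev': fill in r->next first, then publish r. */
static void link_after(struct rule *prev, struct rule *r)
{
	r->next = prev->next;
	prev->next = r;
}

/*
 * Point every pending goto for r->pref at 'r'.
 * Returns 1 if 'r' already had a valid next pointer at that moment,
 * i.e. a reader following ctarget could safely continue the walk.
 */
static int resolve_gotos(struct rule *head, struct rule *r)
{
	int ready = (r->next != NULL);
	struct rule *it;

	for (it = head->next; it != head; it = it->next)
		if (it->target == r->pref && it->ctarget == NULL)
			it->ctarget = r;
	return ready;
}

/* insert_first != 0 models the patched order, 0 models the old order. */
int demo(int insert_first)
{
	struct rule head = { &head, NULL, 0, 0 };	/* list sentinel */
	struct rule go = { NULL, NULL, 100, 200 };	/* "goto 200", unresolved */
	struct rule r = { NULL, NULL, 200, 0 };		/* new rule, pref 200 */
	int ready;

	link_after(&head, &go);
	if (insert_first) {
		link_after(&go, &r);
		ready = resolve_gotos(&head, &r);
	} else {
		ready = resolve_gotos(&head, &r);
		link_after(&go, &r);
	}
	return ready;
}
```

With these definitions, demo(0) (gotos resolved before insertion)
returns 0 because the target's next pointer is still NULL when ctarget
starts pointing at it, while demo(1) (insertion first, as in the patch)
returns 1.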

