Message-ID: <CAA85sZtt+u3nA2N-3OwrOB-o-A7gPdpMKHG74UwZpou2zaE59g@mail.gmail.com>
Date: Wed, 24 Feb 2016 23:53:47 +0100
From: Ian Kumlien <ian.kumlien@...il.com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Cc: sasha.levin@...cle.com
Subject: Re: [BUG][4.5-rc5] rcu_sched self-detected stall on CPU - directly
after network goes online.
On 22 February 2016 at 01:38, Ian Kumlien <ian.kumlien@...il.com> wrote:
> Hi,
>
> When I tried to upgrade my soon-to-be firewall to 4.5-rc5 to do some
> testing, it deadlocked almost instantly.
After bisecting, the offending commit seems to be:
b16c29191dc89bd877af99a7b04ce4866728a3e0
It looks like some basic sanity checking went missing...
The original patch does:
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 5d010f2..94837d2 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -307,12 +307,12 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 	local_bh_disable();
 	for (i = 0; i < net->ct.htable_size; i++) {
-		spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 		if (i < net->ct.htable_size) {
 			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
 				untimeout(h, timeout);
 		}
-		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 	}
 	local_bh_enable();
 }
---
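Which looks like a mistake: the spin_unlock() was replaced with a second
nf_conntrack_lock(), so the lock is taken again instead of being
released. The fix should be: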
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 94837d2..2671b9d 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -312,7 +312,7 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
 				untimeout(h, timeout);
 		}
-		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 	}
 	local_bh_enable();
 }
---
And it fixes my issue! ;)
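For anyone who wants to see the failure mode outside the kernel, here is
a minimal userspace sketch in C. It is only an analogy, not the kernel
code: pthread mutexes stand in for the conntrack spinlocks, NLOCKS and
NBUCKETS are made-up stand-ins for CONNTRACK_LOCKS and
net->ct.htable_size, and walk_buckets() is a hypothetical helper. It
uses pthread_mutex_trylock() so that the second acquisition reports
EBUSY instead of hanging the way a real spinlock would.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define NLOCKS   4   /* hypothetical stand-in for CONNTRACK_LOCKS */
#define NBUCKETS 16  /* hypothetical stand-in for net->ct.htable_size */

/* Default (non-recursive) mutexes play the role of the spinlock array. */
static pthread_mutex_t locks[NLOCKS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/*
 * Walk every bucket under its lock.
 * buggy == 0: lock ... unlock, as in the fixed code.
 * buggy != 0: lock ... lock, as in the broken hunk.
 * Returns -1 if the walk completes, otherwise the index of the bucket
 * at which a real spinlock would have spun forever.
 */
static int walk_buckets(int buggy)
{
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_t *lock = &locks[i % NLOCKS];
		int rc;

		if (pthread_mutex_trylock(lock) != 0)
			return i;	/* lock was never released earlier */

		/* ... per-bucket work would go here ... */

		if (buggy) {
			/* Second acquire where the release belongs. */
			rc = pthread_mutex_trylock(lock);
			/* Release once so the demo can be run again. */
			pthread_mutex_unlock(lock);
			if (rc == EBUSY)
				return i;
		} else {
			rc = pthread_mutex_unlock(lock);
			assert(rc == 0);
			(void)rc;
		}
	}
	return -1;
}
```

With buggy == 0 the walk completes and returns -1. With buggy != 0 it
reports bucket 0, so the very first loop iteration is already stuck,
which fits the almost instant lockup I saw.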
> In the photo, I started typing "root" and it keeps repeating it, like
> it's in a while loop.
>
> https://goo.gl/photos/yGhNSogJjeb2VJyu5
>
> Trying to get better information (or any at all), I enabled quite a few
> debugging options that could have any bearing on it and ended up with:
> https://goo.gl/photos/NnQER2WXXJ5ZWPR67
>
> The interesting part is that in this case the machine was booted into
> single user mode and did not crash.
>
> It seems like it gets into trouble when the bridges and the network
> interfaces are enabled, as in just a second or two after boot.
[--8<--]
View attachment "0001-netfilter-nf_conntrack-lock-error.patch" of type "text/x-patch" (977 bytes)