Date:	Wed, 24 Feb 2016 23:53:47 +0100
From:	Ian Kumlien <ian.kumlien@...il.com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Cc:	sasha.levin@...cle.com
Subject: Re: [BUG][4.5-rc5] rcu_sched self-detected stall on CPU - directly after network goes online.

On 22 February 2016 at 01:38, Ian Kumlien <ian.kumlien@...il.com> wrote:
> Hi,
>
> When I tried to upgrade my soon-to-be firewall to 4.5-rc5 to do some
> testing, it deadlocked almost instantly.

After bisecting, the offending commit seems to be:
b16c29191dc89bd877af99a7b04ce4866728a3e0

It looks like some basic sanity checking went missing...

The original patch does:
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 5d010f2..94837d2 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -307,12 +307,12 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)

        local_bh_disable();
        for (i = 0; i < net->ct.htable_size; i++) {
-               spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+               nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
                if (i < net->ct.htable_size) {
                        hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
                                untimeout(h, timeout);
                }
-               spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+               nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
        }
        local_bh_enable();
 }
---
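To see why that hunk hangs, here is a minimal userspace model. It is a sketch under stated assumptions, not kernel code: `toy_spinlock`, `NLOCKS` and `buggy_walk()` are hypothetical names, C11 atomics stand in for the kernel's `spinlock_t`, and `nf_conntrack_lock()` is treated simply as "acquire the bucket lock". A failed try-lock marks the point where a real, non-recursive spinlock would spin forever. Because `spin_unlock()` was replaced by a second lock call on the same lock, the walk gets stuck at the end of the very first bucket.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONNTRACK_LOCKS; the real value differs. */
#define NLOCKS 4

/* Hypothetical toy spinlock: held == true means "locked". */
typedef struct {
	atomic_bool held;
} toy_spinlock;

static toy_spinlock locks[NLOCKS];

/* Try once to take the lock; false means a real spin_lock() would keep spinning. */
static bool toy_trylock(toy_spinlock *l)
{
	return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/*
 * Model of the broken loop: lock at the top of each iteration, then a
 * second lock (instead of an unlock) at the bottom. Returns the bucket
 * index at which the CPU would spin forever, or -1 if the walk finishes.
 */
static long buggy_walk(size_t nbuckets)
{
	assert(nbuckets <= (size_t)LONG_MAX); /* index must fit the return type */

	for (size_t i = 0; i < NLOCKS; i++)
		atomic_store_explicit(&locks[i].held, false, memory_order_relaxed);

	for (size_t i = 0; i < nbuckets; i++) {
		toy_spinlock *l = &locks[i % NLOCKS];

		if (!toy_trylock(l))   /* the nf_conntrack_lock() at the top */
			return (long)i;
		/* ... untimeout() each entry in bucket i ... */
		if (!toy_trylock(l))   /* the call that should be spin_unlock() */
			return (long)i;
	}
	return -1;
}
```

`buggy_walk(16)` returns 0: the second acquisition of lock 0 fails straight away. In the kernel this loop runs between `local_bh_disable()` and `local_bh_enable()`, so a CPU stuck spinning there never makes progress, which would fit the RCU stall reported in this thread.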

The second nf_conntrack_lock() call, which replaced spin_unlock(), looks like a mistake; the fix should be:
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 94837d2..2671b9d 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -312,7 +312,7 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
                        hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
                                untimeout(h, timeout);
                }
-               nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+               spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
        }
        local_bh_enable();
 }
---

And it fixes my issue! ;)
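For comparison, here is a toy model of the loop with the fix applied. Again this is a hypothetical userspace sketch: `toy_spinlock`, `NLOCKS`, `fixed_walk()`, `all_locks_free()` and the `uses` counters are illustrative names, and C11 atomics stand in for the kernel's spinlocks. Every lock taken at the top of the loop body is released at the bottom, so the walk completes and leaves all locks free, even though buckets share locks through `i % NLOCKS`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONNTRACK_LOCKS; the real value differs. */
#define NLOCKS 4

/* Hypothetical toy spinlock: held == true means "locked". */
typedef struct {
	atomic_bool held;
} toy_spinlock;

static toy_spinlock locks[NLOCKS];
static size_t uses[NLOCKS];  /* how many buckets each lock guarded */

static void toy_lock(toy_spinlock *l)
{
	/* Spin until we are the one who flips held from false to true. */
	while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
		;
}

static void toy_unlock(toy_spinlock *l)
{
	/* Releasing a lock nobody holds would itself be a bug. */
	assert(atomic_load_explicit(&l->held, memory_order_relaxed));
	atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Model of the fixed loop: each lock at the top is paired with an unlock. */
static void fixed_walk(size_t nbuckets)
{
	for (size_t i = 0; i < NLOCKS; i++) {
		atomic_store_explicit(&locks[i].held, false, memory_order_relaxed);
		uses[i] = 0;
	}

	for (size_t i = 0; i < nbuckets; i++) {
		toy_spinlock *l = &locks[i % NLOCKS];

		toy_lock(l);          /* nf_conntrack_lock() in the kernel */
		uses[i % NLOCKS]++;   /* ... untimeout() each entry in bucket i ... */
		toy_unlock(l);        /* spin_unlock(), restored by the fix */
	}
}

/* True if no lock is left held after a walk. */
static bool all_locks_free(void)
{
	for (size_t i = 0; i < NLOCKS; i++)
		if (atomic_load_explicit(&locks[i].held, memory_order_relaxed))
			return false;
	return true;
}
```

With 10 buckets and 4 locks, `fixed_walk(10)` leaves `uses` at {3, 3, 2, 2}: bucket `i` and bucket `i + NLOCKS` share a lock, so each iteration has to drop its lock before the walk wraps around to it again.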

> In the photo, I started typing "root" and it keeps repeating it, as if
> it's stuck in a while loop.
>
> https://goo.gl/photos/yGhNSogJjeb2VJyu5
>
> To get better information (or any information at all), I enabled quite a
> few debugging options that might have a bearing on it and ended up with:
> https://goo.gl/photos/NnQER2WXXJ5ZWPR67
>
> Interestingly, in this case the machine was booted into single-user mode
> and did not crash.
>
> It seems to get into trouble when the bridges and the network
> interfaces are enabled, about a second or two after boot.

[--8<--]

Attachment: "0001-netfilter-nf_conntrack-lock-error.patch" (text/x-patch, 977 bytes)
