linux-kernel - Re: [Regression] stress-ng udp-flood causes kernel panic on Ampere Altra

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220702205651.GB15144@breakpoint.cc>
Date:   Sat, 2 Jul 2022 22:56:51 +0200
From:   Florian Westphal <fw@...len.de>
To:     Kajetan Puchalski <kajetan.puchalski@....com>
Cc:     Florian Westphal <fw@...len.de>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Jozsef Kadlecsik <kadlec@...filter.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Mel Gorman <mgorman@...e.de>,
        lukasz.luba@....com, dietmar.eggemann@....com,
        netfilter-devel@...r.kernel.org, coreteam@...filter.org,
        netdev@...r.kernel.org, stable@...r.kernel.org,
        regressions@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [Regression] stress-ng udp-flood causes kernel panic on Ampere
 Altra

Kajetan Puchalski <kajetan.puchalski@....com> wrote:
> On Fri, Jul 01, 2022 at 10:01:10PM +0200, Florian Westphal wrote:
> > Kajetan Puchalski <kajetan.puchalski@....com> wrote:
> > > While running the udp-flood test from stress-ng on Ampere Altra (Mt.
> > > Jade platform) I encountered a kernel panic caused by NULL pointer
> > > dereference within nf_conntrack.
> > > 
> > > The issue is present in the latest mainline (5.19-rc4), latest stable
> > > (5.18.8), as well as multiple older stable versions. The last working
> > > stable version I found was 5.15.40.
> > 
> > Do I need a special setup for conntrack?
> 
> I don't think there was any special setup involved, the config I started
> from was a generic distribution config and I didn't change any
> networking-specific options. In case that's helpful here's the .config I
> used.
> 
> https://pastebin.com/Bb2wttdx
> 
> > 
> > No crashes after more than one hour of stress-ng on
> > 1. 4 core amd64 Fedora 5.17 kernel
> > 2. 16 core amd64, linux stable 5.17.15
> > 3. 12 core intel, Fedora 5.18 kernel
> > 4. 3 core aarch64 vm, 5.18.7-200.fc36.aarch64
> > 
> 
> That would make sense, from further experiments I ran it somehow seems
> to be related to the number of workers being spawned by stress-ng along
> with the CPUs/cores involved.
>
> For instance, running the test with <=25 workers (--udp-flood 25 etc.)
> results in the test running fine for at least 15 minutes.

Ok.  I will let it run for longer on the machines I have access to.

In mean time, you could test attached patch, its simple s/refcount_/atomic_/
in nf_conntrack.

If mainline (patch vs. HEAD 69cb6c6556ad89620547318439) crashes for you
but works with attached patch someone who understands aarch64 memory ordering
would have to look more closely at refcount_XXX functions to see where they
might differ from atomic_ ones.

If it still crashes, please try below hunk in addition, although I don't see
how it would make a difference.

This is the one spot where the original conversion replaced atomic_inc()
with refcount_set(), this is on allocation, refcount is expected to be 0 so
refcount_inc() triggers a warning hinting at a use-after free.

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1776,7 +1776,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
                __nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
 
        /* Now it is going to be associated with an sk_buff, set refcount to 1. */
-       atomic_set(&ct->ct_general.use, 1);
+       atomic_inc(&ct->ct_general.use);
 
        if (exp) {
                if (exp->expectfn)

View attachment "0001-netfilter-conntrack-revert-to-atomic_t-api.patch" of type "text/x-diff" (10434 bytes)