netdev - Re: BUG: kernel NULL pointer dereference in __cgroup_bpf_run_filter

Message-ID: <CAMtihapcPYn-tZyypwN8ZLMWGeqErC37gFtyLp9zv-mcmcn7eg@mail.gmail.com>
Date:   Mon, 15 Jun 2020 15:05:54 +0200
From:   Daniël Sonck <dsonck92@...il.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: BUG: kernel NULL pointer dereference in __cgroup_bpf_run_filter_skb

On Sun, 14 Jun 2020 at 22:43, Daniël Sonck <dsonck92@...il.com> wrote:
>
> Hello,
>
> On Sun, 14 Jun 2020 at 20:29, Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
> > Hello,
> >
> > On Sun, Jun 14, 2020 at 5:39 AM Daniël Sonck <dsonck92@...il.com> wrote:
> > >
> > > Hello,
> > >
> > > I found in the archive that the bug I encountered has also happened
> > > to others. I too have a very similar stack trace. The issue I'm
> > > experiencing is:
> > >
> > > Whenever I fully boot my cluster, after some time the host crashes
> > > with the __cgroup_bpf_run_filter_skb NULL pointer dereference. Before,
> > > this was sporadic enough not to cause real issues. Lately, however,
> > > the bug is triggered much more frequently. I've changed my server
> > > hardware so that I could capture serial output and get the trace.
> > > This trace looks very similar to the one reported by Lu Fengqi. As it
> > > currently stands, I cannot run the cluster, as it crashes the host
> > > almost instantly.
> >
> > This has been reported multiple times. Are you able to test the
> > attached patch? Let me know whether everything works fine with it.
>
> I will try out the patch. Since the host reliably crashed each time I
> booted up the cluster VMs, I will be able to tell whether it has any
> positive effect.
> >
> > I suspect we may still leak some cgroup refcnt even with the patch,
> > but it might be much harder to trigger with this patch applied.
>
> I'm currently applying the patch to the kernel and compiling, so I
> should know in a few hours.

The compilation with the patch has finished, and I rebooted into the new
kernel about 12 hours ago. So far the bug has not triggered, whereas without
the patch it would have triggered by this time. Regardless, I will keep the
serial connection in place in case something pops up.
> >
> > Thanks.