netdev - Re: INFO: rcu detected stall in ndisc_alloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACT4Y+Yy-bF07F7F8DoFY8=4LtLURRn1WsZzNZ9LN+N=vn7Tpw@mail.gmail.com>
Date:   Mon, 7 Jan 2019 12:12:53 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     syzbot <syzbot+ea7d9cb314b4ab49a18a@...kaller.appspotmail.com>,
        David Miller <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        LKML <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Linux-MM <linux-mm@...ck.org>, Shakeel Butt <shakeelb@...gle.com>
Subject: Re: INFO: rcu detected stall in ndisc_alloc_skb

On Sun, Jan 6, 2019 at 2:47 PM Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
>
> On 2019/01/06 22:24, Dmitry Vyukov wrote:
> >> A report at 2019/01/05 10:08 from "no output from test machine (2)"
> >> ( https://syzkaller.appspot.com/text?tag=CrashLog&x=1700726f400000 )
> >> says that there are flood of memory allocation failure messages.
> >> Since continuous memory allocation failure messages itself is not
> >> recognized as a crash, we might be misunderstanding that this problem
> >> is not occurring recently. It will be nice if we can run testcases
> >> which are executed on bpf-next tree.
> >
> > What exactly do you mean by running test cases on bpf-next tree?
> > syzbot tests bpf-next, so it executes lots of test cases on that tree.
> > One can also ask for patch testing on bpf-next tree to test a specific
> > test case.
>
> syzbot ran "some tests" before getting this report, but we can't find from
> this report what the "some tests" are. If we could record all tests executed
> in syzbot environments before getting this report, we could rerun the tests
> (with manually examining where the source of memory consumption is) in local
> environments.

Filed https://github.com/google/syzkaller/issues/917 for this.

> Since syzbot is now using memcg, maybe we can test with sysctl_panic_on_oom == 1.
> Any memory consumption that triggers global OOM killer could be considered as
> a problem (e.g. memory leak or uncontrolled memory allocation).

Interesting idea. This will also alleviate the previous problem as I
think only a stream of OOMs currently produces 1+MB of output.

+Shakeel who was interested in catching more memcg-escaping allocations.

To do this we need a buy-in from kernel community to consider this as
a bug/something to fix in kernel. Systematic testing can't work gray
checks requiring humans to look at each case and some cases left as
being working-as-intended.

There are also 2 interesting points:
 - testing of kernel without memcg-enabled (some kernel users
obviously do this); it's doable, but currently syzkaller have no
precedents/infrastructure to consider some output patterns as bugs or
not depending on kernel features
 - false positives for minimized C reproducers that have memcg code
stripped off (people complain that reproducers are too large/complex
otherwise)