Message-ID: <af79013b-d496-f29f-5e57-da11658310aa@canonical.com>
Date: Wed, 26 Sep 2018 16:35:25 -0700
From: John Johansen <john.johansen@...onical.com>
To: Daniel Borkmann <daniel@...earbox.net>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Alexei Starovoitov <ast@...nel.org>
Cc: Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Dmitry Vyukov <dvyukov@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>
Subject: Re: bpf: Massive skbuff_head_cache memory leak?
On 09/26/2018 02:22 PM, Daniel Borkmann wrote:
> On 09/26/2018 11:09 PM, Tetsuo Handa wrote:
>> Hello, Alexei and Daniel.
>>
>> Can you show us how to run testcases you are testing?
>
> Sorry for the delay; currently quite backlogged but will definitely take a look
> at these reports. Regarding your question: majority of test cases are in the
> kernel tree under selftests, see tools/testing/selftests/bpf/ .
>
It's unlikely to be apparmor. I went through the reports and saw nothing that
would indicate apparmor involvement, but the primary reason is what is being
tested in upstream apparmor at the moment.
The current upstream code does nothing directly with skbuffs. It's
possible that the audit code paths (kernel audit does grab skbuffs)
could, but only a couple of those cases would be triggered by the
current fuzzing, so this seems an unlikely source for such a
large leak.
>> On 2018/09/22 22:25, Tetsuo Handa wrote:
>>> Hello.
>>>
>>> syzbot is reporting many lockup problems on bpf.git / bpf-next.git / net.git / net-next.git trees.
>>>
>>> INFO: rcu detected stall in br_multicast_port_group_expired (2)
>>> https://syzkaller.appspot.com/bug?id=15c7ad8cf35a07059e8a697a22527e11d294bc94
>>>
>>> INFO: rcu detected stall in tun_chr_close
>>> https://syzkaller.appspot.com/bug?id=6c50618bde03e5a2eefdd0269cf9739c5ebb8270
>>>
>>> INFO: rcu detected stall in discover_timer
>>> https://syzkaller.appspot.com/bug?id=55da031ddb910e58ab9c6853a5784efd94f03b54
>>>
>>> INFO: rcu detected stall in ret_from_fork (2)
>>> https://syzkaller.appspot.com/bug?id=c83129a6683b44b39f5b8864a1325893c9218363
>>>
>>> INFO: rcu detected stall in addrconf_rs_timer
>>> https://syzkaller.appspot.com/bug?id=21c029af65f81488edbc07a10ed20792444711b6
>>>
>>> INFO: rcu detected stall in kthread (2)
>>> https://syzkaller.appspot.com/bug?id=6accd1ed11c31110fed1982f6ad38cc9676477d2
>>>
>>> INFO: rcu detected stall in ext4_filemap_fault
>>> https://syzkaller.appspot.com/bug?id=817e38d20e9ee53390ac361bf0fd2007eaf188af
>>>
>>> INFO: rcu detected stall in run_timer_softirq (2)
>>> https://syzkaller.appspot.com/bug?id=f5a230a3ff7822f8d39fddf8485931bd06ae47fe
>>>
>>> INFO: rcu detected stall in bpf_prog_ADDR
>>> https://syzkaller.appspot.com/bug?id=fb4911fd0e861171cc55124e209f810a0dd68744
>>>
>>> INFO: rcu detected stall in __run_timers (2)
>>> https://syzkaller.appspot.com/bug?id=65416569ddc8d2feb8f19066aa761f5a47f7451a
>>>
>>> The cause of the lockups seems to be a flood of printk() messages from memory
>>> allocation failures, and one of the out_of_memory() messages indicates that
>>> skbuff_head_cache usage is large enough to suspect an in-kernel memory leak.
>>>
>>> [ 1554.547011] skbuff_head_cache 1847887KB 1847887KB
>>>
>>> Unfortunately, we cannot tell from the logs what syzbot is trying to do,
>>> because the constant printk() messages are drowning out the syzkaller messages.
>>> Can you try running your testcases with kmemleak enabled?
>>>
>>
>> On 2018/09/27 2:35, Dmitry Vyukov wrote:
>>> I also started suspecting Apparmor. We switched to Apparmor on Aug 30:
>>> https://groups.google.com/d/msg/syzkaller-bugs/o73lO4KGh0w/j9pcH2tSBAAJ
>>> Now the instances that use SELinux and Smack explicitly contain that
>>> in the name, but the rest are Apparmor.
>>> Aug 30 roughly matches these assorted "task hung" reports. Perhaps
>>> some Apparmor hook leaks a reference to skbs?
>>
>> Maybe. They have CONFIG_DEFAULT_SECURITY="apparmor". But I'm wondering why
>> this problem is not occurring on linux-next.git when it is occurring
>> on bpf.git / bpf-next.git / net.git / net-next.git trees. Is syzbot running
>> different testcases depending on which git tree is targeted?
>>
>
This is another reason it is doubtful that it's apparmor.
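For scale, the skbuff_head_cache line quoted above works out to roughly 1.8 GB. A minimal sketch converting the figure, using a sample line copied from the report; the kmemleak commands in the comments are the standard debugfs interface and assume CONFIG_DEBUG_KMEMLEAK=y with debugfs mounted at /sys/kernel/debug:

```shell
# Convert the slab figure from the OOM report to GB
# (sample line taken verbatim from the log above).
line="skbuff_head_cache 1847887KB 1847887KB"
echo "$line" | awk '{ printf "%.1f GB\n", $2 / (1024 * 1024) }'

# To chase the leak itself, kmemleak can be scanned on demand:
#   echo scan > /sys/kernel/debug/kmemleak
#   cat /sys/kernel/debug/kmemleak
```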