Message-ID: <CACT4Y+b5yakqmKvuojJP04HT+6LvZ4k=VxHF9kFkbHaEA3D4nA@mail.gmail.com>
Date:   Thu, 27 Sep 2018 12:24:03 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        John Johansen <john.johansen@...onical.com>
Subject: Re: bpf: Massive skbuff_head_cache memory leak?

On Wed, Sep 26, 2018 at 11:09 PM, Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
> Hello, Alexei and Daniel.
>
> Can you show us how to run the testcases you are testing?
>
> On 2018/09/22 22:25, Tetsuo Handa wrote:
>> Hello.
>>
>> syzbot is reporting many lockup problems on bpf.git / bpf-next.git / net.git / net-next.git trees.
>>
>>   INFO: rcu detected stall in br_multicast_port_group_expired (2)
>>   https://syzkaller.appspot.com/bug?id=15c7ad8cf35a07059e8a697a22527e11d294bc94
>>
>>   INFO: rcu detected stall in tun_chr_close
>>   https://syzkaller.appspot.com/bug?id=6c50618bde03e5a2eefdd0269cf9739c5ebb8270
>>
>>   INFO: rcu detected stall in discover_timer
>>   https://syzkaller.appspot.com/bug?id=55da031ddb910e58ab9c6853a5784efd94f03b54
>>
>>   INFO: rcu detected stall in ret_from_fork (2)
>>   https://syzkaller.appspot.com/bug?id=c83129a6683b44b39f5b8864a1325893c9218363
>>
>>   INFO: rcu detected stall in addrconf_rs_timer
>>   https://syzkaller.appspot.com/bug?id=21c029af65f81488edbc07a10ed20792444711b6
>>
>>   INFO: rcu detected stall in kthread (2)
>>   https://syzkaller.appspot.com/bug?id=6accd1ed11c31110fed1982f6ad38cc9676477d2
>>
>>   INFO: rcu detected stall in ext4_filemap_fault
>>   https://syzkaller.appspot.com/bug?id=817e38d20e9ee53390ac361bf0fd2007eaf188af
>>
>>   INFO: rcu detected stall in run_timer_softirq (2)
>>   https://syzkaller.appspot.com/bug?id=f5a230a3ff7822f8d39fddf8485931bd06ae47fe
>>
>>   INFO: rcu detected stall in bpf_prog_ADDR
>>   https://syzkaller.appspot.com/bug?id=fb4911fd0e861171cc55124e209f810a0dd68744
>>
>>   INFO: rcu detected stall in __run_timers (2)
>>   https://syzkaller.appspot.com/bug?id=65416569ddc8d2feb8f19066aa761f5a47f7451a
>>
>> The cause of the lockups seems to be a flood of printk() messages from memory
>> allocation failures, and one of the out_of_memory() messages indicates that
>> skbuff_head_cache usage is huge enough to suspect an in-kernel memory leak.
>>
>>   [ 1554.547011] skbuff_head_cache    1847887KB    1847887KB
>>
>> Unfortunately, we cannot tell from the logs what syzbot is trying to do,
>> because the constant flood of printk() messages washes away the syzkaller messages.
>> Can you try running your testcases with kmemleak enabled?
>>
>
> On 2018/09/27 2:35, Dmitry Vyukov wrote:
>> I also started suspecting Apparmor. We switched to Apparmor on Aug 30:
>> https://groups.google.com/d/msg/syzkaller-bugs/o73lO4KGh0w/j9pcH2tSBAAJ
>> Now the instances that use SELinux and Smack explicitly contain that
>> in the name, but the rest are Apparmor.
>> Aug 30 roughly matches these assorted "task hung" reports. Perhaps
>> some Apparmor hook leaks a reference to skbs?
>
> Maybe. They have CONFIG_DEFAULT_SECURITY="apparmor". But I'm wondering why
> this problem is not occurring on linux-next.git when it is occurring on the
> bpf.git / bpf-next.git / net.git / net-next.git trees. Is syzbot running
> different testcases depending on which git tree is targeted?


Yes, this is strange. The net/bpf instances run a _subset_ of tests. That
is, they are more concentrated on the corresponding subsystems, but the
other instances can run all of these tests too, just with a lower
probability.

Bpf instances are restricted to this set of syscalls:

"enable_syscalls": [
    "bpf", "mkdir", "mount$bpf", "unlink", "close",
    "perf_event_open", "ioctl$PERF*", "getpid", "gettid",
    "socketpair", "sendmsg", "recvmsg", "setsockopt$sock_attach_bpf",
    "socket$kcm", "ioctl$sock_kcm*",
    "mkdirat$cgroup*", "openat$cgroup*", "write$cgroup*",
    "openat$tun", "write$tun", "ioctl$TUN*", "ioctl$SIOCSIFHWADDR"
]

Net instances are restricted to this set:

"enable_syscalls": [
    "accept", "accept4", "bind", "close", "connect", "epoll_create",
    "epoll_create1", "epoll_ctl", "epoll_pwait", "epoll_wait",
    "getpeername", "getsockname", "getsockopt", "ioctl", "listen",
    "mmap", "poll", "ppoll", "pread64", "preadv", "pselect6",
    "pwrite64", "pwritev", "read", "readv", "recvfrom", "recvmmsg",
    "recvmsg", "select", "sendfile", "sendmmsg", "sendmsg", "sendto",
    "setsockopt", "shutdown", "socket", "socketpair", "splice",
    "vmsplice", "write", "writev", "tee", "bpf", "getpid",
    "getgid", "getuid", "gettid", "unshare", "pipe",
    "syz_emit_ethernet", "syz_extract_tcp_res",
    "syz_genetlink_get_family_id", "syz_init_net_socket",
    "mkdirat$cgroup*", "openat$cgroup*", "write$cgroup*",
    "clock_gettime", "bpf"
]
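
For context, "enable_syscalls" is a field of the syz-manager config file;
entries like "ioctl$PERF*" match syzkaller's specialized syscall variants
("$" names) by wildcard. Below is a minimal sketch of how such a whitelist
is wired into a manager config -- all paths, addresses and VM settings are
illustrative placeholders, not the actual syzbot configuration, and the
exact set of options depends on the syzkaller version:

{
    "target": "linux/amd64",
    "http": "127.0.0.1:56741",
    "workdir": "/workdir",
    "kernel_obj": "/linux",
    "image": "/image/stretch.img",
    "sshkey": "/image/stretch.id_rsa",
    "syzkaller": "/gopath/src/github.com/google/syzkaller",
    "procs": 8,
    "type": "qemu",
    "enable_syscalls": [
        "bpf", "mkdir", "mount$bpf", "unlink", "close"
    ],
    "vm": {
        "count": 4,
        "kernel": "/linux/arch/x86/boot/bzImage",
        "cpu": 2,
        "mem": 2048
    }
}

With such a config, syz-manager generates programs only from the listed
syscalls, which is why the net/bpf instances exercise the corresponding
subsystems much more heavily than the general-purpose instances do.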
