Message-ID: <CACT4Y+Z3GYkvL28BmyGWJHZmK=NgrJcMfhqyXPPMME_Rk8X_qw@mail.gmail.com>
Date: Tue, 25 Dec 2018 10:35:35 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Manfred Spraul <manfred@...orfullife.com>
Cc: syzbot+1145ec2e23165570c3ac@...kaller.appspotmail.com,
Andrew Morton <akpm@...ux-foundation.org>,
David Howells <dhowells@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
ktsanaktsidis@...desk.com, LKML <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...e.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Stephen Rothwell <sfr@...b.auug.org.au>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Matthew Wilcox <willy@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>
Subject: Re: general protection fault in put_pid
On Sun, Dec 23, 2018 at 7:38 PM Manfred Spraul <manfred@...orfullife.com> wrote:
>
> Hello Dmitry,
>
> On 12/23/18 11:42 AM, Dmitry Vyukov wrote:
> > Actually was able to reproduce this with a syzkaller program:
> > ./syz-execprog -repeat=0 -procs=10 prog
> > ...
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51
> > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48
> > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00
> > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02
> > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f
> > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728
> > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401
> > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98
> > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000
> > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > __list_del_entry include/linux/list.h:117 [inline]
> > list_del include/linux/list.h:125 [inline]
> > unlink_queue ipc/sem.c:786 [inline]
> > freeary+0xddb/0x1c90 ipc/sem.c:1164
> > free_ipcs+0xf0/0x160 ipc/namespace.c:112
> > sem_exit_ns+0x20/0x40 ipc/sem.c:237
> > free_ipc_ns ipc/namespace.c:120 [inline]
> > put_ipc_ns+0x55/0x160 ipc/namespace.c:152
> > free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180
> > switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229
> > exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234
> > do_exit+0x19e5/0x27d0 kernel/exit.c:866
> > do_group_exit+0x151/0x410 kernel/exit.c:970
> > __do_sys_exit_group kernel/exit.c:981 [inline]
> > __se_sys_exit_group kernel/exit.c:979 [inline]
> > __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979
> > do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x4570e9
> > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48
> > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9
> > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045
> > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000
> > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008
> > Modules linked in:
> > Dumping ftrace buffer:
> > (ftrace buffer empty)
> > ---[ end trace 17829b0f00569a59 ]---
> > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51
> > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48
> > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00
> > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02
> > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f
> > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728
> > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401
> > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98
> > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000
> > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >
> >
> > The prog is:
> > unshare(0x8020000)
> > semget$private(0x0, 0x4007, 0x0)
> >
> > kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3
> >
> > and again it involved lots of oom kills, the repro eats all memory, a
> > process getting killed, frees some memory and the process repeats.
>
> I was too fast: I can't reproduce the memory leak.
>
> Can you send me the source for prog?
Here is the program:
https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt
But we concluded this is not a leak, right?
It just creates large semaphores tied to a persistent ipcns. Once the
process is killed, all memory is released. When this program runs, it
eats all memory, then one of the subprocesses is oom-killed, part of
the memory is released, then all memory is consumed again by a new
subprocess, and this repeats. If all processes are killed, all memory
is released back. It seems to be working as intended.
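
For reference, the two numeric arguments in the prog quoted above can be
decoded with a few lines of Python. This is only an illustrative sketch:
the helper name decode_unshare_flags is made up for this example, and the
two flag values are the ones defined in <linux/sched.h>.

```python
# Flag values as defined in <linux/sched.h>.
CLONE_NEWNS = 0x00020000   # new mount namespace
CLONE_NEWIPC = 0x08000000  # new IPC namespace


def decode_unshare_flags(flags):
    """Return the names of the namespace flags set in `flags`.

    Hypothetical helper for illustration; it only knows the two flags
    above and reports any other bits as a hex string.
    """
    known = {"CLONE_NEWNS": CLONE_NEWNS, "CLONE_NEWIPC": CLONE_NEWIPC}
    names = [name for name, bit in known.items() if flags & bit]
    leftover = flags & ~(CLONE_NEWNS | CLONE_NEWIPC)
    if leftover:
        names.append(hex(leftover))
    return names


# unshare(0x8020000) from the prog
print(decode_unshare_flags(0x8020000))  # ['CLONE_NEWNS', 'CLONE_NEWIPC']

# semget$private(0x0, 0x4007, 0x0): number of semaphores in the set
print(0x4007)  # 16391
```

So each run of the prog enters a fresh IPC namespace and asks for a
private set of 16391 semaphores.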
However, what you said about the kernel.sem sysctl is useful, and I think
we need to use it for additional sandboxing of syzkaller test
processes. I am thinking of applying:
kernel.shmmax = 16777216
kernel.shmall = 536870912
kernel.shmmni = 1024
kernel.msgmax = 8192
kernel.msgmni = 1024
kernel.msgmnb = 1024
kernel.sem = 1024 1048576 500 1024
It should be enough to trigger bugs of any complexity (oom's aside),
but should prevent uncontrolled memory consumption.
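
As a rough sanity check of what the values listed above permit, the
arithmetic can be sketched in Python. The assumptions here are mine: a
4 KiB page size (kernel.shmall is counted in pages), the documented field
order of kernel.sem (SEMMSL, SEMMNS, SEMOPM, SEMMNI), and the helper names
parse_sysctl and ipc_bounds, which are made up for this example. The
results are upper bounds on user-visible IPC objects in one IPC namespace,
not measured kernel memory usage.

```python
SETTINGS = """\
kernel.shmmax = 16777216
kernel.shmall = 536870912
kernel.shmmni = 1024
kernel.msgmax = 8192
kernel.msgmni = 1024
kernel.msgmnb = 1024
kernel.sem = 1024 1048576 500 1024
"""

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_sysctl(text):
    """Parse 'key = v1 v2 ...' lines into {key: [int, ...]}."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = [int(v) for v in value.split()]
    return cfg


def ipc_bounds(cfg, page_size=PAGE_SIZE):
    """Upper bounds implied by the limits, for one IPC namespace."""
    semmsl, semmns, _semopm, semmni = cfg["kernel.sem"]
    return {
        # Total shm is capped both by shmall (pages) and by
        # shmmni segments of at most shmmax bytes each.
        "shm_bytes": min(cfg["kernel.shmall"][0] * page_size,
                         cfg["kernel.shmmni"][0] * cfg["kernel.shmmax"][0]),
        # msgmni queues, each holding at most msgmnb bytes by default.
        "msg_bytes": cfg["kernel.msgmni"][0] * cfg["kernel.msgmnb"][0],
        # Total semaphores: SEMMNS, or SEMMNI sets of SEMMSL each.
        "semaphores": min(semmns, semmsl * semmni),
        "semmsl": semmsl,
    }


bounds = ipc_bounds(parse_sysctl(SETTINGS))
print(bounds["shm_bytes"])   # 17179869184 (16 GiB)
print(bounds["msg_bytes"])   # 1048576 (1 MiB)
print(bounds["semaphores"])  # 1048576
# The prog asks for 0x4007 semaphores in one set, which exceeds SEMMSL.
print(0x4007 > bounds["semmsl"])  # True
```

With SEMMSL at 1024, a semget() asking for 0x4007 semaphores, as in the
prog, would be rejected with EINVAL rather than allocating anything.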
Looking at the code, I figured that these sysctls are
per-ipc-namespace, right? I.e. if I do sysctl from within an ipcns, the
limits will be set only for that ns. I won't use this initially,
but it is something to keep in mind if the global limits fail in some
way.
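
One way to check the per-namespace behaviour experimentally is sketched
below in Python. It is only a sketch under stated assumptions: it must run
as root, it calls unshare(2) through ctypes, and the helper names
parse_sem, read_sem and limits_in_new_ipcns are made up for this example.
Note that the calling process stays in the new IPC namespace afterwards.

```python
import ctypes
import os

CLONE_NEWIPC = 0x08000000  # from <linux/sched.h>
SEM_PATH = "/proc/sys/kernel/sem"


def parse_sem(text):
    """Parse the four kernel.sem fields: SEMMSL SEMMNS SEMOPM SEMMNI."""
    semmsl, semmns, semopm, semmni = (int(v) for v in text.split())
    return {"semmsl": semmsl, "semmns": semmns,
            "semopm": semopm, "semmni": semmni}


def read_sem(path=SEM_PATH):
    """Read kernel.sem as seen by the calling process."""
    with open(path) as f:
        return parse_sem(f.read())


def limits_in_new_ipcns(new_value="1024 1048576 500 1024"):
    """Needs root. Move this process into a new IPC namespace, set
    kernel.sem there, and return (limits before, limits after)."""
    libc = ctypes.CDLL(None, use_errno=True)
    before = read_sem()
    if libc.unshare(CLONE_NEWIPC) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    with open(SEM_PATH, "w") as f:
        f.write(new_value)
    return before, read_sem()
```

After calling limits_in_new_ipcns() as root, reading
/proc/sys/kernel/sem from a shell that is still in the original namespace
should show whether the write stayed confined to the new ipcns.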