linux-kernel - Re: general protection fault in put

Message-ID: <CACT4Y+Z1FiaYyL1qRynmV96Kcv4QrZkP8HPEZXzsUT2f1QDfuQ@mail.gmail.com>
Date:   Tue, 25 Dec 2018 10:41:53 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Manfred Spraul <manfred@...orfullife.com>
Cc:     syzbot+1145ec2e23165570c3ac@...kaller.appspotmail.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Howells <dhowells@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        ktsanaktsidis@...desk.com, LKML <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...e.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Matthew Wilcox <willy@...radead.org>,
        Davidlohr Bueso <dave@...olabs.net>
Subject: Re: general protection fault in put_pid

On Sun, Dec 23, 2018 at 1:32 PM Manfred Spraul <manfred@...orfullife.com> wrote:
>
> Hi Dmitry,
>
> let's simplify the mail, otherwise no one can follow:
>
> On 12/23/18 11:42 AM, Dmitry Vyukov wrote:
> >
> >> My naive attempts to re-reproduce this failed so far.
> >> But I noticed that _all_ logs for these 3 crashes:
> >> https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909
> >> https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> >> https://syzkaller.appspot.com/bug?extid=9d8b6fa6ee7636f350c1
> >> involve low memory conditions. My gut feeling says this is not a
> >> coincidence. This is also probably the reason why all reproducers
> >> create large sem sets. There must be some bad interaction between low
> >> memory condition and semaphores/ipc namespaces.
> >
> > Actually was able to reproduce this with a syzkaller program:
> >
> > ./syz-execprog -repeat=0 -procs=10 prog
> > ...
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51
> > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48
> > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00
> > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02
> > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f
> > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728
> > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401
> > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98
> > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000
> > FS:  0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >   __list_del_entry include/linux/list.h:117 [inline]
> >   list_del include/linux/list.h:125 [inline]
> >   unlink_queue ipc/sem.c:786 [inline]
> >   freeary+0xddb/0x1c90 ipc/sem.c:1164
> >   free_ipcs+0xf0/0x160 ipc/namespace.c:112
> >   sem_exit_ns+0x20/0x40 ipc/sem.c:237
> >   free_ipc_ns ipc/namespace.c:120 [inline]
> >   put_ipc_ns+0x55/0x160 ipc/namespace.c:152
> >   free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180
> >   switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229
> >   exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234
> >   do_exit+0x19e5/0x27d0 kernel/exit.c:866
> >   do_group_exit+0x151/0x410 kernel/exit.c:970
> >   __do_sys_exit_group kernel/exit.c:981 [inline]
> >   __se_sys_exit_group kernel/exit.c:979 [inline]
> >   __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979
> >   do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x4570e9
> > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48
> > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9
> > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045
> > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000
> > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008
> > Modules linked in:
> > Dumping ftrace buffer:
> >     (ftrace buffer empty)
> > ---[ end trace 17829b0f00569a59 ]---
> > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51
> > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48
> > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00
> > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02
> > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f
> > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728
> > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401
> > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98
> > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000
> > FS:  0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >
> >
> > The prog is:
> > unshare(0x8020000)
> > semget$private(0x0, 0x4007, 0x0)
> >
> > kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3
> >
> > and again it involved lots of oom kills: the repro eats all memory, a
> > process gets killed, which frees some memory, and the cycle repeats.
>
> Ok, thus the above program triggers two bugs:
>
> - a huge memory leak with semaphore arrays
>
> - under OOM pressure, an oops.
>
>
> 1) I can reproduce the memory leak, it happens all the time :-(
>
> I must look what is wrong.
>
> 2) regarding the crash:
>
> What differs under oom pressure?
>
> - kvmalloc can fall back to vmalloc()
>
> - the 2nd or 3rd of multiple allocations can fail, and that triggers a
> rare codepath/race condition.
>
> - rcu callback can happen earlier than expected
>
> So far, I didn't notice anything unexpected :-(
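
For reference, the register values in the KASAN report quoted above are consistent with an inline shadow-memory check on a garbage pointer. A minimal sketch of the arithmetic in Python, assuming the generic KASAN mapping (shadow = (addr >> 3) + offset), that RAX holds the x86-64 shadow offset, and 48-bit (4-level paging) canonical addresses. The helper names are illustrative only:

```python
# Values copied from the register dump in the quoted oops.
RBX = 0xF817EDBA555E1F00  # pointer about to be dereferenced
RDX = 0x1F02FDB74AABC3E0  # RBX after "shr $3"
RAX = 0xDFFFFC0000000000  # assumed to be KASAN_SHADOW_OFFSET

MASK64 = (1 << 64) - 1


def shadow_addr(addr, offset=RAX):
    """Address of the KASAN shadow byte covering `addr` (8 bytes per shadow byte)."""
    return ((addr >> 3) + offset) & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit virtual addresses assumed)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


print(hex(RBX >> 3), RBX >> 3 == RDX)
print(hex(shadow_addr(RBX)))
print(is_canonical(RBX), is_canonical(shadow_addr(RBX)))
```

Under these assumptions neither RBX nor its shadow address is canonical, which would explain why the access shows up as a general protection fault rather than a page fault. Judging by the Code: bytes, RBX looks like entry->prev: it is compared against 0xdead000000000200 (LIST_POISON2) right before the faulting cmpb.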

I started suspecting a stack overflow. But I was afraid it may be a
KASAN artifact, as KASAN both increases stack usage and disables vmap
stacks.
However, I was able to reproduce this without KASAN and find the root
cause at the same time.

I am on v4.20, config is (basically just defconfig+kvmconfig):
https://gist.githubusercontent.com/dvyukov/f8401c8da367088c789bfb953d42d3b3/raw/eac0e85d3db577ba68ec59acf916899b61741ee1/gistfile1.txt
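
The two-call syzkaller program quoted above maps onto plain syscalls. A rough sketch in Python via ctypes, assuming Linux with glibc; the CLONE_* values are the standard ones from <linux/sched.h>, and run_once() is a hypothetical helper, not part of syzkaller. It needs CAP_SYS_ADMIN and is only meant for a throwaway VM:

```python
import ctypes

# Standard Linux constants (see <linux/sched.h> and <linux/ipc.h>).
CLONE_NEWNS = 0x00020000
CLONE_NEWIPC = 0x08000000
IPC_PRIVATE = 0

# Arguments taken verbatim from the syzkaller program.
UNSHARE_FLAGS = 0x8020000
NSEMS = 0x4007


def run_once():
    """One iteration of the program: new mount+IPC namespace, one big sem set."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(UNSHARE_FLAGS) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed")
    semid = libc.semget(IPC_PRIVATE, NSEMS, 0)
    if semid < 0:
        raise OSError(ctypes.get_errno(), "semget failed")
    return semid


# Decoding only; nothing is executed against the kernel here.
print(UNSHARE_FLAGS == CLONE_NEWNS | CLONE_NEWIPC)
print(NSEMS)
```

Note that syz-execprog with -procs=10 -repeat=0 keeps running the program in 10 parallel processes until stopped, so a single run_once() call corresponds to just one iteration in one process.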

Running the syzkaller program gave me:

Out of memory: Kill process 13971 (syz-executor) score 998 or sacrifice child
Killed process 13971 (syz-executor) total-vm:37512kB, anon-rss:92kB,
file-rss:0kB, shmem-rss:0kB
oom_reaper: reaped process 13971 (syz-executor), now anon-rss:0kB,
file-rss:0kB, shmem-rss:0kB
Kernel panic - not syncing: corrupted stack end detected inside scheduler
CPU: 3 PID: 2555 Comm: kworker/u12:3 Not tainted 4.20.0-rc7+ #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: writeback wb_workfn (flush-8:0)
Call Trace:
 dump_stack+0x1d4/0x2b5 lib/earlycpio.c:120
 panic+0x25e/0x49c kernel/cpu.c:617
 __schedule+0x1be8/0x21d0
 preempt_schedule_common+0x35/0xe0
 preempt_schedule+0x23/0x30
 ___preempt_schedule+0x16/0x18
 _raw_spin_unlock_irq+0x75/0x80
 mark_work_canceling kernel/workqueue.c:747 [inline]
 __flush_work+0x4f5/0x970 kernel/workqueue.c:2996
 flush_work+0x17/0x20 kernel/workqueue.c:3059
 drain_all_pages+0x418/0x680 mm/page_alloc.c:4570
 __alloc_pages_slowpath+0xb76/0x2c10 mm/page_alloc.c:4072
 __alloc_pages_nodemask+0xa6c/0xe10 mm/page_alloc.c:5029
 cache_grow_begin+0x9d/0x8a0
 fallback_alloc+0x204/0x2e0
 ____cache_alloc_node+0x1cc/0x1f0
 slab_alloc_node mm/slub.c:2710 [inline]
 slab_alloc mm/slub.c:2752 [inline]
 kmem_cache_alloc+0x296/0x720 mm/slub.c:2769
 mempool_alloc_slab+0x44/0x60 mm/mempool.c:130
 mempool_alloc+0x174/0x4e0 mm/mempool.c:433
 bvec_alloc+0x150/0x2d0 block/bio.c:485
 bio_alloc_bioset+0x44e/0x650 block/bio.c:1455
 ext4_bio_write_page+0xc11/0x1780 fs/ext4/resize.c:76
 mpage_add_bh_to_extent fs/ext4/inode.c:2300 [inline]
 mpage_submit_page+0x138/0x230 fs/ext4/inode.c:2335
 ext4_da_page_release_reservation fs/ext4/inode.c:1651 [inline]
 mpage_process_page_bufs+0x429/0x500 fs/ext4/inode.c:3226
 mpage_prepare_extent_to_map+0xb2a/0x1640 fs/ext4/inode.c:154
 ext4_inode_journal_mode fs/ext4/ext4_jbd2.h:411 [inline]
 ext4_should_journal_data fs/ext4/ext4_jbd2.h:427 [inline]
 ext4_writepages+0x112c/0x3a20 fs/ext4/inode.c:2190
 test_and_set_bit arch/x86/include/asm/bitops.h:220 [inline]
 TestSetPageDirty include/linux/page-flags.h:287 [inline]
 do_writepages+0xfc/0x170 mm/page-writeback.c:2383
 mark_inode_dirty_sync include/linux/fs.h:2124 [inline]
 __writeback_single_inode+0x1cd/0x12e0 fs/fs-writeback.c:1372
 writeback_sb_inodes+0x6c7/0x1040 fs/fs-writeback.c:1795
 __writeback_inodes_wb+0x1a3/0x310 fs/fs-writeback.c:1704
 wb_writeback+0x92c/0xe10 include/trace/events/writeback.h:572
syz-executor invoked oom-killer:
gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null),
order=3, oom_score_adj=0
syz-executor cpuset=/ mems_allowed=0-1
 wb_workfn+0xdf3/0x1600 fs/pnode.c:430
 get_unbound_pool kernel/workqueue.c:3437 [inline]
 process_one_work+0xcf3/0x1be0 kernel/workqueue.c:3612
 worker_thread+0x17d/0x12f0 kernel/workqueue.c:2289
 __write_once_size include/linux/compiler.h:218 [inline]
 __list_del include/linux/list.h:106 [inline]
 __list_del_entry include/linux/list.h:120 [inline]
 list_del_init include/linux/list.h:159 [inline]
 kthread+0x354/0x430 kernel/kthread.c:1010
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:358
CPU: 0 PID: 6768 Comm: syz-executor Not tainted 4.20.0-rc7+ #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x1d4/0x2b5 lib/earlycpio.c:120
 dump_header+0x294/0xfaf
 oom_killer_enable mm/oom_kill.c:715 [inline]
 oom_kill_process+0xa3f/0xd20 mm/oom_kill.c:750
 out_of_memory+0x88c/0x12a0 mm/fadvise.c:184
 compound_order include/linux/mm.h:707 [inline]
 page_hstate include/linux/hugetlb.h:469 [inline]
 __alloc_pages_slowpath+0x1cfa/0x2c10 mm/page_alloc.c:7820
 __alloc_pages_nodemask+0xa6c/0xe10 mm/page_alloc.c:5029
 copy_process+0x94c/0x7b00
 variable_test_bit arch/x86/include/asm/bitops.h:332 [inline]
 cpumask_test_cpu include/linux/cpumask.h:344 [inline]
 trace_sched_process_fork include/trace/events/sched.h:288 [inline]
 _do_fork+0x191/0xf20 kernel/fork.c:2232
 __x64_sys_clone+0xbf/0x150 kernel/fork.c:2340
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_32_irqs_on arch/x86/entry/common.c:341 [inline]
 do_syscall_64+0x192/0x770 arch/x86/entry/common.c:349
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45578b
Code: db 45 85 f6 0f 85 95 01 00 00 64 4c 8b 04 25 10 00 00 00 31 d2
4d 8d 90 d0 02 00 00 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d
00 f0 ff ff 0f 87 d6 00 00 00 85 c0 41 89 c5 0f 85 dd 00 00
RSP: 002b:00007fff9dc6ca20 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007fff9dc6ca20 RCX: 000000000045578b
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
RBP: 00007fff9dc6ca70 R08: 0000000001d0d940 R09: 0000000000000000
R10: 0000000001d0dc10 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000

and second time:

[  281.244340] Kernel panic - not syncing: corrupted stack end
detected inside scheduler
[  281.245754] CPU: 2 PID: 6265 Comm: kworker/u12:4 Not tainted 4.20.0-rc7+ #6
[  281.246887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[  281.248240] Workqueue: writeback wb_workfn (flush-8:0)
[  281.248992] Call Trace:
[  281.249364]  dump_stack+0x1d4/0x2b5
[  281.252261]  panic+0x25e/0x49c
[  281.255403]  __schedule+0x1be8/0x21d0
[  281.263754]  preempt_schedule_common+0x35/0xe0
[  281.264425]  preempt_schedule+0x23/0x30
[  281.265010]  ___preempt_schedule+0x16/0x18
[  281.265635]  _raw_spin_unlock_irqrestore+0xbf/0xe0
[  281.266357]  __remove_mapping+0x77b/0x17e0
[  281.291388]  shrink_page_list+0x5232/0xa6b0
[  281.414732]  shrink_inactive_list+0x997/0x1ab0
[  281.419009]  shrink_node_memcg+0x9de/0x16a0
[  281.424799]  shrink_node+0x3af/0x1530
[  281.433316]  do_try_to_free_pages+0x3bc/0x1170
[  281.435723]  try_to_free_pages+0x43c/0x9e0
[  281.442644]  __alloc_pages_slowpath+0xa4c/0x2c10
[  281.459197]  __alloc_pages_nodemask+0xa6c/0xe10
[  281.466504]  alloc_pages_current+0xb6/0x1e0
[  281.467326]  __page_cache_alloc+0x332/0x560
[  281.471049]  pagecache_get_page+0x2af/0xdd0
[  281.487360]  __getblk_gfp+0x36e/0xd50
[  281.497989]  ext4_read_block_bitmap_nowait+0x2ed/0x1e10
[  281.509111]  ext4_read_block_bitmap+0x23/0x80
[  281.509934]  ext4_mb_mark_diskspace_used+0x180/0x10a0
[  281.512755]  ext4_mb_new_blocks+0xeb7/0x4260
[  281.540189]  ext4_ext_map_blocks+0x2776/0x5b00
[  281.556040]  ext4_map_blocks+0xcaa/0x1860
[  281.559967]  ext4_writepages+0x1e4c/0x3a20
[  281.575738]  do_writepages+0xfc/0x170
[  281.578546]  __writeback_single_inode+0x1cd/0x12e0
[  281.592498]  writeback_sb_inodes+0x6c7/0x1040
[  281.598601]  __writeback_inodes_wb+0x1a3/0x310
[  281.600816]  wb_writeback+0x92c/0xe10
[  281.618064]  wb_workfn+0xdf3/0x1600
[  281.635970]  process_one_work+0xcf3/0x1be0
[  281.662614]  worker_thread+0x17d/0x12f0
[  281.680989]  kthread+0x354/0x430
[  281.682529]  ret_from_fork+0x3a/0x50

One time it took about 10 seconds and another time it took 5 minutes.
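
For context, the "corrupted stack end detected inside scheduler" panic comes from the CONFIG_SCHED_STACK_END_CHECK test in the scheduler, which compares the word at the lowest address of the task stack against STACK_END_MAGIC. A toy model of that check in Python, assuming a 16 KiB stack (the x86-64 size without KASAN) and a stack that grows downward; ToyStack is purely illustrative and is not kernel code:

```python
import struct

STACK_END_MAGIC = 0x57AC6E9D  # from include/uapi/linux/magic.h
THREAD_SIZE = 16 * 1024       # assumed: x86-64 without KASAN


class ToyStack:
    def __init__(self, size=THREAD_SIZE):
        self.mem = bytearray(size)
        self.sp = size  # stack pointer starts at the top and moves down
        # The kernel stores the magic as an unsigned long at the stack end.
        struct.pack_into("<Q", self.mem, 0, STACK_END_MAGIC)

    def push_frame(self, nbytes):
        """Simulate a call that uses `nbytes` of stack and scribbles over it."""
        lo = max(self.sp - nbytes, 0)
        self.mem[lo:self.sp] = b"\xaa" * (self.sp - lo)
        self.sp = lo

    def end_corrupted(self):
        """Same idea as task_stack_end_corrupted(): has the magic been clobbered?"""
        (word,) = struct.unpack_from("<Q", self.mem, 0)
        return word != STACK_END_MAGIC


stack = ToyStack()
stack.push_frame(12 * 1024)   # deep but still within the stack
print(stack.end_corrupted())
stack.push_frame(8 * 1024)    # runs past the end and overwrites the magic
print(stack.end_corrupted())
```

The check only fires at the next schedule after the overwrite, so the trace printed with the panic shows where the corruption was noticed, which is not necessarily the deepest point the stack reached.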

Whom should we route this to? It looks both mm and ext4 related.