netdev - [bisected] Stack overflow after fs: "switch the IO-triggering parts of umount to fs_pin" (was net namespaces kernel stack overflow)

Message-ID: <d6e6f694-1bd9-7f3e-eaa8-1947c47f523f@virtuozzo.com>
Date:   Thu, 19 Apr 2018 15:50:25 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Alexander Aring <aring@...atatu.com>,
        Al Viro <viro@...iv.linux.org.uk>, linux-kernel@...r.kernel.org
Cc:     netdev@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>
Subject: [bisected] Stack overflow after fs: "switch the IO-triggering parts
 of umount to fs_pin" (was net namespaces kernel stack overflow)

Hi, Al,

commit 87b95ce0964c016ede92763be9c164e49f1019e9 is the first commit after which the test below crashes the kernel:

    Author: Al Viro <viro@...iv.linux.org.uk>
    Date:   Sat Jan 10 19:01:08 2015 -0500

    switch the IO-triggering parts of umount to fs_pin
    
    Signed-off-by: Al Viro <viro@...iv.linux.org.uk>

$modprobe dummy

$while true
 do
     mkdir /var/run/netns
     touch /var/run/netns/init_net
     mount --bind /proc/1/ns/net /var/run/netns/init_net

     ip netns add foo
     ip netns exec foo ip link add dummy0 type dummy
     ip netns delete foo
done

[   22.058349] ip (3249) used greatest stack depth: 8 bytes left
[   22.182195] BUG: unable to handle kernel paging request at 000000035bb1f080
[   22.183065] IP: [<ffffffff810718e4>] kick_process+0x34/0x80
[   22.183065] PGD 0 
[   22.183065] Thread overran stack, or stack corrupted
[   22.183065] Oops: 0000 [#1] PREEMPT SMP 
[   22.183065] CPU: 1 PID: 3255 Comm: ip Not tainted 3.19.0-rc5+ #111
[   22.183065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[   22.183065] task: ffff88007c475100 ti: ffff88007b3cc000 task.ti: ffff88007b3cc000
[   22.183065] RIP: 0010:[<ffffffff810718e4>]  [<ffffffff810718e4>] kick_process+0x34/0x80
[   22.183065] RSP: 0018:ffff88007b3cfcf8  EFLAGS: 00010293
[   22.183065] RAX: 0000000000012900 RBX: ffff88007c475100 RCX: ffff88007b20e7b8
[   22.183065] RDX: 000000007b3cc028 RSI: ffffffff819b05f8 RDI: ffffffff819cb999
[   22.183065] RBP: ffff88007b3cfd08 R08: ffffffff81cbf688 R09: ffff88007d3d0810
[   22.183065] R10: ffff88007fc933c8 R11: 0000000000000000 R12: 000000007b3cc028
[   22.183065] R13: ffff88007c475100 R14: 0000000000000000 R15: 00007fff7793a448
[   22.183065] FS:  00007fc987546700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[   22.183065] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   22.183065] CR2: 000000035bb1f080 CR3: 0000000001c11000 CR4: 00000000000006e0
[   22.183065] Stack:
[   22.183065]  ffff88007c3b67b8 ffff88007b3cfd98 ffff88007b3cfd18 ffffffff81066b05
[   22.183065]  ffff88007b3cfd38 ffffffff81176f4c ffff88007b3cfd48 ffff88007c3b68a0
[   22.183065]  ffff88007b3cfd48 ffffffff8117777f ffff88007b3cfd68 ffffffff81177a49
[   22.183065] Call Trace:
[   22.183065]  [<ffffffff81066b05>] task_work_add+0x45/0x60
[   22.183065]  [<ffffffff81176f4c>] mntput_no_expire+0xdc/0x150
[   22.183065]  [<ffffffff8117777f>] mntput+0x1f/0x30
[   22.183065]  [<ffffffff81177a49>] drop_mountpoint+0x29/0x30
[   22.183065]  [<ffffffff81188df6>] pin_kill+0x66/0xf0
[   22.183065]  [<ffffffff81082c60>] ? __wake_up_common+0x90/0x90
[   22.183065]  [<ffffffff81188ed9>] group_pin_kill+0x19/0x40
[   22.183065]  [<ffffffff811761d8>] namespace_unlock+0x58/0x60
[   22.183065]  [<ffffffff81178cae>] drop_collected_mounts+0x4e/0x60
[   22.183065]  [<ffffffff8117a3ed>] put_mnt_ns+0x2d/0x50
[   22.183065]  [<ffffffff81068b0a>] free_nsproxy+0x1a/0x80
[   22.183065]  [<ffffffff81068c68>] switch_task_namespaces+0x58/0x70
[   22.183065]  [<ffffffff81068c8b>] exit_task_namespaces+0xb/0x10
[   22.183065]  [<ffffffff8104eb57>] do_exit+0x2c7/0xc00
[   22.183065]  [<ffffffff8104f50a>] do_group_exit+0x3a/0xa0
[   22.183065]  [<ffffffff8104f57f>] SyS_exit_group+0xf/0x10
[   22.183065]  [<ffffffff817ad092>] system_call_fastpath+0x12/0x17
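
In the reproducer, each iteration bind-mounts /proc/1/ns/net onto /var/run/netns/init_net again without unmounting the previous mount, so mounts stack up on that path. Below is a minimal sketch for watching that pile-up. It uses a hypothetical helper, `count_stacked_mounts`, and assumes the /proc/self/mountinfo layout documented in proc(5), where field 5 is the mount point. On systems where /var/run is a symlink to /run, the mount point is listed as /run/netns/init_net.

```shell
#!/bin/sh
# count_stacked_mounts PATH [MOUNTINFO]
# Hypothetical helper: prints how many entries in a mountinfo file
# (default: /proc/self/mountinfo) have PATH as their mount point.
# Field 5 of each mountinfo line is the mount point. Paths are compared
# literally, so this is only meant for simple paths like the one above.
count_stacked_mounts() {
    mnt_path=$1
    mnt_info=${2:-/proc/self/mountinfo}
    awk -v p="$mnt_path" '$5 == p { n++ } END { print n + 0 }' "$mnt_info"
}

# Example (as root, while the reproducer loop is running):
#   count_stacked_mounts /run/netns/init_net
```

The count should keep growing while the loop runs. It may also include propagated copies of the same mount, so treat it as a rough indicator rather than an exact per-iteration tally.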

Kirill

On 19.04.2018 01:08, Kirill Tkhai wrote:
> Hi, Alexander!
> 
> On 18.04.2018 22:45, Alexander Aring wrote:
>> I can currently crash my net/master kernel by executing the following script:
>>
>> --- snip
>>
>> modprobe dummy
>>
>> #mkdir /var/run/netns
>> #touch /var/run/netns/init_net
>> #mount --bind /proc/1/ns/net /var/run/netns/init_net
>>
>> while true
>> do
>>     mkdir /var/run/netns
>>     touch /var/run/netns/init_net
>>     mount --bind /proc/1/ns/net /var/run/netns/init_net
>>
>>     ip netns add foo
>>     ip netns exec foo ip link add dummy0 type dummy
>>     ip netns delete foo
>> done
> 
> A quick answer is the best one, so I tried your test on my non-work computer.
> It runs an old kernel without asynchronous pernet operations:
> 
> $uname -a
> Linux localhost.localdomain 4.15.0-2-amd64 #1 SMP Debian 4.15.11-1 (2018-03-20) x86_64 GNU/Linux
> 
> After approximately 15 seconds of running your test, it died :(
> (Luckily, I ran it in "init 1" with all partitions mounted read-only, as usual.)
> 
> There is no serial console, so I can't say whether the first stack trace is
> exactly the same as the one you see. But it crashed, so it seems the problem
> has existed for a long time.
> 
> Have you tried to reproduce it on older kernels, or to bisect the offending commit?
> Or maybe it doesn't reproduce on old kernels in your environment?
> 
>> --- snap
>>
>> The kernel crashes within about one minute at most.
>> With my hack of saving init_net outside the loop, it runs fine...
>> So the bind mount inside the loop is necessary.
>>
>> The last message which I see is:
>>
>> BUG: stack guard page was hit at 00000000f0751759 (stack is
>> 0000000069363195..0000000073ddc474)
>> kernel stack overflow (double-fault): 0000 [#1] SMP PTI
>> Modules linked in:
>> CPU: 0 PID: 13917 Comm: ip Not tainted 4.16.0-11878-gef9d066f6808 #32
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> RIP: 0010:validate_chain.isra.23+0x44/0xc40
>> RSP: 0018:ffffc900002cbff8 EFLAGS: 00010002
>> RAX: 0000000000040000 RBX: 0e58b88e1d4d15da RCX: 0e58b88e1d4d15da
>> RDX: 0000000000000000 RSI: ffff8802b25ee2a0 RDI: ffff8802b25edb00
>> RBP: 0e58b88e1d4d15da R08: 0000000000000000 R09: 0000000000000004
>> R10: ffffc900002cc050 R11: ffff8802b1054be8 R12: 0000000000000001
>> R13: ffff8802b25ee268 R14: ffff8802b25edb00 R15: 0000000000000000
>> FS:  0000000000000000(0000) GS:ffff8802bfc00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffc900002cbfe8 CR3: 0000000002024000 CR4: 00000000000006f0
>> Call Trace:
>>  ? get_max_files+0x10/0x10
>>  __lock_acquire+0x332/0x710
>>  lock_acquire+0x67/0xb0
>>  ? lockref_put_or_lock+0x9/0x30
>>  ? dput.part.7+0x17/0x2d0
>>  _raw_spin_lock+0x2b/0x60
>>  ? lockref_put_or_lock+0x9/0x30
>>  lockref_put_or_lock+0x9/0x30
>>  dput.part.7+0x1ec/0x2d0
>>  drop_mountpoint+0x10/0x40
>>  pin_kill+0x9b/0x3a0
>>  ? wait_woken+0x90/0x90
>>  ? mnt_pin_kill+0x2d/0x100
>>  mnt_pin_kill+0x2d/0x100
>>  cleanup_mnt+0x66/0x70
>>  pin_kill+0x9b/0x3a0
>>  ? wait_woken+0x90/0x90
>>  ? mnt_pin_kill+0x2d/0x100
>>  mnt_pin_kill+0x2d/0x100
>>  cleanup_mnt+0x66/0x70
>> ...
>>
>> I guess it may have something to do with the recent switch of
>> per-net ops to async.
>>
>> - Alex
> 
> Kirill
>