Message-ID: <CAOHTApjPwWDxHmGrfMwFvszWubfjJ7YfBgZL-DQ0mdv00UtQEQ@mail.gmail.com>
Date: Wed, 18 Apr 2018 15:45:59 -0400
From: Alexander Aring <aring@...atatu.com>
To: ktkhai@...tuozzo.com
Cc: netdev@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>
Subject: net namespaces kernel stack overflow
Hi,
I can currently crash my net/master kernel by executing the following script:
--- snip
modprobe dummy
#mkdir /var/run/netns
#touch /var/run/netns/init_net
#mount --bind /proc/1/ns/net /var/run/netns/init_net
while true
do
mkdir /var/run/netns
touch /var/run/netns/init_net
mount --bind /proc/1/ns/net /var/run/netns/init_net
ip netns add foo
ip netns exec foo ip link add dummy0 type dummy
ip netns delete foo
done
--- snap
The kernel crashes within about one minute.
If I apply my hack of setting up init_net once, outside the loop (the commented-out lines above), the script runs fine.
So the bind mount inside the loop is necessary to trigger the crash.
The last message I see is:
BUG: stack guard page was hit at 00000000f0751759 (stack is 0000000069363195..0000000073ddc474)
kernel stack overflow (double-fault): 0000 [#1] SMP PTI
Modules linked in:
CPU: 0 PID: 13917 Comm: ip Not tainted 4.16.0-11878-gef9d066f6808 #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:validate_chain.isra.23+0x44/0xc40
RSP: 0018:ffffc900002cbff8 EFLAGS: 00010002
RAX: 0000000000040000 RBX: 0e58b88e1d4d15da RCX: 0e58b88e1d4d15da
RDX: 0000000000000000 RSI: ffff8802b25ee2a0 RDI: ffff8802b25edb00
RBP: 0e58b88e1d4d15da R08: 0000000000000000 R09: 0000000000000004
R10: ffffc900002cc050 R11: ffff8802b1054be8 R12: 0000000000000001
R13: ffff8802b25ee268 R14: ffff8802b25edb00 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8802bfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900002cbfe8 CR3: 0000000002024000 CR4: 00000000000006f0
Call Trace:
? get_max_files+0x10/0x10
__lock_acquire+0x332/0x710
lock_acquire+0x67/0xb0
? lockref_put_or_lock+0x9/0x30
? dput.part.7+0x17/0x2d0
_raw_spin_lock+0x2b/0x60
? lockref_put_or_lock+0x9/0x30
lockref_put_or_lock+0x9/0x30
dput.part.7+0x1ec/0x2d0
drop_mountpoint+0x10/0x40
pin_kill+0x9b/0x3a0
? wait_woken+0x90/0x90
? mnt_pin_kill+0x2d/0x100
mnt_pin_kill+0x2d/0x100
cleanup_mnt+0x66/0x70
pin_kill+0x9b/0x3a0
? wait_woken+0x90/0x90
? mnt_pin_kill+0x2d/0x100
mnt_pin_kill+0x2d/0x100
cleanup_mnt+0x66/0x70
...
My guess is that it is related to the recent migration of per-net ops
to async.
- Alex