Message-ID: <20240408161336.612064-1-mhal@rbox.co>
Date: Mon, 8 Apr 2024 17:58:44 +0200
From: Michal Luczaj <mhal@...x.co>
To: netdev@...r.kernel.org
Cc: davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
kuniyu@...zon.com,
Michal Luczaj <mhal@...x.co>
Subject: [PATCH net 0/2] af_unix: Garbage collector vs connect() race condition
The garbage collector does not account for the risk of an embryo being
enqueued during garbage collection. If such an embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children, leading to an incorrectly elevated inflight
count and then a dangling pointer within the gc_inflight_list.
sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped
connect(S, addr)            sendmsg(S, [V]); close(V)   __unix_gc()
----------------            -------------------------   -----------

NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
                            // V count=1 inflight=0

                            NS = unix_peer(S)
                            skb2 = sock_alloc()
                            skb_queue_tail(NS, skb2[V])

                            // V became in-flight
                            // V count=2 inflight=1

                            close(V)

                            // V count=1 inflight=1
                            // GC candidate condition met

                                                        for u in gc_inflight_list:
                                                          if (total_refs == inflight_refs)
                                                            add u to gc_candidates

                                                        // gc_candidates={L, V}

                                                        for u in gc_candidates:
                                                          scan_children(u, dec_inflight)

                                                        // embryo (skb1) was not
                                                        // reachable from L yet, so V's
                                                        // inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
                                                        for u in gc_candidates:
                                                          if (u.inflight)
                                                            scan_children(u, inc_inflight_move_tail)

                                                        // V count=1 inflight=2 (!)
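
To make the arithmetic in the timeline above concrete, here is a toy
user-space model of the two scan_children() passes. It is only a sketch,
not kernel code: struct toy_sock, toy_scan_children() and
v_inflight_after_gc() are hypothetical names, and the model assumes that L
still has a non-zero inflight count after the first pass, so the second
pass walks L's children.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the only field of V that matters here. */
struct toy_sock {
	int inflight;
};

/*
 * Toy version of scan_children(L, func): V is reachable from L only
 * through the embryo's receive queue, so nothing happens while the
 * embryo has not been enqueued on L yet.
 */
static void toy_scan_children(bool embryo_queued, struct toy_sock *v, int delta)
{
	if (embryo_queued)
		v->inflight += delta;
}

/* Returns V's inflight count after both GC passes. */
int v_inflight_after_gc(bool embryo_queued_before_gc)
{
	struct toy_sock v = { .inflight = 1 };	/* V count=1 inflight=1 */
	bool embryo_queued = embryo_queued_before_gc;

	/* Pass 1: scan_children(u, dec_inflight) */
	toy_scan_children(embryo_queued, &v, -1);
	assert(v.inflight >= 0);

	/* connect() finishes in between: __skb_queue_tail(L, skb1) */
	embryo_queued = true;

	/* Pass 2: scan_children(u, inc_inflight_move_tail) */
	toy_scan_children(embryo_queued, &v, +1);

	return v.inflight;
}
```

In this model v_inflight_after_gc(true) returns 1, the balanced case, and
v_inflight_after_gc(false) returns 2, which corresponds to the
"inflight=2 (!)" line of the timeline.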
The idea behind the patch is to take and release unix_state_lock() on L
during the gc_candidates selection. That guarantees that either connect()
has concluded or, if we raced it, a parallel sendmsg() with SCM_RIGHTS
cannot happen, as we are already holding the unix_gc_lock.
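
The empty lock/unlock pair acts as a barrier against an in-progress
connect(). Below is a user-space analogy using pthreads, offered as a
sketch only: struct listener, connector(), state_lock_barrier() and
embryo_visible_after_barrier() are hypothetical names, and the mutex merely
stands in for unix_state_lock(L). It covers only the "connect() has
concluded" half of the argument; the unix_gc_lock half is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct listener {
	int embryo_queued;		/* protected by state_lock */
	atomic_int lock_held;		/* set once the connector owns state_lock */
	pthread_mutex_t state_lock;	/* stands in for unix_state_lock(L) */
};

/* connect() side: enqueue the embryo while holding the listener's lock. */
static void *connector(void *arg)
{
	struct listener *l = arg;

	pthread_mutex_lock(&l->state_lock);
	atomic_store(&l->lock_held, 1);
	l->embryo_queued = 1;		/* analogue of __skb_queue_tail(L, skb1) */
	pthread_mutex_unlock(&l->state_lock);
	return NULL;
}

/* GC side: an empty lock/unlock pair waits out any in-progress connect(). */
static void state_lock_barrier(struct listener *l)
{
	pthread_mutex_lock(&l->state_lock);
	pthread_mutex_unlock(&l->state_lock);
}

/* Returns what the GC side observes once the barrier has been passed. */
int embryo_visible_after_barrier(void)
{
	struct listener l = { .embryo_queued = 0 };
	pthread_t t;
	int rc, seen;

	pthread_mutex_init(&l.state_lock, NULL);
	rc = pthread_create(&t, NULL, connector, &l);
	assert(rc == 0);

	/* Wait until the connector has taken the lock, i.e. we raced it. */
	while (!atomic_load(&l.lock_held))
		;

	state_lock_barrier(&l);
	seen = l.embryo_queued;		/* ordered after the connector's unlock */

	pthread_join(t, NULL);
	pthread_mutex_destroy(&l.state_lock);
	return seen;
}
```

Once the barrier returns, the connector's critical section is over, so the
GC side always sees the embryo as enqueued. Depending on the toolchain this
may need to be built with -pthread.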
Running the reproducer with mdelay(1) stuffed in unix_stream_connect()
results in immediate splats:
$ ./tools/testing/selftests/net/af_unix/gc_vs_connect
running
[ 47.019387] WARNING: CPU: 3 PID: 12 at net/unix/garbage.c:284 __unix_gc+0x473/0x4a0
[ 47.019405] Modules linked in: 9p netfs kvm_intel kvm 9pnet_virtio 9pnet i2c_piix4 zram crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
[ 47.019419] CPU: 3 PID: 12 Comm: kworker/u32:1 Not tainted 6.9.0-rc2nokasan+ #1
[ 47.019421] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 47.019422] Workqueue: events_unbound __unix_gc
[ 47.019424] RIP: 0010:__unix_gc+0x473/0x4a0
[ 47.019425] Code: 8d bb d0 f9 ff ff 48 0f ba 73 58 01 0f b6 83 e2 f9 ff ff 31 d2 48 c7 c6 a0 c3 de 81 3c 0a 74 18 e8 e2 f8 ff ff e9 96 fd ff ff <0f> 0b e9 ef fb ff ff 0f 0b e9 39 fc ff ff e8 5a fa ff ff e9 7e fd
[ 47.019427] RSP: 0018:ffffc9000006bd90 EFLAGS: 00010297
[ 47.019429] RAX: 0000000000000002 RBX: ffff888109459680 RCX: 000000003342a6b0
[ 47.019430] RDX: 0000000000000001 RSI: ffffffff82634e8c RDI: ffffffff83161740
[ 47.019431] RBP: ffffc9000006be40 R08: 00000000000003d7 R09: 00000000000003d7
[ 47.019432] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff831610e0
[ 47.019433] R13: ffff888109459cb0 R14: ffffc9000006bd90 R15: ffffffff831616c0
[ 47.019434] FS: 0000000000000000(0000) GS:ffff88842fb80000(0000) knlGS:0000000000000000
[ 47.019435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 47.019436] CR2: 000055c16e3893b8 CR3: 0000000002e37000 CR4: 0000000000750ef0
[ 47.019438] PKRU: 55555554
[ 47.019439] Call Trace:
[ 47.019440] <TASK>
[ 47.019442] ? __warn+0x88/0x180
[ 47.019445] ? __unix_gc+0x473/0x4a0
[ 47.019447] ? report_bug+0x189/0x1c0
[ 47.019451] ? handle_bug+0x38/0x70
[ 47.019453] ? exc_invalid_op+0x13/0x60
[ 47.019454] ? asm_exc_invalid_op+0x16/0x20
[ 47.019460] ? __unix_gc+0x473/0x4a0
[ 47.019464] ? lock_acquire+0xd5/0x2c0
[ 47.019466] ? lock_release+0x133/0x290
[ 47.019471] process_one_work+0x215/0x700
[ 47.019473] ? move_linked_works+0x70/0xa0
[ 47.019477] worker_thread+0x1ca/0x3b0
[ 47.019479] ? rescuer_thread+0x340/0x340
[ 47.019480] kthread+0xdd/0x110
[ 47.019482] ? kthread_complete_and_exit+0x20/0x20
[ 47.019485] ret_from_fork+0x2d/0x50
[ 47.019487] ? kthread_complete_and_exit+0x20/0x20
[ 47.019489] ret_from_fork_asm+0x11/0x20
[ 47.019495] </TASK>
[ 47.019496] irq event stamp: 446769
[ 47.019497] hardirqs last enabled at (446775): [<ffffffff81199a69>] console_unlock+0xf9/0x120
[ 47.019499] hardirqs last disabled at (446780): [<ffffffff81199a4e>] console_unlock+0xde/0x120
[ 47.019501] softirqs last enabled at (444340): [<ffffffff810fcd73>] __irq_exit_rcu+0x93/0x100
[ 47.019504] softirqs last disabled at (444333): [<ffffffff810fcd73>] __irq_exit_rcu+0x93/0x100
[ 47.019505] ---[ end trace 0000000000000000 ]---
...
[ 47.019555] list_add corruption. prev->next should be next (ffffffff83161710), but was ffff888109459cb0. (prev=ffff888109459cb0).
[ 47.019572] kernel BUG at lib/list_debug.c:32!
[ 47.019654] invalid opcode: 0000 [#1] PREEMPT SMP
[ 47.019676] CPU: 0 PID: 1057 Comm: gc_vs_connect Tainted: G W 6.9.0-rc2nokasan+ #1
[ 47.019698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 47.019720] RIP: 0010:__list_add_valid_or_report+0x70/0x90
[ 47.019743] Code: 98 ff 0f 0b 48 89 c1 48 c7 c7 90 1a 72 82 e8 77 ed 98 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 e8 1a 72 82 e8 60 ed 98 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 40 1b 72 82 e8 49 ed 98
[ 47.019766] RSP: 0018:ffffc9000125fba8 EFLAGS: 00010286
[ 47.019790] RAX: 0000000000000075 RBX: ffff888109459680 RCX: 0000000000000000
[ 47.019814] RDX: 0000000000000002 RSI: ffffffff82667570 RDI: 00000000ffffffff
[ 47.019838] RBP: ffff8881067bc400 R08: 0000000000000000 R09: 0000000000000003
[ 47.019861] R10: ffffc9000125fa78 R11: ffffffff82f571e8 R12: ffff888109459cb0
[ 47.019883] R13: ffff888109459cb0 R14: ffff88810945ad00 R15: 0000000000000000
[ 47.019906] FS: 00007fba6fde1680(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[ 47.019928] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 47.019949] CR2: 00007fba6fc5ac80 CR3: 0000000104819000 CR4: 0000000000750ef0
[ 47.019972] PKRU: 55555554
[ 47.019993] Call Trace:
[ 47.020014] <TASK>
[ 47.020034] ? die+0x32/0x80
[ 47.020056] ? do_trap+0xd5/0x100
[ 47.020078] ? __list_add_valid_or_report+0x70/0x90
[ 47.020100] ? __list_add_valid_or_report+0x70/0x90
[ 47.020122] ? do_error_trap+0x81/0x110
[ 47.020144] ? __list_add_valid_or_report+0x70/0x90
[ 47.020166] ? exc_invalid_op+0x4c/0x60
[ 47.020189] ? __list_add_valid_or_report+0x70/0x90
[ 47.020211] ? asm_exc_invalid_op+0x16/0x20
[ 47.020237] ? __list_add_valid_or_report+0x70/0x90
[ 47.020355] ? __list_add_valid_or_report+0x70/0x90
[ 47.020424] unix_inflight+0x6a/0xf0
[ 47.020483] unix_scm_to_skb+0xe4/0x160
[ 47.020518] unix_stream_sendmsg+0x174/0x630
[ 47.020573] __sock_sendmsg+0x38/0x70
[ 47.020635] ____sys_sendmsg+0x237/0x2a0
[ 47.020694] ? import_iovec+0x16/0x20
[ 47.020754] ___sys_sendmsg+0x86/0xd0
[ 47.020791] ? find_held_lock+0x2b/0x80
[ 47.021011] ? lock_release+0x133/0x290
[ 47.021467] ? __fget_files+0xca/0x180
[ 47.021564] __sys_sendmsg+0x47/0x80
[ 47.021663] do_syscall_64+0x94/0x180
[ 47.021751] ? do_syscall_64+0xa1/0x180
[ 47.021812] ? do_syscall_64+0xa1/0x180
[ 47.021864] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 47.021890] RIP: 0033:0x7fba6fd15dbb
[ 47.021917] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 69 2c f7 ff 8b 55 ec 48 8b 75 f0 41 89 c0 8b 7d f8 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 45 f8 e8 c1 2c f7 ff 48 8b
[ 47.021948] RSP: 002b:00007ffc1aa77410 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 47.021974] RAX: ffffffffffffffda RBX: 00007ffc1aa775c8 RCX: 00007fba6fd15dbb
[ 47.022000] RDX: 0000000000000000 RSI: 00005625cb5df0e0 RDI: 0000000000000003
[ 47.022025] RBP: 00007ffc1aa77430 R08: 0000000000000000 R09: 0000000000001033
[ 47.022050] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001
[ 47.022076] R13: 0000000000000000 R14: 00007fba6fe2c000 R15: 00005625cb5dedd8
[ 47.022103] </TASK>
[ 47.022127] Modules linked in: 9p netfs kvm_intel kvm 9pnet_virtio 9pnet i2c_piix4 zram crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
[ 47.022209] ---[ end trace 0000000000000000 ]---
[ 47.022233] RIP: 0010:__list_add_valid_or_report+0x70/0x90
[ 47.022258] Code: 98 ff 0f 0b 48 89 c1 48 c7 c7 90 1a 72 82 e8 77 ed 98 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 e8 1a 72 82 e8 60 ed 98 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 40 1b 72 82 e8 49 ed 98
[ 47.022284] RSP: 0018:ffffc9000125fba8 EFLAGS: 00010286
[ 47.022308] RAX: 0000000000000075 RBX: ffff888109459680 RCX: 0000000000000000
[ 47.022332] RDX: 0000000000000002 RSI: ffffffff82667570 RDI: 00000000ffffffff
[ 47.022356] RBP: ffff8881067bc400 R08: 0000000000000000 R09: 0000000000000003
[ 47.022406] R10: ffffc9000125fa78 R11: ffffffff82f571e8 R12: ffff888109459cb0
[ 47.022451] R13: ffff888109459cb0 R14: ffff88810945ad00 R15: 0000000000000000
[ 47.022517] FS: 00007fba6fde1680(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[ 47.022606] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 47.022653] CR2: 00007fba6fc5ac80 CR3: 0000000104819000 CR4: 0000000000750ef0
[ 47.022698] PKRU: 55555554
[ 47.022764] note: gc_vs_connect[1057] exited with preempt_count 1
Michal Luczaj (2):
  af_unix: Fix garbage collector racing against connect()
  af_unix: Add GC race reproducer + slow down unix_stream_connect()

 net/unix/af_unix.c                            |   2 +
 net/unix/garbage.c                            |  20 ++-
 tools/testing/selftests/net/af_unix/Makefile  |   2 +-
 .../selftests/net/af_unix/gc_vs_connect.c     | 158 ++++++++++++++++++
 4 files changed, 178 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/net/af_unix/gc_vs_connect.c
--
2.44.0