[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAG48ez20ot+-H1DmJ86COL4p_oif4mw9dvBRD7Ff5B-=axhtPQ@mail.gmail.com>
Date: Mon, 12 May 2025 19:08:49 +0200
From: Jann Horn <jannh@...gle.com>
To: Jens Axboe <axboe@...nel.dk>, Andrew Morton <akpm@...ux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Pedro Falcato <pfalcato@...e.de>
Cc: syzbot <syzbot+8be9bf36c3cf574426c8@...kaller.appspotmail.com>,
asml.silence@...il.com, io-uring@...r.kernel.org,
linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
Linux-MM <linux-mm@...ck.org>, dennis@...nel.org
Subject: Re: [syzbot] [io-uring] KCSAN: data-race in copy_mm / percpu_counter_destroy_many
+mmap maintainers
On Mon, May 12, 2025 at 3:51 PM Jens Axboe <axboe@...nel.dk> wrote:
> On 5/12/25 12:36 AM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 3ce9925823c7 Merge tag 'mm-hotfixes-stable-2025-05-10-14-2..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14ff74d4580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=6154604431d9aaf9
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8be9bf36c3cf574426c8
> > compiler: Debian clang version 20.1.2 (++20250402124445+58df0ef89dd6-1~exp1~20250402004600.97), Debian LLD 20.1.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/afdc6302fc05/disk-3ce99258.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/fc7f98d3c420/vmlinux-3ce99258.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/ea7ca2da2258/bzImage-3ce99258.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8be9bf36c3cf574426c8@...kaller.appspotmail.com
> >
> > ==================================================================
> > BUG: KCSAN: data-race in copy_mm / percpu_counter_destroy_many
> >
> > write to 0xffff8881045e19d8 of 8 bytes by task 2123 on cpu 0:
> > __list_del include/linux/list.h:195 [inline]
> > __list_del_entry include/linux/list.h:218 [inline]
> > list_del include/linux/list.h:229 [inline]
> > percpu_counter_destroy_many+0xc7/0x2b0 lib/percpu_counter.c:244
> > __mmdrop+0x22e/0x350 kernel/fork.c:947
> > mmdrop include/linux/sched/mm.h:55 [inline]
> > io_ring_ctx_free+0x31e/0x360 io_uring/io_uring.c:2740
> > io_ring_exit_work+0x529/0x560 io_uring/io_uring.c:2962
> > process_one_work kernel/workqueue.c:3238 [inline]
> > process_scheduled_works+0x4cb/0x9d0 kernel/workqueue.c:3319
> > worker_thread+0x582/0x770 kernel/workqueue.c:3400
> > kthread+0x486/0x510 kernel/kthread.c:464
> > ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:153
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >
> > read to 0xffff8881045e1600 of 1344 bytes by task 5051 on cpu 1:
> > dup_mm kernel/fork.c:1728 [inline]
> > copy_mm+0xfb/0x1310 kernel/fork.c:1786
> > copy_process+0xcf1/0x1f90 kernel/fork.c:2429
> > kernel_clone+0x16c/0x5b0 kernel/fork.c:2844
> > __do_sys_clone kernel/fork.c:2987 [inline]
> > __se_sys_clone kernel/fork.c:2971 [inline]
> > __x64_sys_clone+0xe6/0x120 kernel/fork.c:2971
> > x64_sys_call+0x2c59/0x2fb0 arch/x86/include/generated/asm/syscalls_64.h:57
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xd0/0x1a0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 1 UID: 0 PID: 5051 Comm: syz.1.494 Not tainted 6.15.0-rc5-syzkaller-00300-g3ce9925823c7 #0 PREEMPT(voluntary)
>
> This doesn't look like an io_uring issue, it's successive setup and teardown
> of a percpu counter. Adding some relevant folks.
This is an intentional-but-dodgy data race:
dup_mm() does memcpy() of the entire old mm_struct, while the old
mm_struct is not yet locked, in order to initialize some parts of the
new mm_struct that are expected to be stable (I guess?); and then
afterwards re-initializes all the important fields with proper
locking.
(What happens in the KCSAN report is that one mm is being forked while
an **unrelated** mm is being destroyed; and the unrelated mm's
destruction involves a linked list deletion at mm.rss_stat[].list, and
the two mm's happen to be adjacent on this linked list, so destroying
the unrelated mm updates a linked list element embedded in the first
mm.)
This design is pretty dirty, and it would be a good idea to get rid of
that memcpy() by going through every mm_struct field, checking whether
it is already initialized after the memcpy(), and figuring out proper
initialization if it isn't; another easier, less clean approach would
be to just slap a data_race() annotation around the memcpy() for now.
>From a quick look, I think these are the fields that might not be reinitialized:
Fields that are safe to copy without lock because they're immutable:
- mmap_{compat_,}{legacy_,}base
- task_size
- binfmt
Fields which could actually race but only if CRIU-specific APIs raced
with execve:
- {start,end}_{code,data}
- start_brk, start_stack
- {arg,env}_{start,end}
- saved_auxv
Fields which actually look like they might be data racing in practice
(but with most of these, probably not much bad stuff actually results
from that):
- membarrier_state
- mm_cid_next_scan
- brk
- numa_scan_offset
- tlb_flush_batched
- ksm_merging_pages, ksm_rmap_items, ksm_zero_pages
And then there's arch-specific stuff in mm_context_t, I haven't looked
at that for all architectures. But there is some pretty dodgy stuff in
there too, for example on X86 the mm->context.vdso_image pointer can
be updated through the CRIU API concurrently with the memcpy(), and it
doesn't look like that pointer is re-initialized later, so I think
that could theoretically result in a torn pointer being accessed later
on in the child. (Note that memcpy() is very much not guaranteed to
copy pointer-sized fields atomically, though it tends to often do that
in practice.)
Powered by blists - more mailing lists