lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAG48ez20ot+-H1DmJ86COL4p_oif4mw9dvBRD7Ff5B-=axhtPQ@mail.gmail.com>
Date: Mon, 12 May 2025 19:08:49 +0200
From: Jann Horn <jannh@...gle.com>
To: Jens Axboe <axboe@...nel.dk>, Andrew Morton <akpm@...ux-foundation.org>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Pedro Falcato <pfalcato@...e.de>
Cc: syzbot <syzbot+8be9bf36c3cf574426c8@...kaller.appspotmail.com>, 
	asml.silence@...il.com, io-uring@...r.kernel.org, 
	linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com, 
	Linux-MM <linux-mm@...ck.org>, dennis@...nel.org
Subject: Re: [syzbot] [io-uring] KCSAN: data-race in copy_mm / percpu_counter_destroy_many

+mmap maintainers

On Mon, May 12, 2025 at 3:51 PM Jens Axboe <axboe@...nel.dk> wrote:
> On 5/12/25 12:36 AM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    3ce9925823c7 Merge tag 'mm-hotfixes-stable-2025-05-10-14-2..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14ff74d4580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=6154604431d9aaf9
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8be9bf36c3cf574426c8
> > compiler:       Debian clang version 20.1.2 (++20250402124445+58df0ef89dd6-1~exp1~20250402004600.97), Debian LLD 20.1.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/afdc6302fc05/disk-3ce99258.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/fc7f98d3c420/vmlinux-3ce99258.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/ea7ca2da2258/bzImage-3ce99258.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8be9bf36c3cf574426c8@...kaller.appspotmail.com
> >
> > ==================================================================
> > BUG: KCSAN: data-race in copy_mm / percpu_counter_destroy_many
> >
> > write to 0xffff8881045e19d8 of 8 bytes by task 2123 on cpu 0:
> >  __list_del include/linux/list.h:195 [inline]
> >  __list_del_entry include/linux/list.h:218 [inline]
> >  list_del include/linux/list.h:229 [inline]
> >  percpu_counter_destroy_many+0xc7/0x2b0 lib/percpu_counter.c:244
> >  __mmdrop+0x22e/0x350 kernel/fork.c:947
> >  mmdrop include/linux/sched/mm.h:55 [inline]
> >  io_ring_ctx_free+0x31e/0x360 io_uring/io_uring.c:2740
> >  io_ring_exit_work+0x529/0x560 io_uring/io_uring.c:2962
> >  process_one_work kernel/workqueue.c:3238 [inline]
> >  process_scheduled_works+0x4cb/0x9d0 kernel/workqueue.c:3319
> >  worker_thread+0x582/0x770 kernel/workqueue.c:3400
> >  kthread+0x486/0x510 kernel/kthread.c:464
> >  ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:153
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >
> > read to 0xffff8881045e1600 of 1344 bytes by task 5051 on cpu 1:
> >  dup_mm kernel/fork.c:1728 [inline]
> >  copy_mm+0xfb/0x1310 kernel/fork.c:1786
> >  copy_process+0xcf1/0x1f90 kernel/fork.c:2429
> >  kernel_clone+0x16c/0x5b0 kernel/fork.c:2844
> >  __do_sys_clone kernel/fork.c:2987 [inline]
> >  __se_sys_clone kernel/fork.c:2971 [inline]
> >  __x64_sys_clone+0xe6/0x120 kernel/fork.c:2971
> >  x64_sys_call+0x2c59/0x2fb0 arch/x86/include/generated/asm/syscalls_64.h:57
> >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >  do_syscall_64+0xd0/0x1a0 arch/x86/entry/syscall_64.c:94
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 1 UID: 0 PID: 5051 Comm: syz.1.494 Not tainted 6.15.0-rc5-syzkaller-00300-g3ce9925823c7 #0 PREEMPT(voluntary)
>
> This doesn't look like an io_uring issue, it's successive setup and teardown
> of a percpu counter. Adding some relevant folks.

This is an intentional-but-dodgy data race:

dup_mm() does memcpy() of the entire old mm_struct, while the old
mm_struct is not yet locked, in order to initialize some parts of the
new mm_struct that are expected to be stable (I guess?); and then
afterwards re-initializes all the important fields with proper
locking.

(What happens in the KCSAN report is that one mm is being forked while
an **unrelated** mm is being destroyed; and the unrelated mm's
destruction involves a linked list deletion at mm.rss_stat[].list, and
the two mm's happen to be adjacent on this linked list, so destroying
the unrelated mm updates a linked list element embedded in the first
mm.)

This design is pretty dirty, and it would be a good idea to get rid of
that memcpy() by going through every mm_struct field, checking whether
it is already initialized after the memcpy(), and figuring out proper
initialization if it isn't; another easier, less clean approach would
be to just slap a data_race() annotation around the memcpy() for now.

>From a quick look, I think these are the fields that might not be reinitialized:

Fields that are safe to copy without lock because they're immutable:

 - mmap_{compat_,}{legacy_,}base
 - task_size
 - binfmt

Fields which could actually race but only if CRIU-specific APIs raced
with execve:

 - {start,end}_{code,data}
 - start_brk, start_stack
 - {arg,env}_{start,end}
 - saved_auxv

Fields which actually look like they might be data racing in practice
(but with most of these, probably not much bad stuff actually results
from that):

 - membarrier_state
 - mm_cid_next_scan
 - brk
 - numa_scan_offset
 - tlb_flush_batched
 - ksm_merging_pages, ksm_rmap_items, ksm_zero_pages

And then there's arch-specific stuff in mm_context_t, I haven't looked
at that for all architectures. But there is some pretty dodgy stuff in
there too, for example on X86 the mm->context.vdso_image pointer can
be updated through the CRIU API concurrently with the memcpy(), and it
doesn't look like that pointer is re-initialized later, so I think
that could theoretically result in a torn pointer being accessed later
on in the child. (Note that memcpy() is very much not guaranteed to
copy pointer-sized fields atomically, though it tends to often do that
in practice.)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ