Message-ID: <ZrEgAOL9XrhlSPwr@slm.duckdns.org>
Date: Mon, 5 Aug 2024 08:54:56 -1000
From: Tejun Heo <tj@...nel.org>
To: Jens Axboe <axboe@...nel.dk>
Cc: syzbot <syzbot+b3e4f2f51ed645fd5df2@...kaller.appspotmail.com>,
	asml.silence@...il.com, io-uring@...r.kernel.org,
	linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
	Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [syzbot] [io-uring?] KCSAN: data-race in __flush_work /
 __flush_work (2)

Hello,

On Mon, Aug 05, 2024 at 08:23:28AM -0600, Jens Axboe wrote:
> > read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1:
> >  __flush_work+0x42a/0x570 kernel/workqueue.c:4188
> >  flush_work kernel/workqueue.c:4229 [inline]
> >  flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
> >  io_uring_try_cancel_requests+0x35b/0x370 io_uring/io_uring.c:3000
> >  io_ring_exit_work+0x148/0x500 io_uring/io_uring.c:2779
> >  process_one_work kernel/workqueue.c:3231 [inline]
> >  process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3312
> >  worker_thread+0x526/0x700 kernel/workqueue.c:3390
> >  kthread+0x1d1/0x210 kernel/kthread.c:389
> >  ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The offending line is:

	/*
	 * start_flush_work() returned %true. If @from_cancel is set, we know
	 * that @work must have been executing during start_flush_work() and
	 * can't currently be queued. Its data must contain OFFQ bits. If @work
	 * was queued on a BH workqueue, we also know that it was running in the
	 * BH context and thus can be busy-waited.
	 */
->	data = *work_data_bits(work);
	if (from_cancel &&
	    !WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {

The race is benign, but the code is still wrong. When @from_cancel is set,
we own the work item through its pending bit, so its data bits cannot
change. The value read is also only used when @from_cancel is set. So the
logic is not actually broken, but the read itself is unconditional: the
compiler is free to emit it before the @from_cancel test, which is what the
source says anyway, and the disassembly of the vmlinux in the report
suggests that is what it did.

So, it's benign in that the value read is never used when !@from_cancel, and
the data race only exists when !@from_cancel. The code is wrong in that it
can easily generate a spurious racy read. I'll fix it.
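
For illustration only, here is a minimal userspace sketch of one way the read
could be confined to the @from_cancel path. It is not the actual patch: the
struct, the function name and the DEMO_* flag values are hypothetical
stand-ins for work_struct, the test in __flush_work() and the
WORK_STRUCT_PWQ / WORK_OFFQ_BH bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for WORK_STRUCT_PWQ / WORK_OFFQ_BH; values are made up. */
#define DEMO_STRUCT_PWQ	(1UL << 0)
#define DEMO_OFFQ_BH	(1UL << 1)

/* Hypothetical stand-in for the data word of a work item. */
struct demo_work {
	unsigned long data;
};

/*
 * Decide whether the caller may busy-wait on @work.
 *
 * The data word is loaded only after @from_cancel has been tested, so the
 * source no longer contains a read on the !@from_cancel path, where the
 * caller does not own the work item and the word may change concurrently.
 */
bool demo_should_busy_wait(const struct demo_work *work, bool from_cancel)
{
	unsigned long data;

	assert(work != NULL);

	if (!from_cancel)
		return false;

	/* Only reached when the caller owns the work item. */
	data = work->data;
	return !(data & DEMO_STRUCT_PWQ) && (data & DEMO_OFFQ_BH);
}
```

The early return keeps the load out of the path where the value is never
used. In the kernel the WARN_ON_ONCE() on the PWQ bit would presumably stay;
it is left out here to keep the sketch self-contained.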

Thanks.

-- 
tejun
