Message-ID: <1bb984a1.4455.1995208fc7b.Coremail.duoming@zju.edu.cn>
Date: Tue, 16 Sep 2025 18:19:04 +0800 (GMT+08:00)
From: duoming@....edu.cn
To: "Jakub Kicinski" <kuba@...nel.org>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, pabeni@...hat.com,
	edumazet@...gle.com, davem@...emloft.net, andrew+netdev@...n.ch
Subject: Re: [PATCH net] cnic: Fix use-after-free bugs in cnic_delete_task

On Mon, 15 Sep 2025 18:22:35 -0700 Jakub Kicinski wrote:
> > The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
> > which does not guarantee that the delayed work item 'delete_task' has
> > fully completed if it was already running. Additionally, the delayed work
> > item is cyclic, flush_workqueue() in cnic_cm_stop_bnx2x_hw() could not
> > prevent the new incoming ones. This leads to use-after-free scenarios
> > where the cnic_dev is deallocated by cnic_free_dev(), while delete_task
> > remains active and attempt to dereference cnic_dev in cnic_delete_task().
> 
> [snip]
> 
> > Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
> > that the delayed work item is properly canceled and any executing delayed
> > work has finished before the cnic_dev is deallocated.
> 
> Have you tested this on real HW? Please always include information on
> how you discovered the problem and whether you managed to test the fix.

To reproduce the issue, I emulated the cnic device in QEMU and manually
triggered the problem by introducing delays, such as calls to ssleep(), 
within the cnic_delete_task() function.

cancel_delayed_work() did not stop the delayed work while it was executing,
because it only removes a pending item and does not wait for a running
handler. Furthermore, cnic_delete_task() is a recurring delayed work item,
and flush_workqueue() only waits for work items that were already queued
to the workqueue before it was called. Work items submitted after
flush_workqueue() is called are not part of the set the flush waits for.
This means that flush_workqueue() can return while a re-armed delayed
work item is still pending.

The details are in the following link:
https://elixir.bootlin.com/linux/v6.17-rc6/source/kernel/workqueue.c#L3937

I also wrote a kernel module to test whether the combination of
cancel_delayed_work() and flush_workqueue() can safely terminate recurring
delayed work items. The result is negative: the combination does not
reliably stop such a work item.

cancel_delayed_work_sync(), through __cancel_work_sync(), calls
__cancel_work(work, ... | WORK_CANCEL_DISABLE). This removes the work item
from the queue if it is pending and disables it, so that it cannot be
queued again while the cancellation is in progress. It then calls
__flush_work(work, true) to wait synchronously for any currently executing
instance of the work item to finish.

The details are in the following link:
https://elixir.bootlin.com/linux/v6.17-rc6/source/kernel/workqueue.c#L4348
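
The ordering described above can be walked through with a small userspace
toy model. This is only a sketch under stated assumptions: it is
single-threaded, it is not kernel code, and it is not the test module
mentioned earlier. All toy_* names are hypothetical stand-ins for the
corresponding workqueue calls, and the model keeps only three bits of
state (timer pending, handler running, queueing disabled).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a self-rearming delayed work item (NOT kernel code). */
struct toy_dwork {
	bool timer_pending; /* re-arm timer is set, handler not started yet */
	bool running;       /* handler is currently executing */
	bool disabled;      /* queueing is refused while this is set */
};

/* Models queue_delayed_work(): refused when disabled or already pending. */
static bool toy_queue(struct toy_dwork *w)
{
	if (w->disabled || w->timer_pending)
		return false;
	w->timer_pending = true;
	return true;
}

/* Models the tail of a cyclic handler: it re-arms itself, then returns. */
static void toy_handler_finish(struct toy_dwork *w)
{
	assert(w->running);
	toy_queue(w);
	w->running = false;
}

/* Models cancel_delayed_work(): drops a pending timer, never waits. */
static bool toy_cancel(struct toy_dwork *w)
{
	bool was_pending = w->timer_pending;

	w->timer_pending = false;
	return was_pending;
}

/* Models a flush: waits only for the handler that is already running. */
static void toy_flush(struct toy_dwork *w)
{
	if (w->running)
		toy_handler_finish(w);
}

/* Models cancel_delayed_work_sync(): disable, cancel, wait, re-enable. */
static bool toy_cancel_sync(struct toy_dwork *w)
{
	bool was_pending;

	w->disabled = true;
	was_pending = toy_cancel(w);
	toy_flush(w);
	w->disabled = false;
	return was_pending;
}

/*
 * Teardown while the handler is mid-execution. Returns true if a timer is
 * still pending afterwards, i.e. the handler could still run after the
 * device structure has been freed.
 */
bool toy_teardown_leaves_pending(bool use_sync)
{
	struct toy_dwork w = { .running = true };

	if (use_sync)
		toy_cancel_sync(&w);
	else
		toy_cancel(&w);
	toy_flush(&w);
	return w.timer_pending;
}
```

In this model, toy_teardown_leaves_pending(false) returns true because the
running handler re-arms itself after the plain cancel has already returned,
while toy_teardown_leaves_pending(true) returns false because the re-arm is
refused while the item is disabled.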

> > Fixes: fdf24086f475 ("cnic: Defer iscsi connection cleanup")
> > Signed-off-by: Duoming Zhou <duoming@....edu.cn>
> 
> >  	cnic_bnx2x_delete_wait(dev, 0);
> >  
> > -	cancel_delayed_work(&cp->delete_task);
> > +	cancel_delayed_work_sync(&cp->delete_task);
> >  	flush_workqueue(cnic_wq);
> 
> AFAICT your patch is a nop, doubt this if fixing anything

This patch is not a nop. Although the probability of triggering
the issue is low, the patch does fix the underlying problem.

Best regards,
Duoming Zhou
