Message-ID: <CALbr=LZFZP3ioRmRx1T4Xm=LpPXRsDhkNMxM9dYrfE5nOuknNg@mail.gmail.com>
Date: Thu, 18 Sep 2025 10:21:30 +0800
From: Gui-Dong Han <hanguidong02@...il.com>
To: "yanjun.zhu" <yanjun.zhu@...ux.dev>
Cc: zyjzyj2000@...il.com, jgg@...pe.ca, leon@...nel.org, 
	linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org, 
	baijiaju1990@...il.com, stable@...r.kernel.org, rpearsonhpe@...il.com
Subject: Re: [PATCH] RDMA/rxe: Fix race in do_task() when draining

On Thu, Sep 18, 2025 at 3:31 AM yanjun.zhu <yanjun.zhu@...ux.dev> wrote:
>
> On 9/17/25 3:06 AM, Gui-Dong Han wrote:
> > When do_task() exhausts its RXE_MAX_ITERATIONS budget, it unconditionally
>
>  From the source code, it will check ret value, then set it to
> TASK_STATE_IDLE, not unconditionally.

Hi Yanjun,

Thanks for your review. Let me clarify a few points.

You are correct that the code checks the ret value. The if (!ret)
branch specifically handles the case where the RXE_MAX_ITERATIONS
limit is reached while work still remains. My use of "unconditionally"
refers to the action inside this branch, which sets the state to
TASK_STATE_IDLE without a secondary check on task->state. The original
tasklet implementation effectively checked both conditions in this
scenario.

>
> > sets the task state to TASK_STATE_IDLE to reschedule. This overwrites
> > the TASK_STATE_DRAINING state that may have been concurrently set by
> > rxe_cleanup_task() or rxe_disable_task().
>
>  From the source code, there is a spin lock to protect the state. It
> will not make race condition.

While a spinlock protects state changes, rxe_cleanup_task() and
rxe_disable_task() do not hold it for their entire duration. Each
acquires the lock to set TASK_STATE_DRAINING, but then releases it to
wait in the while (!is_done(task)) loop. The race window exists when
do_task() acquires the lock during this wait period, which allows it
to overwrite the TASK_STATE_DRAINING state with TASK_STATE_IDLE.
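
To make the interleaving concrete, here is a simplified, single-threaded
user-space model of it. This is only a sketch and not the kernel code:
the model_* names, struct model_task, and the check_draining flag are
made up for illustration, and the lock and the wait loop are reduced to
comments. It replays the two steps in the problematic order and reports
whether TASK_STATE_DRAINING survives.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space model; NOT the kernel's rxe task code. */
enum task_state { TASK_STATE_IDLE, TASK_STATE_BUSY, TASK_STATE_DRAINING };

struct model_task {
	enum task_state state;
	bool resched;	/* set when the worker would requeue itself */
};

/*
 * Cleanup/disable side: in the real code the state is set under the
 * spinlock, which is then dropped before the wait loop. Only the
 * state change is modeled here.
 */
static void model_begin_drain(struct model_task *t)
{
	t->state = TASK_STATE_DRAINING;
}

/*
 * Worker side: the iteration budget ran out while work remains.
 * This runs while the waiter is in its unlocked wait period.
 * With check_draining == false the state is overwritten blindly;
 * with check_draining == true a DRAINING state is left untouched.
 */
static void model_budget_exhausted(struct model_task *t, bool check_draining)
{
	if (check_draining && t->state == TASK_STATE_DRAINING)
		return;		/* keep DRAINING, do not reschedule */

	t->state = TASK_STATE_IDLE;
	t->resched = true;
}

/* Replay the interleaving; return true if DRAINING survived. */
static bool model_draining_survives(bool check_draining)
{
	struct model_task t = { .state = TASK_STATE_BUSY, .resched = false };

	model_begin_drain(&t);
	assert(t.state == TASK_STATE_DRAINING);

	model_budget_exhausted(&t, check_draining);

	return t.state == TASK_STATE_DRAINING && !t.resched;
}
```

Calling model_draining_survives(false) models the current behavior, where
the draining state is lost and a reschedule is requested.
model_draining_survives(true) models a secondary check on the state before
it is reset. What the worker should do instead of rescheduling in that
case is deliberately left out of the model.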

>
> >
> > This race condition breaks the cleanup and disable logic, which expects
> > the task to stop processing new work. The cleanup code may proceed while
> > do_task() reschedules itself, leading to a potential use-after-free.
> >
>
> Can you post the call trace when this problem occurred?

This issue was identified through code inspection and a static
analysis tool we are developing to detect TOCTOU bugs in the kernel,
so I do not have a runtime call trace. The bug is confirmed by
inspecting the Fixes commit (9b4b7c1f9f54), which lost the special
handling for the draining case during the migration from tasklets to
workqueues.

Regards,
Han
