lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aDlI69HGnflsD-ss@ysun46-mobl.ccr.corp.intel.com>
Date: Fri, 30 May 2025 13:58:03 +0800
From: Yi Sun <yi.sun@...el.com>
To: Shuai Xue <xueshuai@...ux.alibaba.com>
CC: Vinicius Costa Gomes <vinicius.gomes@...el.com>, <dave.jiang@...el.com>,
	<dmaengine@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<gordon.jin@...el.com>
Subject: Re: [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module
 unload

On 30.05.2025 13:39, Shuai Xue wrote:
>
>
>在 2025/5/30 11:06, Yi Sun 写道:
>>On 29.05.2025 10:04, Vinicius Costa Gomes wrote:
>>>Yi Sun <yi.sun@...el.com> writes:
>>>
>>>>A recent refactor introduced a misplaced put_device() call, resulting in
>>>>reference count underflow when the module is unloaded.
>>>>
>>>>Expand the idxd_cleanup() function to handle proper cleanup, and remove
>>>>idxd_cleanup_internals() as it was not part of the driver unload path.
>>>>
>>>
>>>'idxd_cleanup_internals()' frees a bunch of stuff. I would expect an
>>>explanation of when those things are being free'd now that removed that
>>>call.
>>>
>>
>>I believe the call to idxd_unregister_devices(), which is invoked at the
>>very beginning of idxd_remove(), already takes care of the necessary
>>put_device() through the following call path:
>>idxd_unregister_devices() -> device_unregister() -> put_device()
>>
>>Therefore, there's no need to add additional put_device() calls for idxd
>>groups, engines, or workqueues. While the commit message for a409e919ca3
>>states: "Note, this also fixes the missing put_device() for idxd groups,
>>engines, and wqs."
>>it appears that no such omission actually existed, this part of the flow
>>was already correctly handled.
>>
>>Moreover, this refcount underflow issue appears to be a a clear
>>regression. Prior to this commit, idxd_cleanup_internals() was not part of
>>the driver unload path. The commit did not provide a strong justification
>>for calling idxd_cleanup_internals() within idxd_cleanup().
>>
>>For reference, the both two related bugs produce nearly identical call
>>traces, and I think both are blocking issues.
>>
>>Thanks
>>    --Sun, Yi
>
>Hi, Sun, Yi
>
>idxd_pci_probe_alloc() {
>	rc = idxd_probe(idxd);
>	rc = idxd_register_devices(idxd);
>	if (rc) {
>		dev_err(dev, "IDXD sysfs setup failed\n");
>		goto err_dev_register;
>	}
>// ...
> err_dev_register:			<-
>	idxd_cleanup(idxd);
>}
>
>We use idxd_cleanup() when register devices failed to undo the
>idxd_probe(). idxd_probe() sets up idxd groups, engine and wqs and
>get reference counts by device_initialize().
>
>what's the difference between idxd_cleanup_internals() and
>idxd_unregister_devices()?

Hi Shuai,

Prior to the commit, I meant the function idxd_remove() did not invoke
idxd_cleanup() or idxd_cleanup_internals(), and it was able to handle
reference counts correctly via idxd_unregister_devices().

However, the code refactor introduced the use of the idxd_cleanup()
helper, which internally calls idxd_cleanup_internals(). This results
in a duplicate put_device() call, leading to a reference count underflow
issue.

Thanks
    --Sun, Yi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ