lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b2153756-a57e-4054-bde2-deb8865c9e59@linux.alibaba.com>
Date: Fri, 23 May 2025 13:20:16 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Dave Jiang <dave.jiang@...el.com>, vinicius.gomes@...el.com,
 fenghuay@...dia.com, vkoul@...nel.org
Cc: dmaengine@...r.kernel.org, colin.i.king@...il.com,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ
 enable and reset paths



在 2025/5/22 22:55, Dave Jiang 写道:
> 
> 
> On 5/21/25 11:33 PM, Shuai Xue wrote:
>> A device reset command disables all WQs in hardware. If issued while a WQ
>> is being enabled, it can cause a mismatch between the software and hardware
>> states.
>>
>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>> send a reset command and clear the state (wq->state) of all WQs. It then
>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>> ensure consistency between the software and hardware states.
>>
>> However, a race condition exists between the WQ enable path and the
>> reset/recovery path. For example:
>>
>> A: WQ enable path                   B: Reset and recovery path
>> ------------------                 ------------------------
>> a1. issue IDXD_CMD_ENABLE_WQ
>>                                     b1. issue IDXD_CMD_RESET_DEVICE
>>                                     b2. clear wq->state
>>                                     b3. check wq_enable_map bit, not set
>> a2. set wq->state = IDXD_WQ_ENABLED
>> a3. set wq_enable_map
>>
>> In this case, b1 issues a reset command that disables all WQs in hardware.
>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>> leading to an inconsistency between wq->state (software) and the actual
>> hardware state (IDXD_WQ_DISABLED).
> 
> 
> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
> 
> DJ

Yep, how about add a spin lock to enable wq and reset device path.

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 38633ec5b60e..c0dc904b2a94 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
  }
  EXPORT_SYMBOL_GPL(idxd_wq_enable);
  
+/*
+ * This function enables a WQ in hareware and updates the driver maintained
+ * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
+ * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
+ * otherwise disable the WQ without the driver's state being properly
+ * updated.
+ *
+ * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
+ * called after the WQ has been explicitly disabled, so no concurrency
+ * issues arise.
+ */
+int idxd_wq_enable_locked(struct idxd_wq *wq)
+{
+       struct idxd_device *idxd = wq->idxd;
+       int ret;
+
+       spin_lock(&idxd->dev_lock);
+       ret = idxd_wq_enable_locked(wq);
+       spin_unlock(&idxd->dev_lock);
+
+       return ret;
+}
+
  int idxd_wq_disable(struct idxd_wq *wq, bool reset_config)
  {
         struct idxd_device *idxd = wq->idxd;
@@ -330,7 +353,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
  
         __idxd_wq_set_pasid_locked(wq, pasid);
  
-       rc = idxd_wq_enable(wq);
+       rc = idxd_wq_enable_locked(wq);
         if (rc < 0)
                 return rc;
  
@@ -380,7 +403,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
         iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
         spin_unlock(&idxd->dev_lock);
  
-       rc = idxd_wq_enable(wq);
+       rc = idxd_wq_enable_locked(wq);
         if (rc < 0)
                 return rc;
  
@@ -644,7 +667,11 @@ int idxd_device_disable(struct idxd_device *idxd)
  
  void idxd_device_reset(struct idxd_device *idxd)
  {
+
+       spin_lock(&idxd->dev_lock);
         idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL);
+       spin_unlock(&idxd->dev_lock);
+

(The dev_lock should also apply to idxd_wq_enable(), I did not paste here)

Also, I found a new bug that idxd_device_config() is called without
hold idxd->dev_lock.

idxd_device_config() explictly asserts the hold of idxd->dev_lock.

+++ b/drivers/dma/idxd/irq.c
@@ -33,12 +33,17 @@ static void idxd_device_reinit(struct work_struct *work)
  {
         struct idxd_device *idxd = container_of(work, struct idxd_device, work);
         struct device *dev = &idxd->pdev->dev;
-       int rc, i;
+       int rc = 0, i;
  
         idxd_device_reset(idxd);
-       rc = idxd_device_config(idxd);
-       if (rc < 0)
+       spin_lock(&idxd->dev_lock);
+       if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+               rc = idxd_device_config(idxd);
+       spin_unlock(&idxd->dev_lock);
+       if (rc < 0) {
+               dev_err(dev, "Reinit: %s config fails\n", dev_name(idxd_confdev(idxd)));
                 goto out;
+       }
  
         rc = idxd_device_enable(idxd);
         if (rc < 0)

Please correct me if I missed anything.

Thanks.
Shuai


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ