lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <874ix5bhkz.fsf@intel.com>
Date: Tue, 27 May 2025 19:21:48 -0700
From: Vinicius Costa Gomes <vinicius.gomes@...el.com>
To: Shuai Xue <xueshuai@...ux.alibaba.com>, Dave Jiang
 <dave.jiang@...el.com>, fenghuay@...dia.com, vkoul@...nel.org
Cc: dmaengine@...r.kernel.org, colin.i.king@...il.com,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ
 enable and reset paths

Shuai Xue <xueshuai@...ux.alibaba.com> writes:

> 在 2025/5/23 22:54, Dave Jiang 写道:
>> 
>> 
>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>
>>>
>>> 在 2025/5/22 22:55, Dave Jiang 写道:
>>>>
>>>>
>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>> states.
>>>>>
>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>> ensure consistency between the software and hardware states.
>>>>>
>>>>> However, a race condition exists between the WQ enable path and the
>>>>> reset/recovery path. For example:
>>>>>
>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>> ------------------                 ------------------------
>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>                                      b1. issue IDXD_CMD_RESET_DEVICE
>>>>>                                      b2. clear wq->state
>>>>>                                      b3. check wq_enable_map bit, not set
>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>> a3. set wq_enable_map
>>>>>
>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>
>>>>
>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>
>>>> DJ
>>>
>>> Yep, how about add a spin lock to enable wq and reset device path.
>>>
>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>> index 38633ec5b60e..c0dc904b2a94 100644
>>> --- a/drivers/dma/idxd/device.c
>>> +++ b/drivers/dma/idxd/device.c
>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>   }
>>>   EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>   
>>> +/*
>>> + * This function enables a WQ in hareware and updates the driver maintained
>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>> + * otherwise disable the WQ without the driver's state being properly
>>> + * updated.
>>> + *
>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>> + * issues arise.
>>> + */
>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>> +{
>>> +       struct idxd_device *idxd = wq->idxd;
>>> +       int ret;
>>> +
>>> +       spin_lock(&idxd->dev_lock);
>> 
>> Let's start using the new cleanup macro going forward:
>> guard(spinlock)(&idxd->dev_lock);
>> 
>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with idxd_confdev(idxd) device lock. You can end up just do:
>> guard(device)(idxd_confdev(idxd));
>
> Then we need to replace the lock from spinlock to mutex lock?

We still need a (spin) lock that we could hold in interrupt contexts.

>
>> 
>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>> guard(device)(wq_confdev(wq));
>> 
>> If you are up for it that is.
>
> We creates a hierarchy: pdev -> idxd device -> wq device.
> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>
>      (wq_confdev(wq))->parent = idxd_confdev(idxd);
>
> Is it safe to grap lock of idxd_confdev(idxd) under hold
> lock of wq_confdev(wq)?
>
> We have mounts of code use spinlock of idxd->dev_lock under
> hold of wq->wq_lock.
>

I agree with Dave that the locking could be simplified, but I don't
think that we should hold this series because of that. That
simplification can be done later.

>> 
>> 
>>> +       ret = idxd_wq_enable_locked(wq);
>>> +       spin_unlock(&idxd->dev_lock);
>>> +
>>> +       return ret;
>>> +}
>>> +
>>>   int idxd_wq_disable(struct idxd_wq *wq, bool reset_config)
>>>   {
>>>          struct idxd_device *idxd = wq->idxd;
>>> @@ -330,7 +353,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
>>>   
>>>          __idxd_wq_set_pasid_locked(wq, pasid);
>>>   
>>> -       rc = idxd_wq_enable(wq);
>>> +       rc = idxd_wq_enable_locked(wq);
>>>          if (rc < 0)
>>>                  return rc;
>>>   
>>> @@ -380,7 +403,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
>>>          iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
>>>          spin_unlock(&idxd->dev_lock);
>>>   
>>> -       rc = idxd_wq_enable(wq);
>>> +       rc = idxd_wq_enable_locked(wq);
>>>          if (rc < 0)
>>>                  return rc;
>>>   
>>> @@ -644,7 +667,11 @@ int idxd_device_disable(struct idxd_device *idxd)
>>>   
>>>   void idxd_device_reset(struct idxd_device *idxd)
>>>   {
>>> +
>>> +       spin_lock(&idxd->dev_lock);
>>>          idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL);
>>> +       spin_unlock(&idxd->dev_lock);
>>> +
>>>
>> 
>> I think you just need the wq_enable locked and also in idxd_device_clear_state(), extend the lock to the whole function. Locking the reset function around just the command execute won't protect the wq enable path against the changing of the software states on the reset side.
>
> Quite agreed.
>
>> 
>> DJ
>> 
>>> (The dev_lock should also apply to idxd_wq_enable(), I did not paste here)
>>>
>>> Also, I found a new bug that idxd_device_config() is called without
>>> hold idxd->dev_lock.
>>>> idxd_device_config() explictly asserts the hold of idxd->dev_lock.
>>>
>>> +++ b/drivers/dma/idxd/irq.c
>>> @@ -33,12 +33,17 @@ static void idxd_device_reinit(struct work_struct *work)
>>>   {
>>>          struct idxd_device *idxd = container_of(work, struct idxd_device, work);
>>>          struct device *dev = &idxd->pdev->dev;
>>> -       int rc, i;
>>> +       int rc = 0, i;
>>>   
>>>          idxd_device_reset(idxd);
>>> -       rc = idxd_device_config(idxd);
>>> -       if (rc < 0)
>>> +       spin_lock(&idxd->dev_lock);
>> I wonder if you should also just lock the idxd_device_reset() and the idxd_device_enable() as well in this case as you don't anything to interfere with the entire reinit path.
>
> During reset, any operation to enable wq should indeed be avoided,
> but there isn't a suitable lock currently. idxd->dev_lock is a
> lightweight lock, only used when updating the device state, and
> it's used while holding wq->wq_lock. Therefore, holding idxd->dev_lock
> currently cannot form mutual exclusion with wq->wq_lock.
>
> And the sub caller of idxd_device_reset(), e.g. idxd_device_clear_state()
> also spins to hold idxd->dev_lock.
>
> A hack way it to grab wq_lock of all wqs before before reinit, but
> this is hardly elegant (:
>
> Thanks.
> Have a nice holiday!
>
> Best regards,
> Shuai
>


Cheers,
-- 
Vinicius

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ