Message-ID: <80145652-b9ca-57b5-ad95-ca12d6a25eea@arm.com>
Date: Fri, 10 Dec 2021 18:13:09 +0000
From: Robin Murphy <robin.murphy@....com>
To: John Garry <john.garry@...wei.com>, joro@...tes.org,
will@...nel.org
Cc: linux-kernel@...r.kernel.org, willy@...radead.org,
linux-mm@...ck.org, iommu@...ts.linux-foundation.org,
Xiongfeng Wang <wangxiongfeng2@...wei.com>
Subject: Re: [PATCH v2 01/11] iommu/iova: Fix race between FQ timeout and
teardown
On 2021-12-10 18:04, John Garry via iommu wrote:
> On 10/12/2021 17:54, Robin Murphy wrote:
>> From: Xiongfeng Wang <wangxiongfeng2@...wei.com>
>>
>> It turns out to be possible for hotplugging out a device to reach the
>> stage of tearing down the device's group and default domain before the
>> domain's flush queue has drained naturally. At this point, it is then
>> possible for the timeout to expire just*before* the del_timer() call
>
> super nit: "just*before* the" - needs a whitespace before "before" :)
Weird... the original patch file here and the copy received by lore via
linux-iommu look fine, gremlins in your MUA or delivery path perhaps?
>> from free_iova_flush_queue(), such that we then proceed to free the FQ
>> resources while fq_flush_timeout() is still accessing them on another
>> CPU. Crashes due to this have been observed in the wild while removing
>> NVMe devices.
>>
>> Close the race window by using del_timer_sync() to safely wait for any
>> active timeout handler to finish before we start to free things. We
>> already avoid any locking in free_iova_flush_queue() since the FQ is
>> supposed to be inactive anyway, so the potential deadlock scenario does
>> not apply.
>>
>> Fixes: 9a005a800ae8 ("iommu/iova: Add flush timer")
>> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@...wei.com>
>> [ rm: rewrite commit message ]
>> Signed-off-by: Robin Murphy <robin.murphy@....com>
>
> FWIW,
>
> Reviewed-by: John Garry <john.garry@...wei.com>
Thanks John!
Robin.