[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba0fab4b-61dc-a5de-cc0e-70a6e6f66f51@linux.intel.com>
Date: Thu, 16 Feb 2023 08:40:42 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Robin Murphy <robin.murphy@....com>,
Jason Gunthorpe <jgg@...dia.com>
Cc: baolu.lu@...ux.intel.com, iommu@...ts.linux.dev,
Joerg Roedel <joro@...tes.org>,
Christoph Hellwig <hch@...radead.org>,
Kevin Tian <kevin.tian@...el.com>,
Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem
On 2/15/23 7:24 PM, Robin Murphy wrote:
> On 2023-02-15 05:34, Baolu Lu wrote:
>> On 2/13/23 10:16 PM, Jason Gunthorpe wrote:
>>> On Mon, Feb 13, 2023 at 03:49:38PM +0800, Lu Baolu wrote:
>>>
>>>> +static int iommu_group_freeze_dev_ops(struct iommu_group *group)
>>>> +{
>>>> + struct group_device *device;
>>>> + struct device *dev;
>>>> +
>>>> + mutex_lock(&group->mutex);
>>>> + list_for_each_entry(device, &group->devices, list) {
>>>> + dev = device->dev;
>>>> + down_read(&dev->iommu->ops_rwsem);
>>>
>>> This isn't allowed, you can't obtain locks in a loop like this, it
>>> will deadlock.
>>>
>>> You don't need more locks, we already have the group mutex, the
>>> release path should be fixed to use it properly as I was trying to do
>>> here:
>>>
>>> https://lore.kernel.org/kvm/4-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@nvidia.com/
>>> https://lore.kernel.org/kvm/YyyTxx0HnA3maxEk@nvidia.com/
>>>
>>> Then what you'd do in a path like this is:
>>>
>>> mutex_lock(&group->mutex);
>>> dev = iommu_group_first_device(group)
>>> if (!dev)
>>> /* Racing with group cleanup */
>>> return -EINVAL;
>>> /* Now dev->ops is valid and must remain valid so long as
>>> group->mutex is held */
>>>
>>> The only reason this doesn't work already is because of the above
>>> stuff about release not holding the group mutex when manipulating the
>>> devices in the group. This is simply mis-locked.
>>>
>>> Robin explained it was done like this because
>>> arm_iommu_detach_device() re-enters the iommu core during release_dev,
>>> so I suggest fixing that by simply not doing that (see above)
>>>
>>> Below is the lastest attempt I had tried, I didn't have time to get back
>>> to it and we fixed the bug another way. It needs a bit of adjusting to
>>> also remove the device from the group after release is called within
>>> the same mutex critical region.
>>
>> Yes. If we can make remove device from list and device release in the
>> same mutex critical region, we don't need any other lock mechanism
>> anymore.
>>
>> The ipmmu driver supports default domain.
>
> It supports default domains *on arm64*. Nothing on 32-bit ARM uses
> default domains, they won't even exist since iommu-dma is not enabled,
> and either way the ARM DMA ops don't understand groups. I don't see an
> obvious satisfactory solution while the arm_iommu_* APIs still exist,
> but if we need bodges in the interim could we please try to concentrate
> them in ARM-specific code?
Yes, sure. Thanks for the guide.
Best regards,
baolu
Powered by blists - more mailing lists