Message-ID: <20220222235353.GF10061@nvidia.com>
Date:   Tue, 22 Feb 2022 19:53:53 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Robin Murphy <robin.murphy@....com>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Joerg Roedel <joro@...tes.org>,
        Alex Williamson <alex.williamson@...hat.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Kevin Tian <kevin.tian@...el.com>,
        Ashok Raj <ashok.raj@...el.com>, kvm@...r.kernel.org,
        rafael@...nel.org, David Airlie <airlied@...ux.ie>,
        linux-pci@...r.kernel.org,
        Thierry Reding <thierry.reding@...il.com>,
        Diana Craciun <diana.craciun@....nxp.com>,
        Dmitry Osipenko <digetx@...il.com>,
        Will Deacon <will@...nel.org>,
        Stuart Yoder <stuyoder@...il.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Chaitanya Kulkarni <kch@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Cornelia Huck <cohuck@...hat.com>,
        linux-kernel@...r.kernel.org, Li Yang <leoyang.li@....com>,
        iommu@...ts.linux-foundation.org,
        Jacob jun Pan <jacob.jun.pan@...el.com>,
        Daniel Vetter <daniel@...ll.ch>
Subject: Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in
 bus_type

On Tue, Feb 22, 2022 at 09:18:23PM +0000, Robin Murphy wrote:

> > Still not sure I see what you are thinking though..
> 
> What part of "How hard is it to hold group->mutex when reading or writing
> group->owner?" sounded like "complex lockless algorithm", exactly?

group->owner is not the issue, this series is already using the group
lock to protect the group_cnt and owner.

The problem is how you inspect each struct device the walk iterates
over to decide whether it is still using the DMA API or not. Hint: this
is why I keep mentioning the device_lock(), as it is the only locking
we have for this today.
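
To make that concrete, the check I have in mind would have to look
roughly like this (purely illustrative - the helper name and the exact
test are made up, not code from this series):

   static bool dev_bound_to_dma_api_driver(struct device *dev)
   {
           bool bound;

           /* device_lock() is what keeps dev->driver stable across
            * probe/remove; without it this test is racy. A real check
            * would also consult whatever per-driver flag the series
            * settles on, dev->driver alone is only part of it. */
           device_lock(dev);
           bound = dev->driver != NULL;
           device_unlock(dev);

           return bound;
   }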

> To spell it out, the scheme I'm proposing looks like this:

Well, I already got this; it is what goes inside
driver_or_DMA_API_token() that matters.

I think you are suggesting to do something like:

   if (!READ_ONCE(dev->driver) ||  ???)
       return NULL;
   return group;  // A DMA_API 'token'

Which is locklessly reading dev->driver, and is why you are talking
about races, I guess.

> remove:
>        bool still_owned = false;
>        mutex_lock(group->mutex);
>        list_for_each_entry(tmp, &group->devices, list) {
>                void *owner = driver_or_DMA_API_token(tmp);
>                if (tmp == dev || !owner || owner != group->owner)

And here you expect this will never be called if a group is owned by
VFIO? Which bakes in that weird behavior of really_probe() that only
some errors deserve to get a notifier?
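
(For clarity, my reading of how that walk completes - my guess at the
missing lines, not your code:)

   mutex_lock(&group->mutex);
   list_for_each_entry(tmp, &group->devices, list) {
           void *owner = driver_or_DMA_API_token(tmp);

           /* skip the device being removed and anything not holding
            * the group's current token */
           if (tmp == dev || !owner || owner != group->owner)
                   continue;
           still_owned = true;
           break;
   }
   if (!still_owned)
           group->owner = NULL;    /* i.e. the "pending reclaim" case */
   mutex_unlock(&group->mutex);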

How does the next series work? The iommu_attach_device() work relies
on the owner_cnt too. I don't think this list can replace it there.

> always serialised, even if the list walk in the remove hook sees "future"
> information WRT other devices' drivers, at worst it should merely short-cut
> to a corresponding pending reclaim of ownership.

Depending on what the actual logic is I could accept this argument,
assuming it came with a WRITE_ONCE on the store side and we all
thought carefully about how all this is ordered.
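
For the record, the pairing I mean is something like this (hypothetical
helpers, not what dd.c does today):

   static void set_bound_driver(struct device *dev,
                                struct device_driver *drv)
   {
           /* WRITE_ONCE by itself gives no ordering against the rest
            * of the (un)bind work; whether that is enough, or whether
            * this really wants smp_store_release(), is the part that
            * needs careful thought. */
           WRITE_ONCE(dev->driver, drv);
   }

   static struct device_driver *peek_bound_driver(struct device *dev)
   {
           return READ_ONCE(dev->driver);  /* pairs with the store */
   }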

> Because the current alternative to adding a few simple lines to dd.c is
> adding loads of lines all over the place to end up calling back into common
> IOMMU code, to do something I'm 99% certain the common IOMMU code
> could do

*shrug* both Christoph and I tried to convince Greg. He never really
explained why, but currently he thinks this is the right way to design
it, and so here we are.

> for itself in private. That said, having worked through the above, it does
> start looking like a bit of a big change for this series at this point, so
> I'm happy to keep it on the back burner for when I have to rip
> .dma_configure to pieces anyway.

OK, thanks.

> According to lockdep, I think I've solved the VFIO locking issue provided
> vfio_group_viable() goes away, so I'm certainly keen not to delay that for
> another cycle!

Indeed.

Keep in mind that lockdep is disabled on the device_lock()..
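
(Roughly why, quoting drivers/base/core.c and include/linux/device.h
from memory:)

   /* device_initialize() opts dev->mutex out of lockdep validation: */
   mutex_init(&dev->mutex);
   lockdep_set_novalidate_class(&dev->mutex);

   /* ...so the lock device_lock() takes is invisible to lockdep: */
   static inline void device_lock(struct device *dev)
   {
           mutex_lock(&dev->mutex);
   }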

> paragraph quoted above again. I'm not talking about automatic DMA API
> claiming, that clearly happens per-device; I'm talking about explicit
> callers of iommu_group_claim_dma_owner(). Does VFIO call that multiple times
> for individual devices? No. Should it? No. Is it reasonable that any other
> future callers should need to? I don't think so. Would things be easier to
> reason about if we just disallowed it outright? For sure.

iommufd is device centric and the current draft does call
iommu_group_claim_dma_owner() once for each device. It doesn't have
any reason to track groups, so it has no way to know if it is
"nesting" or not.

I hope the iommu_attach_device() work will progress so iommufd can
eventually call a cleaner device API; it is set up to work this way at
least.

So, yes, currently the follow-on claim calls need the owner_cnt to work
right.
(and we are doing all this to allow iommufd and vfio to share the
ownership logic - adding VFIO-like group tracking complexity to
iommufd to save a few bus callbacks is not a win, IMHO)
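
To illustrate what I mean by relying on it (simplified sketch, not the
code in this series):

   static int claim_group_ownership(struct iommu_group *group,
                                    void *owner)
   {
           int ret = 0;

           mutex_lock(&group->mutex);
           if (group->owner_cnt) {
                   /* a second per-device claim with the same token
                    * just nests; a different owner is refused */
                   if (group->owner != owner)
                           ret = -EPERM;
                   else
                           group->owner_cnt++;
           } else {
                   group->owner = owner;
                   group->owner_cnt = 1;
           }
           mutex_unlock(&group->mutex);

           return ret;
   }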

> > It has already been 8 weeks on this point, lets just move on please.
> 
> Sure, if it was rc7 with the merge window looming I'd be saying "this is
> close enough, let's get it in now and fix the small stuff next cycle".
> However while there's still potentially time to get things right first time,
> I for one am going to continue to point them out because I'm not a fan of
> avoidable churn. I'm sorry I haven't had a chance to look properly at this
> series between v1 and v6, but that's just how things have been.

Well, it is understandable, but this was supposed to be a smallish
cleanup series. All the improvements from the discussion are welcome
and certainly made the series better, but this started in November and
is dragging on..

Sometimes we need churn to bring everyone along the journey.

Jason
