[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220531185110.GJ1343366@nvidia.com>
Date: Tue, 31 May 2022 15:51:10 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Robin Murphy <robin.murphy@....com>
Cc: Baolu Lu <baolu.lu@...ux.intel.com>,
Joerg Roedel <joro@...tes.org>,
Kevin Tian <kevin.tian@...el.com>,
Ashok Raj <ashok.raj@...el.com>,
Christoph Hellwig <hch@...radead.org>,
Will Deacon <will@...nel.org>, Liu Yi L <yi.l.liu@...el.com>,
Jacob jun Pan <jacob.jun.pan@...el.com>,
iommu@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in
debugfs
On Tue, May 31, 2022 at 07:07:32PM +0100, Robin Murphy wrote:
> > And we expect the iommu driver to be unable to free page table levels
> > that have IOVA boundaries in them?
>
> I'm not entirely sure what you mean there, but in general an unmap request
> is expected to match some previous map request
atomic cmpxchg is OK for inserting new page table levels but it can't
protect you against concurrent freeing of page table levels. So
without locks it means that page tables can't usually be freed. Which
seems to match what the Intel driver does - at least from a cursory
look.
This is one of the reasons the mm has the mmap/etc lock and spinlocks
because we do expect page table levels to get wiped out when VMA's are
zap'd - all the different locks provide the protection against page
tables disappearing under from something manipulating them.
Basically every "lockless" walk in (process) MM land is actually
protected by some kind of lock that blocks zap_page_range() from
removing the page table levels themselves.
> They might either unmap the entire region originally mapped, or just
> the requested part, or might fail entirely (IIRC there was some
> nasty code in VFIO for detecting a particular behaviour).
This is something I did differently in iommufd. It always generates
unmaps that are strict supersets of the maps it issued. So it would be
a kernel bug if the driver unmaps more or less than requested.
> Oh, I've spent the last couple of weeks hacking up horrible things
> manipulating entries in init_mm, and never realised that that was actually
> the special case. Oh well, live and learn.
The init_mm is sort of different, it doesn't have zap in quite the
same way, for example. I was talking about the typical process mm.
Anyhow, the right solution is to use RCU as I described before, Baolu
do you want to try?
Jason
Powered by blists - more mailing lists