Message-ID: <20240522183046.GG20229@nvidia.com>
Date: Wed, 22 May 2024 15:30:46 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: Andrew Jones <ajones@...tanamicro.com>, Yan Zhao <yan.y.zhao@...el.com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
kevin.tian@...el.com, yishaih@...dia.com,
shameerali.kolothum.thodi@...wei.com, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH] vfio/pci: take mmap write lock for io_remap_pfn_range
On Wed, May 22, 2024 at 11:50:06AM -0600, Alex Williamson wrote:
> I'm not sure if there are any outstanding blockers on Peter's side, but
> this seems like a good route from the vfio side. If we're seeing this
> now without lockdep, we might need to bite the bullet and take the hit
> with vmf_insert_pfn() while the pmd/pud path learn about pfnmaps.
There is another alternative...
Ideally we wouldn't use the fault handler.
Instead, when the MMIO becomes available again we'd iterate over all
the VMAs and do remap_pfn_range(). When the MMIO becomes unavailable
we'd do unmap_mapping_range() and remove it. The whole thing is
synchronous, and the fault handler should simply trigger SIGBUS if
userspace races things.
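The disable side is basically that unmap plus a fault handler that
refuses to repopulate. A minimal sketch, assuming a hypothetical
vfio_pci_zap_mmio() helper and whatever address_space backs the device
mmaps (names are illustrative, not existing vfio code):

static void vfio_pci_zap_mmio(struct address_space *mapping)
{
	/* Tear down every userspace mapping of the BAR MMIO */
	unmap_mapping_range(mapping, 0, 0, true);
}

/*
 * With the zap done synchronously, a racing userspace access dies
 * instead of refaulting MMIO that is no longer there.
 */
static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
	return VM_FAULT_SIGBUS;
}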
unmap_mapping_range() is easy, but the remap side doesn't have a
helper today..
Something grotesque like this perhaps?
while (1) {
	struct mm_struct *cur_mm = NULL;

	i_mmap_lock_read(mapping);
	vma_interval_tree_foreach(vma, &mapping->i_mmap, 0, ULONG_MAX) {
		if (vma_populated(vma))
			continue;
		/* Grab one mm that still needs remapping, then retry */
		cur_mm = vma->vm_mm;
		mmgrab(cur_mm);
		break;
	}
	i_mmap_unlock_read(mapping);

	if (!cur_mm)
		return;

	mmap_write_lock(cur_mm);
	i_mmap_lock_read(mapping);
	vma_interval_tree_foreach(vma, &mapping->i_mmap, 0, ULONG_MAX) {
		if (vma->vm_mm == cur_mm)
			vfio_remap_vma(vma);
	}
	i_mmap_unlock_read(mapping);
	mmap_write_unlock(cur_mm);
	mmdrop(cur_mm);
}
I'm pretty sure we need to hold the mmap write lock to do the
remap_pfn_range()..
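For reference, a guess at what the vfio_remap_vma() used above could
look like, reusing the pgoff-to-BAR encoding vfio-pci already has;
stashing the device in vm_private_data is an assumption of this sketch:

static int vfio_remap_vma(struct vm_area_struct *vma)
{
	/* Assumes the mmap op stored the device in vm_private_data */
	struct vfio_pci_core_device *vdev = vma->vm_private_data;
	unsigned int index = vma->vm_pgoff >>
			     (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
	unsigned long pgoff = vma->vm_pgoff &
		((1UL << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
	unsigned long pfn = (pci_resource_start(vdev->pdev, index) >>
			     PAGE_SHIFT) + pgoff;

	/* Caller holds mmap_write_lock(vma->vm_mm) */
	return io_remap_pfn_range(vma, vma->vm_start, pfn,
				  vma->vm_end - vma->vm_start,
				  vma->vm_page_prot);
}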
vma_populated() would have to do an RCU read of the page table to check
if page 0 of the VMA is present.
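Roughly, assuming checking the first page of the VMA is enough and
counting leaf pud/pmd entries as populated (an untested sketch of the
walk, not a real helper):

static bool vma_populated(struct vm_area_struct *vma)
{
	unsigned long addr = vma->vm_start;
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;
	bool present;

	pgd = pgd_offset(vma->vm_mm, addr);
	if (pgd_none(*pgd))
		return false;
	p4d = p4d_offset(pgd, addr);
	if (p4d_none(*p4d))
		return false;
	pud = pud_offset(p4d, addr);
	if (pud_none(*pud))
		return false;
	if (pud_leaf(*pud))
		return true;
	pmd = pmd_offset(pud, addr);
	if (pmd_none(*pmd))
		return false;
	if (pmd_leaf(*pmd))
		return true;
	/* pte_offset_map() does the RCU read side for us */
	pte = pte_offset_map(pmd, addr);
	if (!pte)
		return false;
	present = pte_present(ptep_get(pte));
	pte_unmap(pte);
	return present;
}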
Also there is a race in mmap if you call remap_pfn_range() from the
mmap fops and also use unmap_mapping_range(). mmap_region() does
call_mmap() before it does vma_link_file() which gives a window where
the VMA is populated but invisible to unmap_mapping_range(). We'd need
to add another fop to call after vma_link_file() to populate the mmap
or rely exclusively on the fault handler to populate.
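FWIW, if we went the "rely on the fault handler" route the mmap op
would only set the VMA up and never touch the page tables, so nothing
is populated before vma_link_file() makes the VMA visible. Something
like this (a sketch, not the current vfio_pci_core_mmap() verbatim):

static int vfio_pci_core_mmap(struct file *filp, struct vm_area_struct *vma)
{
	/* ... validate vm_pgoff/length against the BAR here ... */

	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	vma->vm_ops = &vfio_pci_mmap_ops;

	/*
	 * No remap_pfn_range() here; population happens after the VMA
	 * is linked, either via a new post-link fop or the fault path.
	 */
	return 0;
}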
Jason