Message-ID: <Zk5h3yfuZzlo2VzN@x1n>
Date: Wed, 22 May 2024 17:21:35 -0400
From: Peter Xu <peterx@...hat.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: Andrew Jones <ajones@...tanamicro.com>, Yan Zhao <yan.y.zhao@...el.com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
kevin.tian@...el.com, jgg@...dia.com, yishaih@...dia.com,
shameerali.kolothum.thodi@...wei.com
Subject: Re: [PATCH] vfio/pci: take mmap write lock for io_remap_pfn_range
On Wed, May 22, 2024 at 11:50:06AM -0600, Alex Williamson wrote:
> I'm not sure if there are any outstanding blockers on Peter's side, but
> this seems like a good route from the vfio side. If we're seeing this
> now without lockdep, we might need to bite the bullet and take the hit
> with vmf_insert_pfn() while the pmd/pud path learn about pfnmaps.
No immediate blockers; there are just some small details that I may still
need to look into. The current TBD is the implications of pfn tracking for
PAT. Here I see at least two issues to be investigated.

Firstly, when vfio zaps the BARs it can try to remove the VM_PAT flag. To be
explicit, unmap_single_vma() has:
	if (unlikely(vma->vm_flags & VM_PFNMAP))
		untrack_pfn(vma, 0, 0, mm_wr_locked);
I believe it'll also erase the entry in memtype_rbroot. I'm not sure
whether that's correct at all, and if it is correct, how we should
re-inject the entry. So far I feel like we should leave the pfn tracking
alone when only tearing down pgtables, but I'll need to double check.
E.g. I at least checked that MADV_DONTNEED can't be applied to PFNMAPs, so
vfio zapping the vma should be the first case that can do that besides
munmap().
The other thing is that I just noticed very recently that the PAT bit on
x86_64 is not always the same bit: on 4K pages it's bit 7, but that bit is
reused as PSE at higher levels, moving PAT to bit 12:
	#define _PAGE_BIT_PSE		7	/* 4 MB (or 2MB) page */
	#define _PAGE_BIT_PAT		7	/* on 4KB pages */
	#define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
We may need something like protval_4k_2_large() when injecting huge
mappings.
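
To illustrate the kind of conversion meant here, below is a minimal
userspace sketch that assumes only the bit positions quoted above. The
names prot_4k_to_large(), prot_large_to_4k() and the PAGE_* macros are made
up for this example; this is not the kernel's implementation, it only shows
the PAT bit moving between bit 7 and bit 12 of a protection value.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as quoted above; macro names are illustrative only. */
#define PAGE_BIT_PAT		7	/* PAT on 4KB entries (PSE on huge ones) */
#define PAGE_BIT_PAT_LARGE	12	/* PAT on 2MB or 1GB entries */

#define PAGE_PAT	(UINT64_C(1) << PAGE_BIT_PAT)
#define PAGE_PAT_LARGE	(UINT64_C(1) << PAGE_BIT_PAT_LARGE)

/*
 * Hypothetical helper: take protection bits laid out for a 4K entry and
 * move the PAT bit (bit 7) to where huge entries expect it (bit 12).
 * Bit 7 is left clear so the caller can set PSE there separately.
 */
static inline uint64_t prot_4k_to_large(uint64_t val)
{
	return (val & ~(PAGE_PAT | PAGE_PAT_LARGE)) |
	       ((val & PAGE_PAT) << (PAGE_BIT_PAT_LARGE - PAGE_BIT_PAT));
}

/* Hypothetical inverse: move PAT from bit 12 back down to bit 7. */
static inline uint64_t prot_large_to_4k(uint64_t val)
{
	return (val & ~(PAGE_PAT | PAGE_PAT_LARGE)) |
	       ((val & PAGE_PAT_LARGE) >> (PAGE_BIT_PAT_LARGE - PAGE_BIT_PAT));
}

/* Small self-check showing the intended behaviour of the two helpers. */
static inline void prot_sketch_selfcheck(void)
{
	/* PAT set in the 4K layout ends up at bit 12, bit 7 is cleared. */
	assert(prot_4k_to_large(PAGE_PAT) == PAGE_PAT_LARGE);
	/* Bits other than the two PAT positions are left untouched. */
	assert(prot_4k_to_large(UINT64_C(0x63)) == UINT64_C(0x63));
	/* Converting there and back gives the original 4K value. */
	assert(prot_large_to_4k(prot_4k_to_large(UINT64_C(0xe3))) ==
	       UINT64_C(0xe3));
}
```

Without a step like this, reusing 4K protection bits for a pmd/pud entry
would leave bit 7 meaning PSE rather than PAT, so the intended memory type
could be lost.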
From the schedule POV, the plan is that I'll continue working on this after
I flush my inbox from the past two weeks and get some spare time. ~160
emails left now.. but I'm getting there. If there are comments on either of
the above, please shoot.
Thanks,
--
Peter Xu