[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB527655973E2603E73F280DF48C7A9@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Fri, 2 Sep 2022 08:24:51 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"david@...hat.com" <david@...hat.com>,
"lpivarc@...hat.com" <lpivarc@...hat.com>,
"Liu, Jingqi" <jingqi.liu@...el.com>,
"Lu, Baolu" <baolu.lu@...el.com>
Subject: RE: [PATCH] vfio/type1: Unpin zero pages
Hi, Alex,
> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Tuesday, August 30, 2022 11:06 AM
>
> There's currently a reference count leak on the zero page. We increment
> the reference via pin_user_pages_remote(), but the page is later handled
> as an invalid/reserved page, therefore it's not accounted against the
> user and not unpinned by our put_pfn().
>
> Introducing special zero page handling in put_pfn() would resolve the
> leak, but without accounting of the zero page, a single user could
> still create enough mappings to generate a reference count overflow.
>
> The zero page is always resident, so for our purposes there's no reason
> to keep it pinned. Therefore, add a loop to walk pages returned from
> pin_user_pages_remote() and unpin any zero pages.
>
We found an interesting issue on zero page and wonder whether we
should instead find a way to not use zero page in vfio pinning path.
The observation - the 'pc.bios' region (0xfffc0000) is always mapped
RO to zero page in the IOMMU page table even after the mapping in
the CPU page table has been changed after Qemu loads the guest
bios image into that region (which is mmap'ed as RW).
In reality this may not cause real problem as I don't expect any sane
usage would want to DMA read from the bios region. This is probably
the reason why nobody ever notes it.
But in concept it is incorrect.
Fixing Qemu to update/setup the VFIO mapping after loading the bios
image could mitigate this problem. But we never document such ABI
restriction on RO mappings and in concept the pinning semantics
should apply to all paths (RO in DMA and RW in CPU) which the
application uses to access the pinned memory hence the sequence
shouldn't matter from user p.o.v
And old Qemu/VMM still have this issue.
Having a notifier to implicitly fix the IOMMU mapping within the
kernel violates the semantics of pinning, and makes vfio page
accounting even more tricky.
So I wonder instead of continuing to fix trickiness around the zero
page whether it is a better idea to pursue allocating a normal
page from the beginning for pinned RO mappings?
Thanks
Kevin
Powered by blists - more mailing lists