lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 7 Sep 2022 21:14:54 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Alex Williamson <alex.williamson@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>
Cc:     "Tian, Kevin" <kevin.tian@...el.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "lpivarc@...hat.com" <lpivarc@...hat.com>,
        "Liu, Jingqi" <jingqi.liu@...el.com>,
        "Lu, Baolu" <baolu.lu@...el.com>
Subject: Re: [PATCH] vfio/type1: Unpin zero pages

On 07.09.22 20:56, Alex Williamson wrote:
> On Wed, 7 Sep 2022 13:40:52 -0300
> Jason Gunthorpe <jgg@...pe.ca> wrote:
> 
>> On Wed, Sep 07, 2022 at 09:55:52AM -0600, Alex Williamson wrote:
>>
>>>> So, if we go back far enough in the git history we will find a case
>>>> where PUP is returning something for the zero page, and that something
>>>> caused is_invalid_reserved_pfn() == false since VFIO did work at some
>>>> point.
>>>
>>> Can we assume that?  It takes a while for a refcount leak on the zero
>>> page to cause an overflow.  My assumption is that it's never worked, we
>>> pin zero pages, don't account them against the locked memory limits
>>> because our is_invalid_reserved_pfn() test returns true, and therefore
>>> we don't unpin them.
>>
>> Oh, you think it has been buggy forever? That is not great..
>>   
>>>> IHMO we should simply go back to the historical behavior - make
>>>> is_invalid_reserved_pfn() check for the zero_pfn and return
>>>> false. Meaning that PUP returned it.
>>>
>>> We've never explicitly tested for zero_pfn and as David notes,
>>> accounting the zero page against the user's locked memory limits has
>>> user visible consequences.  VMs that worked with a specific locked
>>> memory limit may no longer work.
>>
>> Yes, but the question is if that is a strict ABI we have to preserve,
>> because if you take that view it also means because VFIO has this
>> historical bug that David can't fix the FOLL_FORCE issue either.
>>
>> If the view holds for memlock then it should hold for cgroups
>> also. This means the kernel can never change anything about
>> GFP_KERNEL_ACCOUNT allocations because it might impact userspace
>> having set a tight limit there.
>>
>> It means we can't fix the bug that VFIO is using the wrong storage for
>> memlock.
>>
>> It means qemu can't change anything about how it sets up this memory,
>> ie Kevin's idea to change the ordering.
>>
>> On the other hand the "abi break" we are talking about is that a user
>> might have to increase a configured limit in a config file after a
>> kernel upgrade.
>>
>> IDK what consensus exists here, I've never heard of anyone saying
>> these limits are a strict ABI like this.. I think at least for cgroup
>> that would be so invasive as to immediately be off the table.
> 
> I thought we'd already agreed that we were stuck with locked_vm for
> type1 and any compatibility mode of type1 due to this.  Native iommufd
> support can do the right thing since userspace will need to account for
> various new usage models anyway.
> 
> I've raised the issue with David for the zero page accounting, but I
> don't know what the solution is.  libvirt automatically adds a 1GB
> fudge factor to the VM locked memory limits to account for things like
> ROM mappings, or at least the non-zeropage backed portion of those
> ROMs.  I think that most management tools have adopted similar, so the
> majority of users shouldn't notice.  However this won't cover all
> users, so we certainly risk breaking userspace if we introduce hard
> page accounting of zero pages.
> 
> I think David suggested possibly allowing some degree of exceeding
> locked memory limits for zero page COWs.  Potentially type1 could do
> this as well; special case handling with some heuristically determined,
> module parameter defined limit.  We might also consider whether we
> could just ignore zero page mappings, maybe with a optional "strict"
> mode module option to generate an errno on such mappings.  Thanks,

So far I played with the ideas

a) allow slightly exceeding the limit and warn

b) weird vfio kernel parameter to control the zeropage behavior (old vs.
    new)

I certainly have in mind that we might need some toggle for vfio.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ