Message-ID: <diqzjzggmkf7.fsf@ackerleytng-ctop.c.googlers.com>
Date: Fri, 16 Aug 2024 17:45:00 +0000
From: Ackerley Tng <ackerleytng@...gle.com>
To: David Hildenbrand <david@...hat.com>, Fuad Tabba <tabba@...gle.com>
Cc: Elliot Berman <quic_eberman@...cinc.com>, Andrew Morton <akpm@...ux-foundation.org>,
Paolo Bonzini <pbonzini@...hat.com>, Sean Christopherson <seanjc@...gle.com>,
Patrick Roy <roypat@...zon.co.uk>, qperret@...gle.com, linux-coco@...ts.linux.dev,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, kvm@...r.kernel.org
Subject: Re: [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages
David Hildenbrand <david@...hat.com> writes:
> On 15.08.24 09:24, Fuad Tabba wrote:
>> Hi David,
>
> Hi!
>
>>
>> On Tue, 6 Aug 2024 at 14:51, David Hildenbrand <david@...hat.com> wrote:
>>>
>>>>
>>>> - if (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) {
>>>> + if (!ops->accessible && (gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)) {
>>>> r = guest_memfd_folio_private(folio);
>>>> if (r)
>>>> goto out_err;
>>>> @@ -107,6 +109,82 @@ struct folio *guest_memfd_grab_folio(struct file *file, pgoff_t index, u32 flags
>>>> }
>>>> EXPORT_SYMBOL_GPL(guest_memfd_grab_folio);
>>>>
>>>> +int guest_memfd_make_inaccessible(struct file *file, struct folio *folio)
>>>> +{
>>>> +	unsigned long gmem_flags = (unsigned long)file->private_data;
>>>> +	unsigned long i;
>>>> +	int r;
>>>> +
>>>> +	unmap_mapping_folio(folio);
>>>> +
>>>> +	/**
>>>> +	 * We can't use the refcount. It might be elevated due to
>>>> +	 * guest/vcpu trying to access same folio as another vcpu
>>>> +	 * or because userspace is trying to access folio for same reason
>>>
>>> As discussed, that's insufficient. We really have to drive the refcount
>>> to 1 -- the single reference we expect.
>>>
>>> What is the exact problem you are running into here? Who can just grab a
>>> reference and maybe do nasty things with it?
>>
>> I was wondering, why do we need to check the refcount? Isn't it enough
>> to check for page_mapped() || page_maybe_dma_pinned(), while holding
>> the folio lock?

Thank you Fuad for asking!

>
> (folio_mapped() + folio_maybe_dma_pinned())
>
> Not everything goes through FOLL_PIN. vmsplice() is an example, or just
> some very simple read/write through /proc/pid/mem. Further, some
> O_DIRECT implementations still don't use FOLL_PIN.
>
> So if you see an additional folio reference, as soon as you mapped that
> thing to user space, you have to assume that it could be someone
> reading/writing that memory in possibly sane context. (vmsplice() should
> be using FOLL_PIN|FOLL_LONGTERM, but that's a longer discussion)
>

Thanks David for the clarification, this example is very helpful!

IIUC folio_lock() isn't a prerequisite for taking a refcount on the
folio, so holding the folio lock does not by itself stop new references
from appearing.

Even if we can work out a "safe" refcount and check that the current
refcount == "safe" refcount before removing the folio from the direct
map, what stops some other part of the kernel from taking a refcount
just after the check and then causing trouble with the folio's removal
from the direct map?
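
To make the race concrete, here is a minimal userspace sketch using C11
atomics. It shows one way the gap between "check" and "act" could be
closed: the expected refcount is swapped to 0 in a single
compare-and-exchange, and speculative getters refuse to take a reference
while the count is 0. All names (sketch_folio, sketch_try_get,
sketch_freeze, ...) are hypothetical and this is not kernel code. As far
as I understand, folio_ref_freeze()/folio_ref_unfreeze() work along
similar lines, but whether they fit this use case is an assumption on my
part.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a folio; not the kernel's struct folio. */
struct sketch_folio {
	atomic_int refcount;
};

/*
 * Speculative reference: succeeds only while refcount is non-zero,
 * so it fails once the count has been frozen to 0.
 */
static bool sketch_try_get(struct sketch_folio *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference. */
static void sketch_put(struct sketch_folio *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/*
 * Check and claim in one atomic step: move refcount from expected to 0.
 * Fails if the count differs from expected at that instant.
 */
static bool sketch_freeze(struct sketch_folio *f, int expected)
{
	return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

/* Restore the count after a successful freeze. */
static void sketch_unfreeze(struct sketch_folio *f, int count)
{
	assert(atomic_load(&f->refcount) == 0);
	atomic_store(&f->refcount, count);
}
```

With a plain "read refcount, compare, then act" sequence, a getter could
slip in between the compare and the act. In the sketch, a getter either
wins before the freeze (and the freeze fails) or loses after it (and
sketch_try_get() returns false).
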
> (noting that also folio_maybe_dma_pinned() can have false positives in
> some cases due to speculative references or *many* references).

Are false positives (e.g. from speculative references) acceptable here,
since it's better to be safe than to remove a folio from the direct map
prematurely?
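
For context on where the "*many* references" false positives come from,
here is a small userspace sketch. It assumes the small-folio scheme in
which each pin is folded into the refcount as a large bias (I believe
the kernel's GUP_PIN_COUNTING_BIAS is 1024, but treat the constant and
the helper names below as illustrative only).

```c
#include <stdbool.h>

/* Assumed bias added to the refcount per pin; illustrative value. */
#define SKETCH_PIN_BIAS 1024u

/* Taking a pin bumps the refcount by the bias instead of by 1. */
static unsigned int sketch_pin(unsigned int refcount)
{
	return refcount + SKETCH_PIN_BIAS;
}

/*
 * Heuristic check: any refcount at or above the bias looks pinned,
 * whether it got there through a pin or through many plain references.
 */
static bool sketch_maybe_pinned(unsigned int refcount)
{
	return refcount >= SKETCH_PIN_BIAS;
}
```

A folio with one reference and one pin reports true, but so does a
folio with 1024 ordinary references and no pin at all, which is the
false positive in question.
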
>
> --
> Cheers,
>
> David / dhildenb