lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d311645d-9677-44ca-9d86-6d37f971082c@redhat.com>
Date: Fri, 16 Aug 2024 19:56:30 +0200
From: David Hildenbrand <david@...hat.com>
To: Peter Xu <peterx@...hat.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, Sean Christopherson <seanjc@...gle.com>,
 Oscar Salvador <osalvador@...e.de>, Axel Rasmussen
 <axelrasmussen@...gle.com>, linux-arm-kernel@...ts.infradead.org,
 x86@...nel.org, Will Deacon <will@...nel.org>, Gavin Shan
 <gshan@...hat.com>, Paolo Bonzini <pbonzini@...hat.com>,
 Zi Yan <ziy@...dia.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Catalin Marinas <catalin.marinas@....com>, Ingo Molnar <mingo@...hat.com>,
 Alistair Popple <apopple@...dia.com>, Borislav Petkov <bp@...en8.de>,
 Thomas Gleixner <tglx@...utronix.de>, kvm@...r.kernel.org,
 Dave Hansen <dave.hansen@...ux.intel.com>,
 Alex Williamson <alex.williamson@...hat.com>, Yan Zhao <yan.y.zhao@...el.com>
Subject: Re: [PATCH 06/19] mm/pagewalk: Check pfnmap early for
 folio_walk_start()

On 16.08.24 16:21, Peter Xu wrote:
> On Fri, Aug 16, 2024 at 11:30:31AM +0200, David Hildenbrand wrote:
>> On 14.08.24 15:05, Jason Gunthorpe wrote:
>>> On Fri, Aug 09, 2024 at 07:25:36PM +0200, David Hildenbrand wrote:
>>>
>>>>>> That is in general not what we want, and we still have some places that
>>>>>> wrongly hard-code that behavior.
>>>>>>
>>>>>> In a MAP_PRIVATE mapping you might have anon pages that we can happily walk.
>>>>>>
>>>>>> vm_normal_page() / vm_normal_page_pmd() [and as commented as a TODO,
>>>>>> vm_normal_page_pud()] should be able to identify PFN maps and reject them,
>>>>>> no?
>>>>>
>>>>> Yep, I think we can also rely on special bit.
>>>
>>> It is more than just relying on the special bit..
>>>
>>> VM_PFNMAP/VM_MIXEDMAP should really only be used inside
>>> vm_normal_page() because thay are, effectively, support for a limited
>>> emulation of the special bit on arches that don't have them. There are
>>> a bunch of weird rules that are used to try and make that work
>>> properly that have to be followed.
>>>
>>> On arches with the sepcial bit they should possibly never be checked
>>> since the special bit does everything you need.
>>>
>>> Arguably any place reading those flags out side of vm_normal_page/etc
>>> is suspect.
>>
>> IIUC, your opinion matches mine: VM_PFNMAP/VM_MIXEDMAP and pte_special()/...
>> usage should be limited to vm_normal_page/vm_normal_page_pmd/ ... of course,
>> GUP-fast is special (one of the reason for "pte_special()" and friends after
>> all).
> 
> The issue is at least GUP currently doesn't work with pfnmaps, while
> there're potentially users who wants to be able to work on both page +
> !page use cases.  Besides access_process_vm(), KVM also uses similar thing,
> and maybe more; these all seem to be valid use case of reference the vma
> flags for PFNMAP and such, so they can identify "it's pfnmap" or more
> generic issues like "permission check error on pgtable".

What at least VFIO does is first try GUP, and if that fails, try 
follow_fault_pfn()->follow_pte(). There is a VM_PFNMAP check in there, yes.

Ideally, follow_pte() would never return refcounted/normal pages, then 
the PFNMAP check might only be a performance improvement (maybe).

> 
> The whole private mapping thing definitely made it complicated.

Yes, and follow_pte() for now could even return ordinary anon pages. I 
spotted that when I was working on that VM_PAT stuff, but I was to 
unsure what to do (see below that KVM with MAP_PRIVATE /dev/mem might 
just work, no idea if there are use cases?).

Fortunately, vfio calls is_invalid_reserved_pfn() and refuses anything 
that has a struct page.

I think KVM does something nasty: if it something with a "struct page", 
and it's not PageReserved, it would take a reference (if I get 
kvm_pfn_to_refcounted_page()) independent if it's a "normal" or "not 
normal" page -- it essentially ignores the vm_normal_page() information 
in the page tables ...

So anon pages in pivate mappings from follow_pte() might currently work 
with KVM ... because of the way KVM uses follow_pte().

I did not play with it, so I'm not sure if I am missing some detail.

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ